Coders Packet

Python File Format Conversion: Simplifying DOC, JPG, and TXT to PDF Conversion

By Sri Venkata Sai Kiran Chaitanya Agnihotram

Discover the power of Python for file format conversion. This blog explores Python libraries to effortlessly convert (.doc, .docx, .jpg, and .txt files to PDF format).

Introduction

Welcome to our blog on Python PDF conversion and file format transformation! In today's digital age, the ability to convert files from one format to another is incredibly valuable. Whether you need to convert images, documents, or text files, Python provides a plethora of powerful libraries and modules that simplify the process. In this blog, we will explore some of these libraries and demonstrate how you can leverage them to effortlessly convert files to PDF and other formats. Whether you're a developer, a data scientist, or simply someone in need of file format conversion, this guide will equip you with the knowledge to efficiently handle various file transformations using Python. So let's dive in and unlock the world of versatile file format conversion with Python!

 

Overview of File Format Conversion:

In this context, we will focus on converting various
Input formats: DOC, JPG, and TXT files
into
Output formats: PDF format. PDF, or Portable Document Format, offers compatibility and versatility across different platforms and devices.

Overview of Libraries Used:

A short note about the libraries and modules used in the program:

  1. img2pdf: This library is used for converting image files (in this case, JPG) to PDF. It provides a simple way to convert images to PDF format by utilizing the img2pdf.convert function.

    pip install img2pdf

    Use this command to install ' img2pdf ' library

  2. docx2pdf: This library is used for converting DOC and DOCX files to PDF. It provides a convenient way to convert Microsoft Word documents to PDF format by using the convert function.

    pip install docx2pdf

    Use this command to install ' docx2pdf ' library

  3. reportlab: This library is used for generating PDF documents programmatically. It provides a comprehensive set of tools for creating PDF files, including the ability to define page sizes, add text, images, and other elements to the PDF canvas.

    pip install reportlab

    Use this command to install ' reportlab ' library

  4. reportlab.lib.pagesizes: This module from the reportlab library provides predefined page sizes, such as letter, which is used in this program.

  5. reportlab.pdfgen.canvas: This module from the reportlab library is used to create a PDF canvas where you can add text, images, and other elements. It allows you to define the layout and content of the PDF document.

These libraries and modules are essential for performing the various file format conversions in the program. They provide convenient functions and methods to handle the conversion process and generate the desired output in PDF format.

 

Code for DOC, JPG, and TXT to PDF Conversion:

import os
import img2pdf
from docx2pdf import convert
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def convert_to_pdf(input_file, output_file):
    if input_file.endswith('.jpg'):
        # Convert JPG to PDF
        with open(output_file, "wb") as pdf_file, open(input_file, "rb") as jpg_file:
            pdf_file.write(img2pdf.convert(jpg_file))

        print("JPG to PDF conversion completed.")

    elif input_file.endswith('.doc') or input_file.endswith('.docx'):
        # Convert DOC/DOCX to PDF
        convert(input_file, output_file)

        print("DOC/DOCX to PDF conversion completed.")

    elif input_file.endswith('.txt'):
        # Convert TXT to PDF
        c = canvas.Canvas(output_file, pagesize=letter)

        with open(input_file, 'r') as file:
            txt_content = file.read()

        c.setFont("Helvetica", 12)
        c.drawString(100, 700, txt_content)

        c.save()

        print("TXT to PDF conversion completed.")

    else:
        print("Unsupported file format.")

# Usage
input_file = r"C:\Users\user\Python Pdf converter\Sample text.txt"  # Provide the input file path here file can be a .txt or .jpg or .doc file
output_file = r"C:\Users\user\Python Pdf converter\Sample pdf.pdf"  # Provide the output file path here

convert_to_pdf(input_file, output_file)

 

Output:
(Consider Doc to pdf conversions)

output for doc to pdf conversion

 

Best Practices and Tips:

  • Use Relative File Paths: Instead of relying on absolute file paths, use relative file paths when locating the files. Relative paths are relative to the current working directory of the Python script, making it easier to locate files within the scope of the interpreter.

  • Organize Files in a Dedicated Folder: Create a dedicated folder within your project directory to store all the input and output files related to the file format conversion. This helps maintain a clean and organized structure, making it easier to locate the files.

  • Platform Used: I Used "Jupyter Notebook" for showing the execution of this file. If any other platform is used then the files to be converted are to be located within the same scope as the platform being used

 

Download Complete Code

Comments

No comments yet

Download Packet

Reviews Report

Submitted by Sri Venkata Sai Kiran Chaitanya Agnihotram (SaiKiran0267)

Download packets of source code on Coders Packet