Merge PDF Files in Python

Merging PDF files is one of the most common document tasks in automation, backend development, and file processing. Whether you are building a web app, a desktop tool, or a command-line utility, the ability to combine multiple PDFs into one file can save time and simplify workflows.

In Python, this task is straightforward once you know the right library and approach. You can merge invoices, reports, scanned pages, contracts, e-books, or any other PDF documents into a single file with only a few lines of code. But beyond the basic example, there are many practical details to understand: how to control page order, how to merge PDFs from a folder, how to handle errors, how to work with large files, and how to build reusable scripts.

This article walks through these tasks in a practical, beginner-friendly way. You will learn how to merge PDF files in Python using modern libraries, see multiple code examples, and understand how to make your solution reliable for real-world projects.

Why Merge PDFs in Python?

There are many situations where PDF merging is useful:

Combining several reports into one document
Joining invoice pages into a single file
Merging scanned pages after digitizing paper documents
Building a document automation system
Creating downloadable PDF bundles in web apps
Organizing chapters or sections of an e-book
Preparing files for archiving, emailing, or printing

Python is a strong choice for this because it is easy to read, easy to maintain, and works well with file automation. With the right library, you can create scripts that merge files locally, on a server, or inside a larger application.

Which Python Library Should You Use?

For PDF merging, the most common modern choice is pypdf.

It is the maintained successor to the older PyPDF2 project and is widely used for reading, writing, splitting, merging, and modifying PDF files.

Install it

pip install pypdf

If you are working in a virtual environment, activate it first, then install the package.

Basic PDF Merge in Python

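The simplest way to merge PDF files is to use PdfWriter from pypdf. Older tutorials use the PdfMerger class, but it was deprecated in favor of PdfWriter and removed in pypdf 5.0, so current installs need PdfWriter.

Example: merge two PDF files

from pypdf import PdfWriter

merger = PdfWriter()

merger.append("file1.pdf")
merger.append("file2.pdf")

merger.write("merged.pdf")
merger.close()

How it works

PdfWriter() creates the object that collects the pages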
append() adds a PDF to the output
write() saves the final file
close() releases resources

This is the most direct solution when you already know the filenames and want a quick merge.

Merge Multiple PDF Files

You can merge many files by storing them in a list and looping through them.

Example: merge a list of PDFs

from pypdf import PdfWriter

pdf_files = [
    "intro.pdf",
    "chapter1.pdf",
    "chapter2.pdf",
    "chapter3.pdf"
]

merger = PdfWriter()

for pdf in pdf_files:
    merger.append(pdf)

merger.write("book.pdf")
merger.close()

This approach is useful when the order matters, such as when you are assembling chapters, reports, or form pages.

Merge PDFs from a Folder Automatically

In many projects, you will not want to hardcode filenames manually. Instead, you may want to merge every PDF in a folder.

Example: merge all PDFs in a directory

import os
from pypdf import PdfWriter

folder_path = "pdfs"
output_file = "merged_output.pdf"

pdf_files = sorted(
    [
        os.path.join(folder_path, f)
        for f in os.listdir(folder_path)
        if f.lower().endswith(".pdf")
    ]
)

merger = PdfWriter()

for pdf_file in pdf_files:
    merger.append(pdf_file)

merger.write(output_file)
merger.close()

Why use `sorted()`?

Sorting keeps the merge order predictable. os.listdir() returns entries in arbitrary order, so without sorting the result can differ between runs or machines.

For example, this can matter if your files are named like:

001-cover.pdf
002-introduction.pdf
003-chapter-one.pdf

Because the numeric prefixes are zero-padded, a plain alphabetical sort puts these files in the intended sequence.
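Plain sorted() compares names character by character, so unpadded names such as page10.pdf sort before page2.pdf. The sketch below shows one common workaround, a numeric-aware sort key. The helper name natural_key and the sample filenames are illustrative assumptions, not part of pypdf.

```python
import re


def natural_key(name):
    """Split a filename into text and integer chunks for numeric-aware sorting."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


names = ["page10.pdf", "page2.pdf", "page1.pdf"]

lexicographic = sorted(names)            # character-by-character order
natural = sorted(names, key=natural_key)  # digits compared as numbers

print(lexicographic)
print(natural)
```

Passing key=natural_key to the sorted() call in the folder example should give the same numeric-aware order there.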

Merge PDFs in a Specific Order

Sometimes the folder contains many PDFs, but you only want to merge selected files in a custom order.

Example: custom merge order

from pypdf import PdfWriter

pdf_files = [
    "cover.pdf",
    "table_of_contents.pdf",
    "chapter_3.pdf",
    "chapter_1.pdf",
    "chapter_2.pdf"
]

merger = PdfWriter()

for pdf in pdf_files:
    merger.append(pdf)

merger.write("custom_order_book.pdf")
merger.close()

You can place files in any order you need. This is particularly useful when your document structure is not alphabetical.

Insert One PDF Into Another at a Specific Position

Sometimes you do not want to append at the end. You may need to insert pages from one file into a specific location.

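PdfWriter provides append() for adding a file at the end (optionally limited by its pages parameter) and merge() for inserting a file at a given position.

Example: insert a PDF at page position 3

from pypdf import PdfWriter

merger = PdfWriter()

merger.append("main_document.pdf")
# Positions are zero-based: the inserted pages start at index 3,
# right after the first three pages of main_document.pdf
merger.merge(3, "inserted_section.pdf")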

merger.write("updated_document.pdf")
merger.close()

This is useful when you need to place an appendix, annex, or additional section in the middle of a file.

Merge Only Certain Pages from Each PDF

You do not always need the full PDF. Sometimes you only need a few pages from each file.

Example: merge selected pages

from pypdf import PdfWriter

merger = PdfWriter()

# Pages at indices 0 to 2 (the first three pages) of the first file
merger.append("file1.pdf", pages=(0, 3))

# Only the page at index 4 (the fifth page) of the second file
merger.append("file2.pdf", pages=(4, 5))

merger.write("selected_pages.pdf")
merger.close()

Important note about page ranges

In pypdf, page numbers are zero-based:

page 0 = first page
page 1 = second page
page 2 = third page

This is a common source of confusion, so it is worth remembering.
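The pages tuple is also half-open: it is (start, stop) with stop excluded, so pages=(0, 3) selects indices 0, 1, and 2. If your page numbers come from people (who count from 1 and include the last page), a small converter can help. The function name to_page_range is a hypothetical helper, not a pypdf API.

```python
def to_page_range(first_page, last_page):
    """Convert 1-based inclusive page numbers to a zero-based (start, stop) tuple."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # start shifts down by one; stop is exclusive, so last_page is already correct
    return (first_page - 1, last_page)


first_three = to_page_range(1, 3)  # human pages 1-3
fifth_only = to_page_range(5, 5)   # human page 5

print(first_three, fifth_only)
```

The result can be passed directly as the pages argument, for example merger.append("file1.pdf", pages=to_page_range(1, 3)).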

Merge PDFs and Keep Metadata

Sometimes you want to preserve or set metadata for the final document.

Metadata may include:

title
author
subject
keywords
creator

Example: set metadata after merge

from pypdf import PdfWriter

merger = PdfWriter()

merger.append("report1.pdf")
merger.append("report2.pdf")

merger.add_metadata({
    "/Title": "Combined Annual Reports",
    "/Author": "Your Name",
    "/Subject": "Merged PDF documents",
    "/Keywords": "PDF, merge, reports, Python"
})

merger.write("final_report.pdf")
merger.close()

This is especially useful if the merged PDF will be shared publicly or uploaded to a document management system.

A Safer Merge Script with Error Handling

Real-world scripts should handle errors gracefully. Files may be missing, corrupted, encrypted, or inaccessible.

Example: robust merge function

from pypdf import PdfWriter
import os

def merge_pdfs(pdf_files, output_file):
    merger = PdfWriter()

    try:
        for pdf in pdf_files:
            if not os.path.exists(pdf):
                print(f"Skipping missing file: {pdf}")
                continue

            try:
                merger.append(pdf)
                print(f"Added: {pdf}")
            except Exception as e:
                print(f"Could not add {pdf}: {e}")

        if len(merger.pages) == 0:
            print("No pages were added. Output file was not created.")
            return

        merger.write(output_file)
        print(f"Saved merged PDF to: {output_file}")

    finally:
        merger.close()

pdf_files = ["a.pdf", "b.pdf", "missing.pdf", "c.pdf"]
merge_pdfs(pdf_files, "merged_result.pdf")

Why this matters

This kind of script is more reliable when used in production, especially in:

backend services
scheduled jobs
file upload systems
batch document processing
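In these settings it can also help to make sure a failed merge never leaves a half-written output file behind. One common approach, sketched below, is to write to a temporary file in the same directory and rename it into place only on success. The helper write_atomically and the fake_write stand-in are assumptions for illustration; in a real script the callback would be the merger's write method.

```python
import os
import tempfile


def write_atomically(output_path, write_func):
    """Call write_func(temp_path), then move the finished file into place."""
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, temp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)  # write_func opens the path itself
    try:
        write_func(temp_path)
        os.replace(temp_path, output_path)  # atomic when on the same filesystem
    except BaseException:
        # Remove the partial file and let the caller see the original error
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def fake_write(path):
    """Stand-in for merger.write so the demo runs without any PDFs."""
    with open(path, "wb") as handle:
        handle.write(b"%PDF-1.7 demo")


with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "merged.pdf")
    write_atomically(target, fake_write)
    created = os.path.exists(target)
    leftovers = [name for name in os.listdir(demo_dir) if name != "merged.pdf"]

print(created, leftovers)
```

In the merge function above, the write step could then become write_atomically(output_file, merger.write), since write() accepts a file path.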

Merge PDFs from User Uploads in a Web App

If you are building a website or API, users may upload PDF files that you want to merge on the server.

A basic workflow might look like this:

Receive uploaded PDF files
Save them temporarily
Merge them with Python
Return the merged file for download
Delete temporary files afterward

Example: merge uploaded files saved on disk

from pypdf import PdfWriter

def merge_uploaded_pdfs(uploaded_paths, output_path):
    merger = PdfWriter()

    try:
        for path in uploaded_paths:
            merger.append(path)

        merger.write(output_path)
    finally:
        merger.close()

# Example usage
uploaded_files = [
    "/tmp/upload1.pdf",
    "/tmp/upload2.pdf",
    "/tmp/upload3.pdf"
]

merge_uploaded_pdfs(uploaded_files, "/tmp/final_merged.pdf")

This approach works well in Flask, Django, FastAPI, or any other Python-based backend.

Merge PDFs in Django

If you are using Django, you may want to merge files after upload and return the merged PDF as a downloadable response.

Example using Django

import os
import tempfile
from io import BytesIO

from django.http import FileResponse, HttpResponseBadRequest
from pypdf import PdfWriter

def merge_pdfs_view(request):
    uploaded_files = request.FILES.getlist("pdf_files")

    if not uploaded_files:
        return HttpResponseBadRequest("No PDF files were uploaded.")

    # Build the result in memory so it outlives the temporary directory
    buffer = BytesIO()

    with tempfile.TemporaryDirectory() as temp_dir:
        paths = []

        for index, file_obj in enumerate(uploaded_files):
            # Generated names avoid collisions between uploads with the same name
            temp_path = os.path.join(temp_dir, f"upload_{index}.pdf")
            with open(temp_path, "wb") as destination:
                for chunk in file_obj.chunks():
                    destination.write(chunk)
            paths.append(temp_path)

        merger = PdfWriter()
        try:
            for path in paths:
                merger.append(path)
            merger.write(buffer)
        finally:
            merger.close()

    buffer.seek(0)
    return FileResponse(buffer, as_attachment=True, filename="merged.pdf")

This is the general pattern for a document upload-and-merge feature.

Merge PDFs in Flask

Flask makes it easy to accept multiple PDF files and return the merged file.

Example using Flask

import os
import tempfile
from io import BytesIO

from flask import Flask, request, send_file
from pypdf import PdfWriter

app = Flask(__name__)

@app.route("/merge", methods=["POST"])
def merge_pdfs():
    files = request.files.getlist("pdf_files")

    if not files:
        return {"error": "No files uploaded"}, 400

    # Build the result in memory so it outlives the temporary directory
    buffer = BytesIO()

    with tempfile.TemporaryDirectory() as temp_dir:
        paths = []

        for index, file in enumerate(files):
            # Do not build paths from client-supplied filenames
            file_path = os.path.join(temp_dir, f"upload_{index}.pdf")
            file.save(file_path)
            paths.append(file_path)

        merger = PdfWriter()
        try:
            for path in paths:
                merger.append(path)
            merger.write(buffer)
        finally:
            merger.close()

    buffer.seek(0)
    return send_file(
        buffer,
        mimetype="application/pdf",
        as_attachment=True,
        download_name="merged.pdf",
    )

This pattern is suitable for document tools, SaaS apps, or internal file utilities.

Merge PDFs with pathlib for Cleaner Code

Python’s pathlib module often makes path handling cleaner and more readable.

Example using `pathlib`

from pathlib import Path
from pypdf import PdfWriter

pdf_dir = Path("pdfs")
output_file = Path("merged.pdf")

pdf_files = sorted(pdf_dir.glob("*.pdf"))

merger = PdfWriter()

for pdf_file in pdf_files:
    merger.append(str(pdf_file))

merger.write(str(output_file))
merger.close()

This version is elegant and portable across Windows, macOS, and Linux.

Merge PDFs Recursively from Subfolders

Sometimes PDFs are spread across nested folders. You can use recursive search to collect them.

Example: merge all PDFs in subdirectories

from pathlib import Path
from pypdf import PdfWriter

base_dir = Path("documents")
output_file = Path("merged_all.pdf")

pdf_files = sorted(base_dir.rglob("*.pdf"))

merger = PdfWriter()

for pdf_file in pdf_files:
    merger.append(str(pdf_file))

merger.write(str(output_file))
merger.close()

This can be useful when you are organizing document archives or combining multiple project folders.

Merge PDFs and Remove Temporary Files

When you create temporary files during processing, remember to clean them up.

Example: cleanup after merge

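import os
import tempfile

from pypdf import PdfWriter

with tempfile.TemporaryDirectory() as temp_dir:
    pdf_paths = []

    # Create three one-page placeholder PDFs so the example is runnable
    for i in range(3):
        path = os.path.join(temp_dir, f"file{i}.pdf")
        page_writer = PdfWriter()
        page_writer.add_blank_page(width=612, height=792)  # US Letter, in points
        page_writer.write(path)
        page_writer.close()
        pdf_paths.append(path)

    merger = PdfWriter()
    try:
        for pdf in pdf_paths:
            merger.append(pdf)
        merger.write("final.pdf")
    finally:
        merger.close()

# The temporary directory and its files are deleted here; final.pdf remains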

Using TemporaryDirectory() is a good way to prevent clutter and avoid leaving unused files on disk.

Merge PDFs and Skip Invalid Files

In some folders, not every file may be a valid PDF. You may find text files, images, or damaged files with a .pdf extension.

Example: validate before merge

from pypdf import PdfReader, PdfWriter
import os

def is_valid_pdf(path):
    try:
        reader = PdfReader(path)
        # Counting pages forces pypdf to parse the document structure
        return len(reader.pages) > 0
    except Exception:
        return False

pdf_files = ["a.pdf", "b.pdf", "bad.pdf", "c.pdf"]

merger = PdfWriter()

for pdf in pdf_files:
    if os.path.exists(pdf) and is_valid_pdf(pdf):
        merger.append(pdf)
    else:
        print(f"Skipping missing or invalid file: {pdf}")

merger.write("merged_valid_only.pdf")
merger.close()

This is helpful when you are processing user-uploaded content and want to avoid hard failures.
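For large batches, a cheap first pass can filter out obvious non-PDFs before paying for a full parse with PdfReader. The sketch below only looks for the %PDF- signature near the start of the file; it is a pre-filter, not a validity check, and the helper name looks_like_pdf and the 1024-byte probe size are assumptions.

```python
import os
import tempfile


def looks_like_pdf(path, probe_size=1024):
    """Cheap pre-filter: True if the PDF signature appears near the start of the file."""
    try:
        with open(path, "rb") as handle:
            return b"%PDF-" in handle.read(probe_size)
    except OSError:
        # Missing or unreadable files are treated as "not a PDF"
        return False


# Demo with two throwaway files and one missing path
with tempfile.TemporaryDirectory() as demo_dir:
    real = os.path.join(demo_dir, "real.pdf")
    fake = os.path.join(demo_dir, "fake.pdf")
    with open(real, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    with open(fake, "wb") as handle:
        handle.write(b"just some text")

    results = {
        "real.pdf": looks_like_pdf(real),
        "fake.pdf": looks_like_pdf(fake),
        "missing.pdf": looks_like_pdf(os.path.join(demo_dir, "missing.pdf")),
    }

print(results)
```

A file that passes this check can still be truncated or corrupted, so the full parse remains the real test.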

Merge Encrypted PDFs

Encrypted PDFs require a password. If you try to read or merge them without unlocking, the operation may fail.

Example: decrypt before merge

from pypdf import PdfReader, PdfWriter

def decrypt_pdf(input_path, password, output_path):
    reader = PdfReader(input_path)
    if reader.is_encrypted:
        # decrypt() returns a falsy value when the password is wrong
        if not reader.decrypt(password):
            raise ValueError(f"Wrong password for {input_path}")

    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    with open(output_path, "wb") as f:
        writer.write(f)

decrypt_pdf("protected.pdf", "mypassword", "unlocked.pdf")

merger = PdfWriter()
merger.append("unlocked.pdf")
merger.append("other.pdf")
merger.write("merged.pdf")
merger.close()

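If your workflow includes protected documents, decrypting them first can simplify the merge process. AES-encrypted files need an extra cryptography dependency, which you can install with pip install "pypdf[crypto]".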

Merge and Compress Workflow

Merging does not reduce file size. The output is usually about as large as the inputs combined, so PDFs with many images or high-resolution scans produce a large merged file.

If file size matters, add a separate compression step after merging. pypdf can apply lossless compression to page content streams, while downsampling or recompressing images usually requires another library or an external tool.

A common workflow is:

Merge PDFs
Optimize the combined file
Save or distribute the result

The merge step itself is usually done with pypdf, while compression may involve other tools.

Merge PDFs with Page Reordering

Sometimes you need to build a new PDF by taking pages from different files in a custom order. For example, you may want:

cover page first
appendix pages later
selected chapters from different PDFs

Example: merge files and order pages manually

from pypdf import PdfReader, PdfWriter

writer = PdfWriter()

pdf1 = PdfReader("first.pdf")
pdf2 = PdfReader("second.pdf")

# Add first page of first PDF
writer.add_page(pdf1.pages[0])

# Add second page of second PDF
writer.add_page(pdf2.pages[1])

# Add third page of first PDF
writer.add_page(pdf1.pages[2])

with open("custom_pages.pdf", "wb") as output:
    writer.write(output)

This is not a pure file merge, but it is often the next step when building advanced PDF workflows.

Build a Reusable Merge Function

It is better to wrap the merge logic in a function than repeat the same code many times.

Example: reusable function

from pypdf import PdfWriter

def merge_pdfs(input_files, output_file):
    merger = PdfWriter()
    try:
        for file_path in input_files:
            merger.append(file_path)
        merger.write(output_file)
    finally:
        merger.close()

# Example usage
files = ["one.pdf", "two.pdf", "three.pdf"]
merge_pdfs(files, "output.pdf")

Reusable functions make it easier to:

test your code
integrate it into larger systems
handle different file sets
keep your project clean

Add Logging to Your Merge Script

If your script is used in production, logging is more helpful than printing plain text.

Example: merge with logging

import logging
from pypdf import PdfWriter

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def merge_pdfs(input_files, output_file):
    merger = PdfWriter()
    try:
        for file_path in input_files:
            logger.info("Adding %s", file_path)
            merger.append(file_path)

        merger.write(output_file)
        logger.info("Saved merged PDF to %s", output_file)
    except Exception as e:
        logger.exception("Failed to merge PDFs: %s", e)
        raise
    finally:
        merger.close()

merge_pdfs(["a.pdf", "b.pdf"], "merged.pdf")

Logging is especially useful when the script runs in a server, cron job, or background worker.

Common Mistakes When Merging PDFs in Python

Here are some mistakes that often cause problems:

1. Forgetting to close the merger

Always call close() or use try/finally. Otherwise, file handles may remain open.

2. Wrong page numbering

PDF libraries often use zero-based indexing. The first page is page 0, not page 1.

3. Merging in the wrong order

Always sort or explicitly define the order of files before merging.

4. Using invalid file paths

Make sure paths are correct and files exist before processing them.

5. Trying to merge corrupted PDFs

Some files may look like PDFs but be broken or incomplete.

6. Forgetting about encrypted files

Protected PDFs may require a password before they can be merged.

7. Assuming merge reduces file size

Merging only combines files. It does not compress them automatically.

A Full Command-Line Merge Script

Here is a practical script that merges multiple PDFs from the command line.

Example: command-line tool

import sys
from pypdf import PdfWriter

def merge_pdfs(output_file, input_files):
    merger = PdfWriter()
    try:
        for pdf in input_files:
            merger.append(pdf)
        merger.write(output_file)
    finally:
        merger.close()

if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("Usage: python merge_pdfs.py output.pdf input1.pdf input2.pdf [input3.pdf ...]")
        sys.exit(1)

    output = sys.argv[1]
    inputs = sys.argv[2:]

    merge_pdfs(output, inputs)
    print(f"Merged {len(inputs)} PDFs into {output}")

Run it like this

python merge_pdfs.py merged.pdf a.pdf b.pdf c.pdf

This is a simple but very useful utility for everyday work.
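If the tool grows beyond positional arguments, argparse gives you --help text and validation for free. The variant below is a sketch under a few assumptions: the -o/--output option, its default name, and the function names build_parser and main are illustrative choices, and unlike the script above it accepts a single input file.

```python
import argparse


def build_parser():
    """Describe the command line: one or more inputs plus an optional output name."""
    parser = argparse.ArgumentParser(description="Merge PDF files into one.")
    parser.add_argument("inputs", nargs="+", help="input PDFs, in merge order")
    parser.add_argument(
        "-o", "--output", default="merged.pdf",
        help="output file (default: merged.pdf)",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported here so that --help still works before pypdf is installed
    from pypdf import PdfWriter

    merger = PdfWriter()
    try:
        for pdf in args.inputs:
            merger.append(pdf)
        merger.write(args.output)
    finally:
        merger.close()
    print(f"Merged {len(args.inputs)} PDFs into {args.output}")


# Parsing can be checked without touching any files
args = build_parser().parse_args(["-o", "bundle.pdf", "a.pdf", "b.pdf"])
print(args.output, args.inputs)
```

In a real script, call main() under the usual if __name__ == "__main__": guard and run it as python merge_pdfs.py -o bundle.pdf a.pdf b.pdf.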

A Complete Folder-Merge Script

Here is a polished script for merging all PDFs in a folder into one output file.

Example

from pathlib import Path
from pypdf import PdfWriter

def merge_folder(folder_path, output_file):
    folder = Path(folder_path)
    pdf_files = sorted(folder.glob("*.pdf"))

    if not pdf_files:
        print("No PDF files found.")
        return

    merger = PdfWriter()
    try:
        for pdf in pdf_files:
            print(f"Adding {pdf.name}")
            merger.append(str(pdf))
        merger.write(output_file)
        print(f"Created {output_file}")
    finally:
        merger.close()

merge_folder("pdfs", "merged_folder_output.pdf")

This script is easy to adapt for real projects.
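Two details tend to matter when adapting it. If the output file is saved inside the same folder, a second run would pick up the previous result as an input, and on case-sensitive filesystems glob("*.pdf") skips files named .PDF. The sketch below handles both; collect_pdfs is a hypothetical helper, and the demo files are empty placeholders used only to show the selection.

```python
import tempfile
from pathlib import Path


def collect_pdfs(folder, output_file):
    """Return the folder's PDFs in sorted order, excluding the output file itself."""
    output = Path(output_file).resolve()
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".pdf"  # matches .pdf and .PDF
        and path.resolve() != output       # skip a previous merge result
    )


with tempfile.TemporaryDirectory() as demo_dir:
    folder = Path(demo_dir)
    for name in ["a.pdf", "b.PDF", "notes.txt", "merged.pdf"]:
        (folder / name).touch()

    selected = [p.name for p in collect_pdfs(folder, folder / "merged.pdf")]

print(selected)
```

Inside merge_folder, the glob line could be replaced with pdf_files = collect_pdfs(folder, output_file).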

When to Use pypdf and When to Use Something Else

pypdf is excellent for merging, splitting, page extraction, and metadata handling. It is usually the first library to try.

However, depending on your project, you may also need:

OCR tools for scanned documents
compression utilities for reducing file size
rendering tools for previewing pages
text extraction tools for reading content

For the merge operation itself, pypdf is usually enough.

Final Thoughts

Merging PDF files in Python is simple at the basic level, but it becomes much more powerful when you know how to handle real-world requirements such as folder processing, custom order, page selection, metadata, errors, and encrypted files.

The pypdf library gives you a clean and flexible way to combine documents with only a few lines of code. From small scripts to full document-processing systems, it is a practical tool that can save time and automate repetitive work.

If you are building a PDF utility, the merge feature is one of the most important features to implement first. It is useful, fast to develop, and easy to integrate into web apps, desktop apps, and automation scripts.

Quick Reference Example

Here is the shortest complete example again:

from pypdf import PdfWriter

merger = PdfWriter()
merger.append("file1.pdf")
merger.append("file2.pdf")
merger.write("merged.pdf")
merger.close()

That is all you need for a basic merge.

Tags: #merge-pdf

Hassan Agmir

Author · Filenewer

Writing about file tools and automation at Filenewer.
