How to Split PDF in Python

Working with PDF files is a very common task in Python projects, especially when you need to automate document processing, organize large reports, extract only specific parts of a file, or prepare downloadable content for users. One of the most useful PDF operations is splitting a PDF into smaller files. A single PDF may contain dozens, hundreds, or even thousands of pages, and splitting it into smaller chunks can make it easier to manage, share, store, and process. In many real-world applications, splitting PDFs is not just a convenience. It is often a necessary step in building document workflows, report systems, archive tools, invoice processors, legal document handlers, and educational platforms.

Python is a great choice for this job because it has several solid libraries that can read and manipulate PDF documents without requiring complex desktop software. Among the most popular libraries are pypdf and PyMuPDF, and each one has strengths depending on what kind of splitting you need. Some developers only need to split a PDF by page ranges, others need every page as a separate file, and others want to split based on bookmarks, file size, or detected content. Python can handle all of these cases with clean, reusable code.

In this article, you will learn how to split PDFs in Python in different ways, how to install the required libraries, and how to build practical scripts that you can use in your own projects. The examples are written in a simple style so you can adapt them easily whether you are building a backend tool, a desktop utility, or a web application. You will also see how to handle errors, how to make your scripts safer, and how to think about PDF splitting in a way that works well for production code.

Why Split a PDF?

Before jumping into the code, it helps to understand why splitting PDFs matters. A PDF file is often used as a final document format because it preserves layout, fonts, images, and structure across devices. However, a single PDF can become unwieldy when it contains multiple unrelated sections. Imagine a company exporting all monthly invoices into one enormous PDF file. Sending that file to a client might be awkward, and processing it later could be even harder. Splitting the file into separate invoice documents makes the data much easier to manage.

Another common case is page extraction. You may have a report or book where only one chapter or only a few pages are needed. Instead of manually selecting those pages every time, you can automate the process with Python. In a legal or administrative workflow, splitting may be needed to separate case files, form pages, or signed documents. In an education platform, it may be useful to divide a large syllabus or textbook into chapter-based files. In short, PDF splitting is one of those small technical tasks that can save a lot of time when done properly.

Choosing the Right Python Library

There are several libraries available for PDF manipulation, but two of the most practical options are pypdf and PyMuPDF.

pypdf is a pure Python library that is widely used for reading, merging, splitting, and transforming PDF files. It is easy to install and simple to understand, which makes it a good fit for most basic splitting tasks. If your goal is to extract pages, split by range, or create separate PDFs from selected pages, pypdf is often the best place to start.

PyMuPDF, whose Python module has traditionally been imported as fitz, is another powerful library. It is a binding for the MuPDF C library and offers more advanced PDF and document handling features. It is often very fast and is especially useful when you need to inspect page content, render pages, or perform content-aware operations. It also works very well for splitting by page count or page selection.

For this guide, the main examples will use pypdf because it is straightforward and readable. Later, you will also see a PyMuPDF version so you can compare approaches.

Installing the Required Library

To use pypdf, install it with pip:

pip install pypdf

If you want to use PyMuPDF, install it like this:

pip install pymupdf

Once installed, you can import the library and start working with PDF files immediately.

Basic Idea of PDF Splitting

Splitting a PDF usually means one of the following:

Extracting a single page into its own file.
Extracting a range of pages into a smaller file.
Splitting every page into separate PDFs.
Splitting a PDF into multiple files based on a fixed page count.
Splitting according to logical rules such as bookmarks or content sections.

The general workflow is simple. First, open the source PDF. Then read its pages. Then create one or more new PDF files and copy the selected pages into them. Finally, save the new files to disk.

The key concept is page copying. A PDF is not just a text file. It contains page objects, images, fonts, and metadata. So when you split a PDF, you are not merely copying text. You are creating a new PDF structure that contains the selected page objects in the right order.

Split a PDF into Single Pages with `pypdf`

One of the most common tasks is saving each page of a PDF as a separate file. This is useful when every page represents a different form, document, or record.

Here is a simple example:

from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_pdf_to_single_pages(input_pdf, output_dir):
    input_path = Path(input_pdf)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(input_path))

    for page_number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)

        output_file = output_path / f"page_{page_number}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Created: {output_file}")

# Example usage
split_pdf_to_single_pages("document.pdf", "split_pages")

This script reads the input PDF, loops through each page, creates a new PDF containing just that page, and saves it in a folder. The folder is created automatically if it does not exist.

This is one of the easiest and most reliable ways to split a PDF. It is also easy to customize. For example, you can change the file naming pattern, add leading zeros, or include metadata in the file name.

Split a PDF by Page Range

Sometimes you do not want every page as a separate file. Instead, you may want a smaller PDF that contains only selected pages. This is very useful when extracting a chapter, a section, or a specific group of pages.

from pypdf import PdfReader, PdfWriter

def split_pdf_by_range(input_pdf, output_pdf, start_page, end_page):
    reader = PdfReader(input_pdf)
    writer = PdfWriter()

    # pypdf page indexes are zero-based, so convert the 1-based start page
    for page_num in range(start_page - 1, end_page):
        writer.add_page(reader.pages[page_num])

    with open(output_pdf, "wb") as f:
        writer.write(f)

    print(f"Saved range {start_page}-{end_page} to {output_pdf}")

# Example usage
split_pdf_by_range("document.pdf", "pages_3_to_7.pdf", 3, 7)

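This function uses start and end page numbers based on normal human counting, where page 1 is the first page. Internally, pypdf uses zero-based indexing, so the code subtracts 1 from the start page. The end page needs no adjustment, because range() stops just before its second argument: range(start_page - 1, end_page) yields the indexes for pages start_page through end_page.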

A page-range splitter is especially useful in workflows where users upload a large PDF and request only a portion of it. It can also be used in content publishing systems to extract previews, samples, or excerpts.
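When users request pages through a form field or API parameter, the request usually arrives as text. A small parser can turn a spec such as "1-3,7,10-12" into the zero-based indexes that reader.pages expects, validating every number before any PDF work starts. This is a sketch: the function name parse_page_spec and the comma-and-dash syntax are assumptions, not part of pypdf:

```python
def parse_page_spec(spec, total_pages):
    """Turn a 1-based spec such as "1-3,7,10-12" into zero-based page indexes.

    Ranges are inclusive, the requested order is preserved, and every page
    is checked against total_pages so bad input fails early.
    """
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"Invalid page range {part!r} for a {total_pages}-page PDF")
        # Convert the inclusive 1-based range to zero-based indexes
        indexes.extend(range(start - 1, end))
    return indexes
```

The returned list can feed a loop such as for index in indexes: writer.add_page(reader.pages[index]), so the 1-based to 0-based conversion happens in exactly one place.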

Split a PDF into Chunks of Fixed Size

Another very practical method is to split a PDF into groups of pages. For example, if a PDF has 100 pages and you want files of 10 pages each, you can create 10 smaller PDFs. This is useful for archiving, uploading, or sending large documents in parts.

from pypdf import PdfReader, PdfWriter
from pathlib import Path
import math

def split_pdf_by_chunks(input_pdf, output_dir, pages_per_chunk=10):
    input_path = Path(input_pdf)
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(input_path))
    total_pages = len(reader.pages)
    total_chunks = math.ceil(total_pages / pages_per_chunk)

    for chunk_index in range(total_chunks):
        writer = PdfWriter()
        start = chunk_index * pages_per_chunk
        end = min(start + pages_per_chunk, total_pages)

        for page_num in range(start, end):
            writer.add_page(reader.pages[page_num])

        output_file = output_path / f"chunk_{chunk_index + 1}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Created {output_file} with pages {start + 1} to {end}")

# Example usage
split_pdf_by_chunks("big_report.pdf", "chunks", pages_per_chunk=5)

This approach is useful when downstream tools cannot handle the whole document at once, or when you want a predictable output structure. It is also helpful in systems that limit file size or the number of pages per upload.
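The chunk split above, the every-N-pages split, the two-page split, and the command-line tool later in this article all compute the same page boundaries. One option is to factor that arithmetic into a small helper so it can be tested without any PDF at all. The name chunk_ranges is hypothetical; it returns zero-based, end-exclusive pairs that match Python's range():

```python
def chunk_ranges(total_pages, pages_per_chunk):
    """Return (start, end) pairs of zero-based, end-exclusive page indexes."""
    if pages_per_chunk < 1:
        # range() cannot step by zero, and a negative step would yield nothing
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]
```

Inside a splitter, for start, end in chunk_ranges(len(reader.pages), pages_per_chunk): replaces the manual start and end computation, and the last chunk is automatically shorter when the page count does not divide evenly.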

Split a PDF from a List of Specific Pages

Sometimes you need more control than a simple range. You may want pages 1, 3, 7, and 10 together in one file. Python makes that easy.

from pypdf import PdfReader, PdfWriter

def split_pdf_by_page_list(input_pdf, output_pdf, page_numbers):
    reader = PdfReader(input_pdf)
    writer = PdfWriter()

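    for page_number in page_numbers:
        # Reject out-of-range numbers; 0 would otherwise become index -1 (the last page)
        if page_number < 1 or page_number > len(reader.pages):
            raise ValueError(f"Invalid page number: {page_number}")
        writer.add_page(reader.pages[page_number - 1])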

    with open(output_pdf, "wb") as f:
        writer.write(f)

    print(f"Saved selected pages to {output_pdf}")

# Example usage
split_pdf_by_page_list("document.pdf", "selected_pages.pdf", [1, 3, 7, 10])

This is perfect when the PDF pages are not in a neat sequence and you need to build a custom output file. For example, you may want to combine a cover page, appendix pages, and a special section into one document.

Add Error Handling for Safer Scripts

In real applications, PDF files may be missing, damaged, password-protected, or structured in unexpected ways. A good script should handle those cases gracefully rather than crashing without explanation.

Here is a more robust example:

from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_pdf_safe(input_pdf, output_pdf, start_page, end_page):
    try:
        reader = PdfReader(input_pdf)

        if reader.is_encrypted:
            print("The PDF is encrypted and cannot be processed without a password.")
            return

        total_pages = len(reader.pages)

        if start_page < 1 or end_page > total_pages or start_page > end_page:
            print("Invalid page range.")
            return

        writer = PdfWriter()

        for page_num in range(start_page - 1, end_page):
            writer.add_page(reader.pages[page_num])

        output_path = Path(output_pdf)
        output_path.parent.mkdir(parents=True, exist_ok=True)

        with open(output_path, "wb") as f:
            writer.write(f)

        print(f"Saved: {output_pdf}")

    except FileNotFoundError:
        print("Input PDF file not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
split_pdf_safe("document.pdf", "output/slice.pdf", 2, 6)

This version checks for encrypted files, validates the page range, and protects the script from simple file errors. In a production environment, this kind of validation is very important because it improves reliability and helps users understand what went wrong.

Split a Password-Protected PDF

Sometimes a PDF is encrypted with a password. In that case, pypdf can still open it if you provide the correct password.

from pypdf import PdfReader, PdfWriter

def split_encrypted_pdf(input_pdf, output_pdf, password, start_page, end_page):
    reader = PdfReader(input_pdf)

    if reader.is_encrypted:
        result = reader.decrypt(password)
        # decrypt() returns 0 (PasswordType.NOT_DECRYPTED) when the password does not match
        if result == 0:
            print("Incorrect password.")
            return

    writer = PdfWriter()

    for page_num in range(start_page - 1, end_page):
        writer.add_page(reader.pages[page_num])

    with open(output_pdf, "wb") as f:
        writer.write(f)

    print(f"Saved decrypted split PDF to {output_pdf}")

# Example usage
split_encrypted_pdf("protected.pdf", "decrypted_part.pdf", "mypassword", 1, 4)

This is useful in business systems where secure PDFs are common. Be careful with encrypted files, and make sure you have permission to access and process the document.

Split a PDF Using `PyMuPDF`

Although pypdf is excellent for many cases, PyMuPDF is also worth knowing. It can be fast and flexible, especially for document analysis and rendering.

Here is a basic example of splitting by page:

import fitz  # PyMuPDF
from pathlib import Path

def split_pdf_with_pymupdf(input_pdf, output_dir):
    doc = fitz.open(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i in range(doc.page_count):
        new_doc = fitz.open()
        new_doc.insert_pdf(doc, from_page=i, to_page=i)
        output_file = output_dir / f"page_{i + 1}.pdf"
        new_doc.save(output_file)
        new_doc.close()
        print(f"Created: {output_file}")

    doc.close()

# Example usage
split_pdf_with_pymupdf("document.pdf", "pymupdf_pages")

The structure is similar to pypdf, but the API is different. PyMuPDF can be a good choice when you are already using it for other PDF tasks, or when speed matters in document-heavy workflows, since the heavy lifting is done by the MuPDF C library.

Split PDF Pages Based on an Index Pattern

Sometimes a document has a logical structure, and you want to split it into parts based on a pattern. For example, every 12 pages may represent one record, one invoice batch, or one student file. In that case, you can loop through the document in steps.

from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_pdf_every_n_pages(input_pdf, output_dir, n):
    reader = PdfReader(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    total_pages = len(reader.pages)

    part_number = 1
    for start in range(0, total_pages, n):
        writer = PdfWriter()
        end = min(start + n, total_pages)

        for page_num in range(start, end):
            writer.add_page(reader.pages[page_num])

        output_file = output_dir / f"part_{part_number}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Saved {output_file}")
        part_number += 1

# Example usage
split_pdf_every_n_pages("book.pdf", "book_parts", 12)

This technique is excellent when you have a repeated document pattern and need a fixed-size output structure. It can also reduce the amount of manual checking later.

Build a Command-Line PDF Splitter

If you want to turn your script into a useful command-line tool, you can use argparse. This allows users to run the program with arguments instead of editing the code each time.

import argparse
from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_pdf(input_pdf, output_dir, pages_per_file):
    reader = PdfReader(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    total_pages = len(reader.pages)

    file_index = 1
    for start in range(0, total_pages, pages_per_file):
        writer = PdfWriter()
        end = min(start + pages_per_file, total_pages)

        for page_num in range(start, end):
            writer.add_page(reader.pages[page_num])

        output_file = output_dir / f"split_{file_index}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Created {output_file}")
        file_index += 1

def main():
    parser = argparse.ArgumentParser(description="Split a PDF into smaller files.")
    parser.add_argument("input_pdf", help="Path to the input PDF")
    parser.add_argument("output_dir", help="Directory to save split files")
    parser.add_argument("--pages", type=int, default=1, help="Pages per output file")

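    args = parser.parse_args()
    # range() cannot step by zero, and a negative step would silently create no files
    if args.pages < 1:
        parser.error("--pages must be at least 1")
    split_pdf(args.input_pdf, args.output_dir, args.pages)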

if __name__ == "__main__":
    main()

You can run it like this:

python split_pdf.py document.pdf output_folder --pages 3

This is a clean and practical way to package your code for repeated use.

Split PDF in a Web App or Backend Service

Many developers need more than a local script. They want PDF splitting inside a web application, API, or backend service. In that case, the same logic can be wrapped in a route handler, task worker, or file processing pipeline.

For example, a Flask or FastAPI endpoint could receive an uploaded PDF, validate it, split it by the user’s chosen rule, and return the new files as downloads. The logic stays mostly the same; only the input and output handling changes. In a background job system, the split task could run asynchronously so that large files do not block the main request. In both cases, it is a good idea to store files in temporary directories and clean them up after processing.

A web-based PDF splitter should also protect against very large uploads, invalid file types, and malicious input. Since PDFs are common upload targets, strict validation is important. It is also wise to limit the number of pages or the total file size that users can process in one request.

Keep File Names Organized

One small detail that matters a lot in real projects is naming. Good file naming makes the output easier to sort and understand. For example, instead of using names like page1.pdf, page2.pdf, and page10.pdf, you might want to use zero-padded names such as page_001.pdf, page_002.pdf, and page_010.pdf. This keeps files in the correct order when listed alphabetically.

Here is a simple example:

output_file = output_path / f"page_{page_number:03d}.pdf"

This format creates three-digit page numbers. If the document is very large, you can increase the width to four digits or more.
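Rather than hard-coding three digits, the padding width can be derived from the page count, so a 9-page file gets page_7.pdf while a 120-page file gets page_007.pdf. A small sketch; the helper name padded_name is hypothetical:

```python
def padded_name(prefix, number, total, suffix=".pdf"):
    """Zero-pad number to the digit count of total, e.g. page_007.pdf for 120 pages."""
    width = len(str(total))
    return f"{prefix}_{number:0{width}d}{suffix}"
```

In the single-page splitter, the file name line would become output_path / padded_name(input_path.stem, page_number, len(reader.pages)). Using the source file's stem as the prefix also keeps parts from different input files apart in a shared folder.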

Useful Tips for Better PDF Splitting

When you split PDFs in Python, a few practical habits can make your code much better. First, always check the page count before splitting. This avoids out-of-range errors. Second, handle encrypted files separately, because they need passwords or may fail to open. Third, make sure your output directory exists before writing files. Fourth, close documents properly, especially when using libraries like PyMuPDF. Fifth, keep the page indexing logic clear, because many mistakes come from mixing human page numbers with Python indexes.

Another helpful tip is to isolate splitting logic into small functions. That way, you can reuse the same function in scripts, APIs, and command-line tools. It also makes debugging easier. Finally, test your code on several documents, including a small PDF, a large PDF, a PDF with blank pages, and a password-protected PDF. Real files can behave differently, and the earlier you test those differences, the easier it is to build stable software.

Common Mistakes to Avoid

A common mistake is forgetting that Python uses zero-based indexing while most users think in one-based page numbers. That means page 1 is actually reader.pages[0]. Another mistake is not checking whether the end page exceeds the document length. Some scripts also fail because the output directory does not exist. Another frequent issue is trying to split an encrypted PDF without decrypting it first. Finally, some developers assume that all PDFs are perfectly structured, but in reality, PDFs can be messy, scanned, or malformed.

It is also easy to overlook memory use when processing large documents. If you are splitting very large PDFs, you may want to be careful about loading unnecessary data or holding many document objects in memory at once. In many cases, pypdf handles common tasks well, but large-scale systems should still be tested under realistic conditions.

A Practical Example: Split a PDF into 2-Page Files

Here is a compact but practical example that creates small files with two pages each:

from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_into_two_page_files(input_pdf, output_dir):
    reader = PdfReader(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    total_pages = len(reader.pages)
    part_number = 1

    for start in range(0, total_pages, 2):
        writer = PdfWriter()
        end = min(start + 2, total_pages)

        for page_num in range(start, end):
            writer.add_page(reader.pages[page_num])

        output_file = output_dir / f"part_{part_number}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Saved {output_file}")
        part_number += 1

split_into_two_page_files("sample.pdf", "two_page_parts")

This is simple enough to understand at a glance, but it is also very useful in practice. It could be used for splitting a PDF booklet, sending smaller preview files, or dividing sections for review.

A More Flexible Reusable Function

Sometimes the best solution is a general function that can handle multiple splitting strategies. Here is a reusable version that accepts a custom list of page groups:

from pypdf import PdfReader, PdfWriter
from pathlib import Path

def split_pdf_by_groups(input_pdf, output_dir, page_groups):
    reader = PdfReader(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for i, group in enumerate(page_groups, start=1):
        writer = PdfWriter()

        for page_number in group:
            if page_number < 1 or page_number > len(reader.pages):
                raise ValueError(f"Invalid page number: {page_number}")
            writer.add_page(reader.pages[page_number - 1])

        output_file = output_dir / f"group_{i}.pdf"
        with open(output_file, "wb") as f:
            writer.write(f)

        print(f"Created {output_file}")

# Example usage
split_pdf_by_groups(
    "document.pdf",
    "groups",
    [
        [1, 2, 3],
        [5, 6],
        [8, 10, 11]
    ]
)

This kind of function is useful when the page structure is not uniform. It gives you complete control over the output files and can be adapted to many document-processing scenarios.

Final Thoughts

Splitting PDFs in Python is a practical and valuable skill. It is one of those tasks that looks simple at first, but becomes very powerful once you start using it in real projects. A few lines of code can save a great deal of manual work. Whether you want to split every page into a separate file, extract a page range, divide a document into fixed chunks, or build a more advanced PDF processing tool, Python gives you the flexibility to do it cleanly.

The most important part is choosing the method that matches your use case. For straightforward page extraction, pypdf is an excellent choice. For more advanced PDF handling, PyMuPDF can be a strong option. Once you understand the basic pattern of reading pages and writing selected pages into a new file, you can build almost any splitting workflow you need.

If you are building a backend tool, a document automation system, or a simple utility script, PDF splitting is a great feature to add. It improves organization, speeds up processing, and makes large files easier to manage. With the examples in this guide, you now have a solid foundation for handling PDF splitting in Python in a way that is practical, reusable, and ready for real-world use.

Hassan Agmir

Author · Filenewer

Writing about file tools and automation at Filenewer.
