Glossary Term

PDF Compression

Reducing the file size of a PDF document — by compressing embedded images, removing unused data, or optimizing the internal structure — while preserving readability.

What makes PDFs large

PDF file size is driven almost entirely by embedded content, and the largest contributor is usually images.

A PDF that contains only text and vector graphics (lines, shapes, fonts) is inherently small. Text is stored as character codes and positioning data, and fonts can be subsetted to include only the characters actually used. A fifty-page text document with no images might be just a few hundred kilobytes.

Images change this dramatically. A single high-resolution photograph embedded at full quality can add 5-10 MB. Scanned documents are particularly heavy because each page is a full-page raster image — a ten-page scanned document at 300 DPI might weigh 50 MB or more.
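The scanned-page figure can be sanity-checked with simple arithmetic. The sketch below assumes US Letter pages (8.5 x 11 inches) and 24-bit RGB color, neither of which is stated in the text, and the function name is purely illustrative.

```python
def uncompressed_raster_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size of a full-page raster image.

    bytes_per_pixel=3 assumes 24-bit RGB; grayscale would be 1.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page size: US Letter, scanned at 300 DPI.
page_bytes = uncompressed_raster_bytes(8.5, 11, 300)
print(f"one page:  {page_bytes / 1e6:.0f} MB uncompressed")
print(f"ten pages: {10 * page_bytes / 1e6:.0f} MB uncompressed")
```

Under these assumptions a single page is about 25 MB before any compression, so a ten-page scan of 50 MB would already reflect some image compression applied by the scanner or PDF writer.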

Other contributors to large file sizes include embedded fonts (especially when full fonts are included instead of subsets), duplicate resources (the same image embedded multiple times), unused objects left over from editing, and metadata or annotations that accumulate over multiple revisions.

Understanding what makes a specific PDF large is the first step toward effective compression. Compressing a PDF that is large because of images requires different techniques than compressing one that is large because of redundant internal structure.

Compression techniques

Several approaches can reduce PDF file size, and the most effective strategy often combines multiple techniques.

  • Image downsampling: reducing the resolution of embedded images to match their display size. An image inserted at 3000x2000 pixels but displayed at 300x200 in the document is ten times larger in each dimension, so it contains one hundred times as many pixels as needed. Downsampling to the display resolution eliminates this waste.
  • Image recompression: converting embedded images to more efficient formats or applying lossy compression. Uncompressed TIFF images embedded in a PDF can be recompressed to JPEG or JPEG 2000 at significant size savings.
  • Font subsetting: replacing full embedded fonts with subsets containing only the characters used in the document. A full font file might be 500 KB; a subset with just the characters in a specific document might be 30 KB.
  • Object deduplication: identifying and removing duplicate resources. If the same logo appears on every page, it should be stored once and referenced multiple times rather than embedded separately on each page.
  • Metadata and structure cleanup: removing editing history, unused form fields, JavaScript, and other non-essential data that accumulates during document creation and editing.
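Object deduplication from the list above can be illustrated with a content hash. This is a minimal sketch on plain byte strings, not on real PDF objects: the `deduplicate` function, the resource names, and the sample payloads are all hypothetical stand-ins.

```python
import hashlib


def deduplicate(resources):
    """Store each distinct payload once and map every name to its digest.

    resources: list of (name, payload_bytes) pairs standing in for
    embedded objects such as images.
    """
    store = {}  # digest -> payload, kept once
    refs = {}   # resource name -> digest of the payload it uses
    for name, payload in resources:
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)
        refs[name] = digest
    return store, refs


# Hypothetical document: the same logo embedded on ten pages, plus one chart.
logo = b"logo-image-bytes" * 1000
chart = b"chart-image-bytes" * 500
resources = [(f"page{i}/logo", logo) for i in range(1, 11)]
resources.append(("page3/chart", chart))

store, refs = deduplicate(resources)
before = sum(len(payload) for _, payload in resources)
after = sum(len(payload) for payload in store.values())
print(f"{before} bytes before, {after} bytes after, {len(store)} objects stored")
```

Every page still refers to the logo through `refs`, but the bytes are held only once, which is the same idea as one image object referenced from many pages.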

Tools that process PDFs through an optimization pipeline can apply all of these techniques in a single pass, producing a smaller file without manual intervention.

Compression and quality

The relationship between PDF compression and visual quality depends entirely on which compression techniques are applied.

Structural optimizations (font subsetting, object deduplication, metadata removal) have no impact on visual quality. The document looks identical; only the internal representation becomes more efficient. These techniques are safe to apply by default, provided the cleanup step does not strip form fields or scripts that the document still relies on.

Image compression is where quality trade-offs appear. Lossy recompression of embedded images reduces their visual fidelity. At moderate settings, the difference is imperceptible in most viewing conditions. At aggressive settings, text in scanned documents may become harder to read, and photographs may show visible artifacts.

The critical factor is the document's purpose. A PDF intended for on-screen viewing can tolerate more aggressive image compression because screen resolution limits how much detail is visible. A PDF intended for print should preserve higher image quality to avoid visible degradation in the output.
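One way to turn the document's purpose into a concrete setting is to pick a target resolution per purpose and downsample each image to it. The sketch below uses 150 DPI for screen and 300 DPI for print; these are common rules of thumb assumed for illustration, not values taken from this article, and `target_pixels` is a hypothetical helper.

```python
# Assumed rule-of-thumb targets; adjust to the actual output device.
TARGET_DPI = {"screen": 150, "print": 300}


def target_pixels(width_px, height_px, placed_width_in, purpose):
    """Pixel dimensions to downsample to for an image placed at a given width.

    Returns the original size unchanged if the image is already at or
    below the target resolution (this sketch never upsamples).
    """
    wanted_width = round(placed_width_in * TARGET_DPI[purpose])
    if wanted_width >= width_px:
        return width_px, height_px
    scale = wanted_width / width_px
    return wanted_width, round(height_px * scale)


# A 4000x3000 photo placed 2 inches wide on the page.
print(target_pixels(4000, 3000, 2, "screen"))
print(target_pixels(4000, 3000, 2, "print"))
```

The same source image ends up with half the width for on-screen use that it keeps for print, which is why the intended output should be decided before choosing compression settings.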

For PDFs that combine text content (vector) with embedded screenshots or images (raster), compression affects only the raster portions. The text remains perfectly sharp regardless of how aggressively the images are compressed.

Common mistakes

  • Compressing without checking the result. Always compare the compressed PDF against the original. Verify that text is readable, images are acceptable, and no content has been lost or corrupted.
  • Using maximum compression for scanned documents. Scanned pages are entirely raster images, and aggressive compression makes text noticeably blurry. Use moderate quality settings and rely on resolution reduction rather than heavy lossy compression.
  • Embedding full-resolution images unnecessarily. If an image displays at 2 inches wide in the document, it does not need to be 4000 pixels wide. Match embedded image resolution to the document's intended output resolution.
  • Not compressing before sharing. Large PDFs can exceed email attachment limits, slow down cloud sharing, and consume unnecessary storage. Run a compression pass before distributing any PDF that contains images or scanned content.
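The full-resolution mistake above can be quantified by computing the effective resolution of a placed image. In this sketch the 300 DPI output resolution is an assumption chosen for illustration, and both function names are hypothetical.

```python
def effective_ppi(width_px, placed_width_in):
    """Pixels per inch an image actually delivers at its placed width."""
    return width_px / placed_width_in


def oversampling_factor(width_px, placed_width_in, output_dpi):
    """How many times more pixels (by area) the image holds than the output can use."""
    return (effective_ppi(width_px, placed_width_in) / output_dpi) ** 2


# The example from the list: 4000 pixels wide, displayed 2 inches wide.
ppi = effective_ppi(4000, 2)
factor = oversampling_factor(4000, 2, output_dpi=300)  # 300 DPI is assumed
print(f"effective resolution: {ppi:.0f} PPI")
print(f"pixel data vs. what 300 DPI needs: about {factor:.0f}x")
```

The factor is squared because resolution applies in both dimensions, so an image that is several times too wide carries far more excess data than the width alone suggests.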

Common Questions

Why is my PDF file so large?

The most common cause is embedded images, especially if they were inserted at full resolution or saved in uncompressed formats. A single uncompressed photograph can add several megabytes. Scanned documents are essentially full-page images and can be very large.

Does compressing a PDF reduce its quality?

It depends on the technique. Removing unused metadata and optimizing structure has no effect on visual quality. Compressing embedded images with lossy methods reduces image quality. The key is choosing settings that reduce size without making text or images noticeably worse.

Can I compress a PDF without losing text quality?

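Yes, as long as the text is real text rather than a scan. Text in a PDF is normally stored as character codes and font information, which takes very little space and is not affected by image compression. Quality loss from PDF compression applies only to embedded raster images. In a scanned document the text is part of the page image, so it is subject to image compression like any other raster content.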

What is a good file size for a PDF?

It depends on content. A text-only document is typically a few hundred kilobytes at most, and a short one is often under 100 KB. A document with a few images usually falls in the 1-5 MB range. A heavily illustrated or scanned document might be 5-20 MB. If your PDF exceeds these ranges substantially, it likely has room for compression.

How much can PDF compression reduce file size?

Results vary widely. A PDF with uncompressed high-resolution images can often be reduced by 70-90%. A PDF that is already well-optimized may shrink by only 10-20%. The potential depends on what is making the file large in the first place.
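The percentages above are reductions relative to the original size. A minimal sketch of the calculation, with a hypothetical function name and made-up example sizes:

```python
def reduction_percent(original_bytes, compressed_bytes):
    """Percentage by which compression reduced the file size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100 * (original_bytes - compressed_bytes) / original_bytes


# Made-up example: a 50 MB scan compressed to 8 MB.
print(f"{reduction_percent(50_000_000, 8_000_000):.0f}% smaller")
```

A file on disk can be measured with `os.path.getsize` before and after compression and passed to the same function.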
