Document Scanning
Converting a physical document — paper, receipt, ID, or printed page — into a digital image or PDF using a scanner, phone camera, or dedicated scanning app.
Scanning vs photographing
Taking a photo of a document and scanning it may seem similar, but the results differ significantly.
A photograph captures the document from a single angle with whatever ambient lighting is available. The result often includes perspective distortion (the document appears trapezoidal rather than rectangular), uneven illumination (bright spots and shadows), curved edges from pages that are not perfectly flat, and background clutter around the document's edges.
Scanning addresses these issues. A flatbed scanner avoids most of them physically: the page lies flat against the glass under even, built-in lighting. A mobile scanning app instead corrects them in software after capture. Perspective correction straightens the document to a true rectangle, automatic cropping removes the background, and contrast enhancement evens out lighting differences and sharpens text against the page. The result is a clean, flat, properly oriented document image that resembles flatbed output.
This distinction matters for downstream use. A photograph of a document may be legible to a human reader, but its inconsistencies create problems for OCR processing, automated filing, and archival. A properly scanned document processes more reliably and looks more professional when shared or printed.
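Perspective correction, mentioned above, is commonly done with a projective transform (a homography): a 3x3 matrix that maps the four corners of the tilted page in a photo onto the corners of an upright rectangle. The pure-Python sketch below assumes the four corners have already been detected (corner detection is out of scope). It only solves for the matrix and maps points through it; a real app would use the same matrix to resample every pixel. The function names and corner coordinates are illustrative, not taken from any particular app or library.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def _solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a . x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation applies to both sides.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners: three or more may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upward.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve_homography(src: List[Point], dst: List[Point]) -> Matrix:
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping each src corner to its dst corner."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map one point through H, dividing by the projective coordinate w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a tilted page in a photo:
# top-left, top-right, bottom-right, bottom-left.
photo_corners = [(120, 80), (880, 60), (950, 1180), (60, 1200)]
# Target: an upright 850 x 1100 px rectangle (8.5 x 11 in at 100 DPI).
page_corners = [(0, 0), (850, 0), (850, 1100), (0, 1100)]
H = solve_homography(photo_corners, page_corners)
print(apply_homography(H, photo_corners[1]))  # approximately (850.0, 0.0)
```

Imaging libraries typically bundle this step with the pixel resampling; OpenCV, for example, exposes it as `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`.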
Where document scanning is used
- Receipt and expense tracking — scanning receipts immediately after purchase creates a digital record before the thermal paper fades. Scanned receipts are easier to organize, search, and submit for reimbursement than physical copies.
- Contract and legal document digitization — converting signed agreements, leases, and legal documents into searchable PDFs ensures they are accessible, backed up, and findable by keyword.
- ID and credential capture — scanning passports, driver's licenses, and credentials for verification workflows, application processes, and secure storage.
- Medical and insurance records — digitizing paper forms, lab results, and insurance cards for personal record-keeping and secure sharing with healthcare providers.
- Academic and research materials — scanning book pages, journal articles, and handwritten notes for digital annotation, citation, and organization.
Scanning and OCR
A scan by itself produces an image — a picture of text, not actual text data. The document looks like text to a human viewer, but to a computer, it is simply an arrangement of pixels.
OCR (Optical Character Recognition) bridges this gap by analyzing the scanned image, identifying character shapes, and converting them into machine-readable text. When OCR output is embedded as a text layer in a PDF, the result is a searchable PDF: visually identical to the original scan, but with selectable, searchable, and copyable text underneath.
OCR accuracy depends heavily on scan quality. Clear, high-contrast scans at 300 DPI or higher generally yield accurate text recognition, while blurry, low-contrast, or skewed scans produce errors that must be corrected by hand.
Workflows that combine scanning and OCR in a single pipeline — capture the document, process it, and output a searchable PDF — eliminate the manual step of running OCR separately. This end-to-end approach is particularly valuable when processing multiple documents in sequence, turning a stack of paper into a searchable digital archive.
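The single-pipeline idea can be sketched as a loop that sends every page through the same stages in order and collects the image together with its text layer. This is a structural sketch only: `capture`, `clean_up`, and `run_ocr` are hypothetical placeholders passed in as plain functions, not the API of any specific scanning app or OCR engine. `Page` and `ScannedDocument` are likewise illustrative, and a real pipeline would write the images and text layers into a PDF rather than keep them in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Page:
    """Hypothetical container: the cleaned page image plus its OCR text layer."""
    image: bytes
    text: str = ""


@dataclass
class ScannedDocument:
    pages: List[Page] = field(default_factory=list)

    def search(self, term: str) -> List[int]:
        """Return 1-based numbers of pages whose OCR text contains term (case-insensitive)."""
        needle = term.lower()
        return [i for i, p in enumerate(self.pages, start=1) if needle in p.text.lower()]


def scan_to_searchable(
    sources: Iterable[object],
    capture: Callable[[object], bytes],  # camera or scanner read (hypothetical)
    clean_up: Callable[[bytes], bytes],  # perspective, crop, contrast (hypothetical)
    run_ocr: Callable[[bytes], str],     # OCR engine call (hypothetical)
) -> ScannedDocument:
    """Run every source page through capture -> clean-up -> OCR, in order."""
    doc = ScannedDocument()
    for source in sources:
        image = clean_up(capture(source))
        # OCR runs on the cleaned image, so the text layer matches the stored page.
        doc.pages.append(Page(image=image, text=run_ocr(image)))
    return doc


# Usage with stand-in stages: the "paper" is a string and "OCR" just decodes it.
doc = scan_to_searchable(
    ["Invoice 1042", "Lease agreement"],
    capture=lambda s: s.encode(),
    clean_up=lambda img: img,
    run_ocr=lambda img: img.decode(),
)
print(doc.search("lease"))  # [2]
```

Passing the stages in as functions keeps the loop independent of any particular camera, image-processing, or OCR backend.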
Common mistakes
- Scanning in low resolution. Scanning below 200 DPI produces images where text may be legible to humans but difficult for OCR to process accurately. Use 300 DPI as the minimum for documents that will be processed or archived.
- Not straightening before scanning. A crooked document produces a crooked scan. Many scanning apps correct minor rotation automatically, but starting with a properly aligned document produces the best results.
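- Saving as JPG for archival. JPG's lossy compression blurs text edges and adds artifacts, especially at lower quality settings. For document archival, use PNG for single pages or PDF for multi-page documents, and check that the PDF stores its page images with lossless or high-quality compression, since a PDF can also embed JPG images.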
- Skipping OCR on scanned PDFs. A PDF containing only scanned images cannot be searched, selected, or indexed. Always run OCR if the document will need to be searched or if text will need to be extracted later.
Common questions
Can I scan documents with my phone?
Yes. Modern phone cameras combined with scanning apps can produce results comparable to dedicated scanners for most everyday documents. The app handles perspective correction, cropping, and contrast enhancement automatically.
What is the difference between scanning and photographing a document?
Photographing captures whatever the camera sees, including perspective distortion, shadows, and uneven lighting. Scanning corrects these issues, either physically on a flatbed scanner or through processing in a scanning app, producing a flat, evenly lit, properly cropped image.
What format should I save scanned documents in?
PDF is the standard for multi-page documents because it preserves layout and supports text layers for searchability. Single-page scans can be saved as PNG for lossless quality or JPG for smaller file sizes.
Does scanning a document make it searchable?
Not automatically. A basic scan produces an image — a picture of the text, not actual text data. To make it searchable, the scan must be processed with OCR (Optical Character Recognition), which detects and converts the visual text into a machine-readable text layer.
What resolution should I scan documents at?
300 DPI is the standard for most documents and provides enough detail for OCR processing and readable output. Use 600 DPI for documents with fine print or when you need to preserve maximum detail. Beyond that, returns diminish quickly: doubling the DPI quadruples the pixel count, so file size grows much faster than legibility.
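The resolution trade-off is simple arithmetic: pixel dimensions are page size in inches times DPI, so uncompressed size grows with the square of the DPI. The back-of-the-envelope sketch below assumes a US Letter page (8.5 x 11 in) and 8-bit grayscale (1 byte per pixel) before any compression; the helper names are illustrative, and real files are usually much smaller once compressed.

```python
from typing import Tuple


def scan_dimensions(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Return (width_px, height_px) for a page of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_mb(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 1) -> float:
    """Uncompressed image size in megabytes (1 MB = 1,000,000 bytes).

    bytes_per_pixel=1 assumes 8-bit grayscale; use 3 for 24-bit color.
    """
    w, h = scan_dimensions(width_in, height_in, dpi)
    return w * h * bytes_per_pixel / 1_000_000


for dpi in (150, 200, 300, 600):
    w, h = scan_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, {raw_size_mb(8.5, 11, dpi):.1f} MB raw grayscale")
```

Under these assumptions a Letter page at 300 DPI is 2550 x 3300 pixels (about 8.4 MB uncompressed), while 600 DPI gives 5100 x 6600 pixels (about 33.7 MB), four times the data for the same page.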