Tagged PDF
A tagged PDF is a PDF that contains structural tags — headings, paragraphs, lists, tables — that define the reading order and semantic structure for screen readers and assistive technology.
Tagged vs untagged PDF
An untagged PDF contains content (text, images, vectors) arranged visually on the page. The PDF viewer renders the content in the correct visual positions, and a sighted reader sees a well-formatted document. But the file carries no information about the logical structure of that content: nothing in it identifies which text is a heading, which is a paragraph, or in what order the content should be read.
A tagged PDF adds a layer of structural metadata. Each content element is wrapped in a tag that identifies its role: <H1> for a top-level heading, <P> for a paragraph, <Table> for a table, <L> for a list, <Figure> for an image. These tags also define the reading order — the sequence in which a screen reader should traverse the content.
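As a rough illustration, the tag layer can be pictured as a tree whose depth-first traversal gives the reading order. The sketch below is a simplified model, not the PDF file format itself; the `Tag` class, the `reading_order` function, and the sample content are hypothetical names and data used only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Tag:
    """Hypothetical stand-in for one structure element in a tagged PDF."""
    role: str                                   # e.g. "H1", "P", "L", "Table", "Figure"
    text: str = ""                              # wrapped content (empty for containers)
    children: list[Tag] = field(default_factory=list)


def reading_order(tag: Tag) -> Iterator[tuple[str, str]]:
    """Yield (role, text) pairs depth-first, which is the logical reading order."""
    if tag.text:
        yield tag.role, tag.text
    for child in tag.children:
        yield from reading_order(child)


# Illustrative document: a heading, a paragraph, and a two-item list.
document = Tag("Document", children=[
    Tag("H1", "Annual Report"),
    Tag("P", "Revenue grew this year."),
    Tag("L", children=[Tag("LI", "North region"), Tag("LI", "South region")]),
])

for role, text in reading_order(document):
    print(f"{role}: {text}")
```

In this model a screen reader would announce the heading first, then the paragraph, then each list item, regardless of where those elements sit on the page.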
This distinction matters most for accessibility. Without tags, a screen reader may read content in the wrong order, skip important elements, or fail to convey the structure of the document — turning a well-organized report into an incoherent stream of text.
Why tagging matters for accessibility
Screen readers and other assistive technologies rely on structural information to present documents in a meaningful way. Tags provide that information.
With proper tags, a screen reader can:
- Announce headings — allowing the user to navigate by heading level, jumping directly to the section they need
- Read tables correctly — associating header cells with data cells so the user understands which column or row each value belongs to
- Describe images — using alt text attached to figure tags to convey the content of images that the user cannot see
- Follow reading order — traversing content in the intended logical sequence rather than the visual layout order, which may differ in multi-column or complex page designs
- Move between structural elements, using the tags to jump directly to the next section, list, or table instead of listening to everything in between
Without tags, the screen reader falls back to raw text extraction, which often produces garbled or out-of-order output — especially in documents with columns, sidebars, headers, footers, or floating elements.
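To see why the fallback goes wrong, consider a two-column page. The sketch below assumes one plausible fallback strategy (reading fragments row by row across the full page width) and compares it with the order a tag structure would supply. The coordinates, fragment ids, and `tag_order` list are hypothetical, and real assistive technology may use different heuristics.

```python
# Each fragment maps an id to (x, y, text); y grows downward on the page.
fragments = {
    "a": (50, 100, "Left column, line 1."),
    "b": (50, 120, "Left column, line 2."),
    "c": (300, 100, "Right column, line 1."),
    "d": (300, 120, "Right column, line 2."),
}

# Assumed fallback: sort by vertical position, then horizontal position,
# which reads straight across both columns.
visual_order = [
    text for _, _, text in sorted(fragments.values(), key=lambda f: (f[1], f[0]))
]

# With tags, the structure states the intended sequence explicitly:
# finish the left column before starting the right one.
tag_order = ["a", "b", "c", "d"]
logical_order = [fragments[frag_id][2] for frag_id in tag_order]

print("Without tags:", visual_order)
print("With tags:   ", logical_order)
```

The untagged reading interleaves the two columns line by line, while the tagged reading keeps each column intact.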
How tagged PDFs are created
There are several paths to producing a tagged PDF:
- Authoring tools with export support — word processors like Microsoft Word, Google Docs, and LibreOffice can produce tagged PDFs during export, provided the document uses proper heading styles, alt text for images, and structured lists. The export settings must include the option to generate tags.
- Desktop publishing tools — Adobe InDesign and similar layout tools support tag export. The designer maps content elements to tags during the layout process.
- PDF remediation tools — if a PDF was created without tags, tools like Adobe Acrobat Pro can add tags after the fact. This involves manually or semi-automatically identifying each content element and assigning the correct tag.
- Automated pipelines — document processing workflows can apply tagging programmatically using libraries that parse the content structure and generate tags during PDF creation.
The quality of the tags depends on the quality of the source document. A well-structured Word document with proper headings produces good tags automatically. A flat, unstructured document requires manual remediation.
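A minimal sketch of why source structure matters, assuming an exporter that picks tags from paragraph style names. The `STYLE_TO_TAG` table, the `tag_for` function, and the paragraph dictionaries are hypothetical; real exporters are considerably more involved.

```python
# Hypothetical mapping from word-processor style names to PDF structure tags.
STYLE_TO_TAG = {
    "Heading 1": "H1",
    "Heading 2": "H2",
    "List Paragraph": "LI",
    "Normal": "P",
}


def tag_for(paragraph: dict) -> str:
    """Pick a tag from the paragraph's style name; visual formatting is ignored."""
    return STYLE_TO_TAG.get(paragraph["style"], "P")


paragraphs = [
    # Uses the heading style, so it becomes a heading tag.
    {"style": "Heading 1", "bold": True, "size": 20, "text": "Overview"},
    # Only looks like a heading (bold, large), so it is tagged as a paragraph.
    {"style": "Normal", "bold": True, "size": 20, "text": "Looks like a heading"},
    {"style": "Normal", "bold": False, "size": 11, "text": "Body text."},
]

tags = [tag_for(p) for p in paragraphs]
print(tags)
```

The second paragraph is visually identical to a heading but ends up as `P`, which is the kind of gap that later has to be fixed by manual remediation.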
For screenshot-derived PDFs, tagging usually comes after OCR, not before. Once the file has machine-readable text, a tool can start assigning headings, paragraphs, and figures in a way assistive technology can actually use.
Common mistakes with tagged PDFs
- Assuming a searchable PDF is also tagged. A searchable PDF has machine-readable text but may have no structural tags. Search and accessibility are separate capabilities.
- Using visual formatting instead of styles. Making text bold and large does not create a heading tag. The authoring tool needs to recognize the element as a heading — which requires using the heading style, not just visual formatting.
- Ignoring table structure. Complex tables with merged cells, nested headers, or spanning rows often produce broken tags. Simplify table layouts or manually verify the tag structure after export.
- Skipping alt text for images. A tagged PDF with figure tags but no alt text still fails accessibility requirements. Every meaningful image needs a text description; decorative images should be marked as artifacts, not tagged as figures, so screen readers skip them.
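The alt-text mistake in particular lends itself to an automated check. The sketch below walks a simplified tag tree and reports figure tags that have no alt text. The nested-dictionary layout, the `figures_missing_alt` function, and the sample tree are assumptions made for illustration; they do not mirror any specific library's data model.

```python
from __future__ import annotations


def figures_missing_alt(tag: dict, path: str = "") -> list[str]:
    """Return the tree paths of Figure tags whose alt text is absent or blank."""
    here = f"{path}/{tag['role']}"
    missing = []
    if tag["role"] == "Figure" and not tag.get("alt", "").strip():
        missing.append(here)
    for child in tag.get("children", []):
        missing.extend(figures_missing_alt(child, here))
    return missing


# Hypothetical tag tree: one figure with alt text, one without.
tree = {"role": "Document", "children": [
    {"role": "H1", "text": "Results"},
    {"role": "Figure", "alt": "Bar chart of quarterly revenue"},
    {"role": "Sect", "children": [{"role": "Figure"}]},
]}

print(figures_missing_alt(tree))
```

A check like this only finds missing descriptions. Whether an existing description is accurate and useful still needs human review.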
Common Questions
What is the difference between a tagged PDF and a searchable PDF?
A searchable PDF contains machine-readable text that enables search and selection. A tagged PDF goes further — it also contains structural tags that define headings, paragraphs, lists, tables, and reading order for assistive technology.
How do I check if a PDF is tagged?
In Adobe Acrobat, go to File > Properties, open the Description tab, and look for the 'Tagged PDF' field, which reads Yes or No. Some other viewers show the same information in their document properties or metadata panel, but not all of them do.
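The same check can be approximated in code. In the PDF format, a tagged file declares `/Marked true` inside the catalog's `/MarkInfo` dictionary and points to its tag tree through `/StructTreeRoot`. The sketch below is a rough heuristic that searches the raw bytes for both markers; `looks_tagged` is a hypothetical helper name. It can miss files that store the catalog in compressed object streams, so a full PDF parsing library is more reliable.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes show a structure tree and /Marked true.

    Limitation: catalogs stored in compressed object streams are not visible
    in the raw bytes, so this can report False for a file that is tagged.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked


# Synthetic catalog fragments for illustration, not complete PDF files.
tagged_sample = b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>"
untagged_sample = b"<< /Type /Catalog /Pages 2 0 R >>"

print(looks_tagged(tagged_sample))
print(looks_tagged(untagged_sample))
```

For a real file, the bytes could be read with `pathlib.Path("report.pdf").read_bytes()`, where `report.pdf` is a placeholder name. A positive result only says that tags exist, not that they are correct or complete.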
Are all PDFs created from Word documents tagged?
Not automatically. Microsoft Word can produce tagged PDFs if the export settings include accessibility tags, but this option must be enabled. The document also needs to use proper heading styles and structure for the tags to be meaningful.
Do tagged PDFs affect how the document looks?
No. Tags are metadata embedded in the file structure. They do not change the visual appearance of the document. They define the semantic structure that assistive technology uses to interpret and navigate the content.
Is tagging required by law?
In many jurisdictions, yes. Regulations such as Section 508 (US) and the European Accessibility Act, along with WCAG-based procurement standards, require government agencies and certain other organizations to make the digital documents they publish accessible. For PDFs, that typically means tagged.