A-PDF

Introduction

An accessible PDF (A-PDF) is a tagged PDF file. The tags provide a structured representation of the PDF content, which is presented to screen readers, and they exist for accessibility purposes. Several PDF tags resemble common HTML tags (e.g., H1, P, Table). Acrobat Pro is the only version of Acrobat that can be used to view and edit the accessibility information of a PDF.

A document or application is said to be accessible if it meets certain technical criteria and can be used by people with disabilities. It should be usable by people who have mobility impairments, who are blind or have low vision, or who have cognitive impairments. Accessibility features in Adobe Acrobat Pro make it easier for people with disabilities to use PDF documents and forms.

To make documents easier for screen readers to process correctly, structure them with heading hierarchies; identify table headings, rows, and columns; and use lists where possible. For each image in the PDF, add alternative text (Alt Text) that gives a quick description of the image. Purely decorative images do not require Alt Text.
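
The structural rules above can be sketched as a small check over a simplified tag tree. This is only an illustration: the Tag class, its fields, and check_structure are hypothetical names invented for this sketch, not part of Acrobat or of any PDF library, and a real tagged PDF carries far more detail than this model does.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical, simplified stand-in for one node of a PDF tag tree."""
    kind: str                 # e.g. "H1", "P", "Table", "Figure"
    alt: str = ""             # alternative text, relevant for figures
    decorative: bool = False  # purely decorative images need no Alt Text
    children: list = field(default_factory=list)


def walk(tag):
    """Yield the tag and all of its descendants in reading order."""
    yield tag
    for child in tag.children:
        yield from walk(child)


def check_structure(root):
    """Report skipped heading levels and figures that lack Alt Text."""
    problems = []
    last_level = 0
    for tag in walk(root):
        is_heading = (
            len(tag.kind) == 2 and tag.kind[0] == "H" and tag.kind[1].isdigit()
        )
        if is_heading:
            level = int(tag.kind[1])
            # A heading may go at most one level deeper than the previous one.
            if level > last_level + 1:
                problems.append(f"heading level skipped before H{level}")
            last_level = level
        elif tag.kind == "Figure" and not tag.decorative and not tag.alt.strip():
            problems.append("figure without alternative text")
    return problems


doc = Tag("Document", children=[
    Tag("H1"),
    Tag("P"),
    Tag("H3"),                            # H2 is missing
    Tag("Figure"),                        # meaningful image, no Alt Text
    Tag("Figure", decorative=True),       # decorative, so no Alt Text needed
    Tag("Figure", alt="Bar chart of sales"),
])
print(check_structure(doc))
```

For the sample tree, the check reports the skipped heading level and the one meaningful figure that has no alternative text; the decorative figure is not reported.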

Challenges

We received a non-editable, image-based PDF. When we extracted the text using OCR, the output did not match the source PDF exactly and contained many errors. The tool offered an Autotag option to tag the document automatically.

However, because of the poor quality of the PDF, Autotag did not tag the document properly. It also failed to convert some special characters as they appeared in the source PDF. A second problem arose after tagging was complete: we uploaded the file to the validator tool, and after clearing some errors we found that the PDF had become corrupted. The size of the book, around 500 pages, added to the difficulty.

Solution

Autotag was not working, and typing or checking the content line by line would have made proofreading take much longer. To avoid that, we took a different approach to increase production.

First, we used the PDF viewer tool to scan the documents so that it would recognize the text present in the PDF. That alone did not resolve the whole issue. We then added the commonly repeating special characters to the language section of the OCR tool so that the special characters in the PDF were recognized and extracted from the source.
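
One way to decide which special characters are worth adding to the OCR tool is to count them in a small, manually corrected sample of the book. The sketch below is an assumption about how such a list could be built, not a record of the tool we used: frequent_special_chars is a hypothetical helper, "special" is taken to mean any non-ASCII, non-space character, and the way the resulting list is entered into the OCR tool depends on the tool and is not shown.

```python
from collections import Counter


def frequent_special_chars(sample_text, min_count=2):
    """Return non-ASCII characters that occur at least min_count times,
    most frequent first. sample_text is assumed to be a hand-corrected
    excerpt of the source."""
    counts = Counter(
        ch for ch in sample_text if not ch.isascii() and not ch.isspace()
    )
    return [ch for ch, n in counts.most_common() if n >= min_count]


# Illustrative sample only, not text from the actual book.
sample = "café naïve café résumé ü"
print(frequent_special_chars(sample))
```

Characters that appear only once in the sample are left out by default; lowering min_count to 1 lists every special character found.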

These steps increased our efficiency and production rate, and they also improved accuracy.

Results

Finally, our team was able to complete the file and deliver it on time with a high level of accuracy. The client was happy with our efforts in delivering a good-quality book within the turnaround time.