Solve the OCR issues using Amazon Textract

Abimuktheeswaran Chidambaram
3 min read · Jan 13, 2024

Amazon Textract is a machine learning (ML) service that uses OCR (Optical Character Recognition) to detect and extract printed text, handwriting, structured data, and tables from images and scanned documents. It currently supports the PNG, JPEG, TIFF, and PDF formats. A document can be at most 5 MB and must have fewer than 11 pages.
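Before uploading, it can help to check a file against these constraints locally. The following is a minimal sketch, assuming the limits quoted above (5 MB, fewer than 11 pages); the function name `check_document` and the constants are hypothetical, and the actual quotas for your account and operation should be confirmed in the Textract documentation.

```python
from pathlib import Path

# Limits as quoted in this article; confirm current Textract quotas before relying on them.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}
MAX_BYTES = 5 * 1024 * 1024  # 5 MB
MAX_PAGES = 10               # "fewer than 11 pages"


def check_document(filename: str, size_bytes: int, page_count: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the file passes all checks."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages, limit is {MAX_PAGES}")
    return problems


if __name__ == "__main__":
    # A 12-page, 6 MB Word file fails all three checks.
    for problem in check_document("notes.docx", 6 * 1024 * 1024, page_count=12):
        print(problem)
```

The page count is passed in by the caller because counting PDF or TIFF pages needs an extra library, which this sketch leaves out.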

Amazon Textract detects and analyzes text through synchronous operations (for single-page documents) and asynchronous operations (for multi-page documents).

For single-page upload, you can upload a document from your local drive. It supports up to 15 queries per page, and you can download the results in CSV format.

Figure: single-page upload
Figure: bulk upload

For bulk upload, you can upload up to 150 documents per request from your S3 bucket or local drive. It supports up to 30 queries per page, and you can download the results in CSV format.
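The per-page query limits for the two upload modes can be enforced before a request is sent. The sketch below builds the `QueriesConfig` structure that the boto3 Textract client accepts and checks it against the limits quoted above. The helper names `build_queries_config` and `analyze_with_queries` are hypothetical, and the second function assumes boto3 is installed and AWS credentials and a region are configured.

```python
MAX_QUERIES_SYNC = 15   # single-page (synchronous) limit quoted above
MAX_QUERIES_ASYNC = 30  # bulk (asynchronous) limit quoted above


def build_queries_config(questions: list[str], asynchronous: bool = False) -> dict:
    """Build a QueriesConfig dict, rejecting lists that exceed the per-page limit."""
    limit = MAX_QUERIES_ASYNC if asynchronous else MAX_QUERIES_SYNC
    if not questions:
        raise ValueError("at least one query is required")
    if len(questions) > limit:
        raise ValueError(f"{len(questions)} queries exceeds the limit of {limit}")
    return {"Queries": [{"Text": question} for question in questions]}


def analyze_with_queries(image_bytes: bytes, questions: list[str]) -> dict:
    """Send a single-page document plus queries to the synchronous AnalyzeDocument API."""
    import boto3  # imported lazily so the helper above works without AWS access

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["QUERIES"],
        QueriesConfig=build_queries_config(questions),
    )
```

For example, `build_queries_config(["What is the invoice number?"])` returns a config with one query, while a list of 16 questions is rejected unless `asynchronous=True` is passed.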

1. Operations performed by Amazon Textract using the API

•Detecting Text: The DetectDocumentText API returns the detected lines and words, the relationships between them, their location on the page, and page details.

•Analyzing Text: The AnalyzeDocument API analyzes the relationships among the detected text and can return forms, tables, query responses, and signatures. The extracted text is available on a line or word basis.

It can also extract text from page elements such as headers, footers, tables, and images.

We can query a document using the AnalyzeDocument API. The sample document is provided by Amazon, so it does not require an adapter. Create an adapter when you want to improve accuracy on your own documents or support custom queries.

•Analyzing Invoices and Receipts: The AnalyzeExpense API extracts details such as vendor and receiver information and billing information. It can also extract a name that appears within a logo.

•Analyzing Identity Documents: The AnalyzeID API extracts information from government-issued identity documents such as passports and driver's licenses.

•Analyzing Lending Documents: Analyze Lending extracts, classifies, and validates documents uploaded from your local drive using a document processing API. A document can be at most 5 MB and 10 pages.
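The detection and query operations in this section return their results as a list of blocks. As a rough sketch of how those responses can be read, the helpers below pull line text and query answers out of a response dictionary. The helper names are hypothetical, the block fields used (`BlockType`, `Text`, `Confidence`, `Relationships`, `Query`) follow the Textract response format, and `detect_lines` assumes boto3 is installed with AWS credentials and a region configured.

```python
def extract_lines(response: dict, min_confidence: float = 0.0) -> list[str]:
    """Return the text of LINE blocks whose confidence (0 to 100) meets the threshold."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def extract_query_answers(response: dict) -> dict:
    """Map each query text to its first answer text, or None when no answer is linked."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks if "Id" in block}
    answers = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        # A QUERY block points at its QUERY_RESULT blocks through an ANSWER relationship.
        answer_ids = [
            answer_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "ANSWER"
            for answer_id in rel.get("Ids", [])
        ]
        results = [by_id[i] for i in answer_ids if i in by_id]
        answers[block["Query"]["Text"]] = results[0].get("Text") if results else None
    return answers


def detect_lines(image_bytes: bytes) -> list[str]:
    """Call the synchronous DetectDocumentText API and return the detected lines."""
    import boto3  # imported lazily so the parsing helpers work without AWS access

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

Keeping the parsing separate from the API call means the same helpers can be reused on saved responses or on the output of an asynchronous job.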

2. Features and use cases:

The main features of Textract are the elimination of manual data entry and scanning work, time savings, and a high accuracy rate in extracting data.

Use cases of Amazon Textract include invoice processing, document digitization, and form processing.

Amazon Comprehend is an AI-based service offered by Amazon. Click the link below to learn more.
