Amazon Comprehend

Jan 9, 2024

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents.

It develops insights by recognizing the entities, key phrases, language, sentiment, and other common elements in a document. Each insight comes with a confidence score between 0 and 1 that indicates how confident Amazon Comprehend is in that prediction.
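To make the confidence score concrete, here is a minimal Python sketch that keeps only the entities whose `Score` meets a threshold you choose. It assumes a response shaped like the one the `DetectEntities` API returns: an `Entities` list whose items carry `Text`, `Type`, and a floating-point `Score` between 0 and 1. The `mock_response` is hand-written with invented scores and trimmed to the fields the function uses; the helper name `confident_entities` and the 0.9 threshold are illustrative choices, not part of the service.

```python
from typing import Any


def confident_entities(response: dict[str, Any], min_score: float = 0.9) -> list[tuple[str, str]]:
    """Return (Text, Type) pairs for entities whose Score is at least min_score."""
    return [
        (entity["Text"], entity["Type"])
        for entity in response.get("Entities", [])
        if entity["Score"] >= min_score
    ]


# Hand-written mock in the shape of a DetectEntities response; the scores are invented.
mock_response = {
    "Entities": [
        {"Text": "Amazon", "Type": "ORGANIZATION", "Score": 0.99},
        {"Text": "Seattle", "Type": "LOCATION", "Score": 0.97},
        {"Text": "2", "Type": "QUANTITY", "Score": 0.62},
    ]
}

print(confident_entities(mock_response))  # [('Amazon', 'ORGANIZATION'), ('Seattle', 'LOCATION')]
```

A low score does not mean a detection is wrong, only that the model is less certain, so the right threshold depends on how costly a false positive is in your use case.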

For example, Amazon Comprehend can scan social media feeds for mentions of your products, analyze customer reviews of a product, or search an entire document repository for key phrases.

1. Types of insights:

The examples in parentheses below refer to this sample text:

Amazon is one of the biggest cloud service providers in America. It was established in 1996 in Seattle 98109. It launches the new services every 2 months. Have a great cloud!

Entities are unique names of real-world objects such as people, places, organizations, dates, and quantities. (Amazon, America, Seattle, 1996, 2)

A key phrase is a string containing a noun phrase that describes a particular thing. (the biggest cloud service providers)

Events are specific real-world events (for example, a company acquisition) and their related details.

Language detection determines the dominant language of a document. (English)

Personally identifiable information (PII) detection finds personal data such as addresses and phone numbers. (Seattle 98109)

Syntax analysis determines the part of speech of each word (noun, verb, adjective, and so on).

Sentiment analysis determines the overall sentiment of a document: positive, negative, neutral, or mixed.
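Each insight type above has its own API operation. The sketch below maps them to the method names of the boto3 Comprehend client (`detect_entities`, `detect_key_phrases`, `detect_dominant_language`, `detect_pii_entities`, `detect_syntax`, and `detect_sentiment`). The helper names `build_insight_requests` and `run_all` and the default language code `"en"` are illustrative assumptions, not part of any AWS API. Events are left out because event detection runs only as an asynchronous job (`start_events_detection_job`).

```python
from typing import Any


# Hypothetical mapping: insight type -> (boto3 Comprehend method, keyword arguments).
def build_insight_requests(text: str, language_code: str = "en") -> dict[str, tuple[str, dict[str, str]]]:
    with_language = {"Text": text, "LanguageCode": language_code}
    return {
        "entities": ("detect_entities", dict(with_language)),
        "key_phrases": ("detect_key_phrases", dict(with_language)),
        # Language detection needs only the text itself.
        "language": ("detect_dominant_language", {"Text": text}),
        "pii": ("detect_pii_entities", dict(with_language)),
        "syntax": ("detect_syntax", dict(with_language)),
        "sentiment": ("detect_sentiment", dict(with_language)),
    }


def run_all(client: Any, text: str, language_code: str = "en") -> dict[str, Any]:
    """Call every detect_* method on a Comprehend client and collect the raw responses."""
    return {
        insight: getattr(client, method)(**kwargs)
        for insight, (method, kwargs) in build_insight_requests(text, language_code).items()
    }


sample = "Amazon is one of the biggest cloud service providers in America."
for insight, (method, kwargs) in build_insight_requests(sample).items():
    print(f"{insight:12} -> {method}({', '.join(kwargs)})")
```

With boto3 installed and AWS credentials configured, `run_all(boto3.client("comprehend"), sample)` would send all six requests and return the raw responses keyed by insight type.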

You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or Amazon Comprehend APIs.

Amazon Comprehend can read your data from Amazon S3, and you can write its results to a storage service, database, or data warehouse. It also supports running jobs inside an Amazon Virtual Private Cloud (VPC).
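As a rough sketch of the S3 and VPC setup, the function below assembles the keyword arguments for an asynchronous entity-detection job (`start_entities_detection_job` in boto3), treating each line of each input file as a separate document (`ONE_DOC_PER_LINE`). The function name, bucket URIs, role ARN, and subnet and security-group IDs are placeholders. The IAM role passed as `DataAccessRoleArn` must allow Amazon Comprehend to read the input location and write to the output location.

```python
from typing import Any, Optional, Sequence


def entities_job_request(
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    language_code: str = "en",
    subnets: Optional[Sequence[str]] = None,
    security_group_ids: Optional[Sequence[str]] = None,
) -> dict[str, Any]:
    """Build kwargs for an async entity-detection job that reads from and writes to S3."""
    request: dict[str, Any] = {
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
    }
    # A VPC configuration needs both subnets and security groups.
    if bool(subnets) != bool(security_group_ids):
        raise ValueError("provide both subnets and security_group_ids, or neither")
    if subnets and security_group_ids:
        request["VpcConfig"] = {
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_group_ids),
        }
    return request


# Placeholder values only.
request = entities_job_request(
    "s3://example-input-bucket/reviews/",
    "s3://example-output-bucket/results/",
    "arn:aws:iam::123456789012:role/ExampleComprehendRole",
    subnets=["subnet-0example"],
    security_group_ids=["sg-0example"],
)
print(sorted(request))
```

The dictionary could then be passed as `boto3.client("comprehend").start_entities_detection_job(**request)`. The results land in the output S3 location, from where you can load them into a database or data warehouse.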

2. Document Processing in Amazon Comprehend:

Amazon Comprehend provides synchronous and asynchronous document processing modes. Use synchronous mode to process one document or a batch of up to 25 documents. Use an asynchronous job to process a large number of documents. Amazon Comprehend supports plain-text documents as well as semi-structured documents such as PDF and Word documents, scanned PDF files, and image files in JPG, PNG, and TIFF format. File size limits are listed in the Amazon Comprehend documentation.
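The 25-document limit on synchronous batches means a longer list has to be split into chunks. The sketch below does that and rewrites each item's `Index` so it points back into the full list, since batch responses such as `BatchDetectSentiment` report `Index` relative to the batch they came from (in `ResultList` and `ErrorList`). `batch_analyze` is a hypothetical helper, and `detect_batch` stands in for any function that sends one batch, for example a wrapper around boto3's `batch_detect_sentiment(TextList=..., LanguageCode=...)`.

```python
from typing import Any, Callable, Sequence

MAX_BATCH_SIZE = 25  # synchronous batch limit described above

BatchCall = Callable[[list[str]], dict[str, Any]]


def batch_analyze(
    texts: Sequence[str], detect_batch: BatchCall, batch_size: int = MAX_BATCH_SIZE
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Send texts in chunks of at most batch_size and merge ResultList/ErrorList items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    results: list[dict[str, Any]] = []
    errors: list[dict[str, Any]] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        response = detect_batch(chunk)
        # Shift batch-relative indexes so they refer to positions in `texts`.
        for item in response.get("ResultList", []):
            results.append({**item, "Index": item["Index"] + start})
        for item in response.get("ErrorList", []):
            errors.append({**item, "Index": item["Index"] + start})
    return results, errors
```

With a real client this could be called as `batch_analyze(reviews, lambda chunk: client.batch_detect_sentiment(TextList=chunk, LanguageCode="en"))`. For very large collections, an asynchronous job remains the better fit.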

Errors in semi-structured documents are reported at two levels. Page-level errors include an unsupported page size, too many characters on a page, a page that cannot be read, and internal server errors. Document-level errors include an unsupported document type, an oversized document, too many pages, and permission errors.

3. Topic Modelling

Amazon Comprehend uses a learning model based on Latent Dirichlet Allocation (LDA) to determine the topics in a set of documents.
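Amazon Comprehend does not expose its topic model, but the LDA technique itself can be sketched. The code below is a plain collapsed Gibbs sampler for LDA, written from the textbook formulation rather than from Comprehend's implementation: each document is treated as a mixture of topics and each topic as a distribution over words. The function name, `alpha`, `beta`, the iteration count, and the toy corpus are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs is a list of token lists. Returns (doc_topic, topic_word): smoothed
    topic proportions per document and word distributions per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [defaultdict(int) for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start with a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word] += 1
            topic_totals[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][word] -= 1
                topic_totals[k] -= 1
                # p(z = t | everything else), up to a normalizing constant.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][word] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][word] += 1
                topic_totals[k] += 1

    doc_topic = [
        [(doc_topic_counts[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (topic_word_counts[t][w] + beta) / (topic_totals[t] + vocab_size * beta) for w in vocab}
        for t in range(num_topics)
    ]
    return doc_topic, topic_word


docs = [
    "goal match team score goal".split(),
    "team match player goal".split(),
    "cloud server storage cloud".split(),
    "server cloud database storage".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {t}: {top}")
```

On a corpus this small the split between topics can vary with the seed. In Comprehend itself you mainly choose the number of topics (`NumberOfTopics` on `start_topics_detection_job`) and the service runs the model for you.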

Classification sorts documents into categories that you define. For example, documents about sports could be categorized as games, functions, or news.
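Once a custom classifier is trained and deployed to an endpoint, the `ClassifyDocument` API returns a `Classes` list for multi-class models or a `Labels` list for multi-label models, each item with a `Name` and a `Score`. The sketch below picks the predictions out of such a response. The helper name `predicted_categories`, the 0.5 threshold, and the mock response (invented scores, category names taken from the sports example) are hypothetical.

```python
from typing import Any


def predicted_categories(response: dict[str, Any], min_score: float = 0.5) -> list[str]:
    """Return predicted category names from a ClassifyDocument-style response."""
    labels = response.get("Labels")
    if labels:
        # Multi-label: a document can belong to several categories at once.
        return [label["Name"] for label in labels if label["Score"] >= min_score]
    classes = response.get("Classes", [])
    if not classes:
        return []
    # Multi-class: exactly one category applies, so keep only the best one.
    best = max(classes, key=lambda c: c["Score"])
    return [best["Name"]] if best["Score"] >= min_score else []


# Mock multi-class response with invented scores.
mock = {"Classes": [
    {"Name": "GAMES", "Score": 0.81},
    {"Name": "NEWS", "Score": 0.15},
    {"Name": "FUNCTIONS", "Score": 0.04},
]}
print(predicted_categories(mock))  # ['GAMES']
```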

Custom entity recognition identifies entity types specific to your needs by using a model that you train on your own data.
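One way to train a custom entity recognizer is with an entity list: a CSV file with `Text` and `Type` columns that pairs example strings with the entity type you want Comprehend to learn. The sketch below only builds that CSV text. The helper name and the `SERVICE` type with its example values are hypothetical, and you would still upload the file to S3 and reference it when creating the recognizer (for example, through boto3's `create_entity_recognizer`).

```python
import csv
import io


def entity_list_csv(entities: dict[str, str]) -> str:
    """Build entity-list CSV text with a Text,Type header, one row per entity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    # Sorting keeps the output stable; the csv module quotes values containing commas.
    for text, entity_type in sorted(entities.items()):
        writer.writerow([text, entity_type])
    return buffer.getvalue()


print(entity_list_csv({"Amazon S3": "SERVICE", "AWS Lambda": "SERVICE"}), end="")
```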

4. Encryption in Amazon Comprehend

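Documents stored in Amazon S3 are already encrypted at rest. Amazon Comprehend is also integrated with AWS Key Management Service (AWS KMS), so you can use your own keys to encrypt job output and the storage volumes used while a job runs.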
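For asynchronous jobs, customer-managed keys are passed as request parameters: `KmsKeyId` inside `OutputDataConfig` encrypts the results written to S3, and `VolumeKmsKeyId` encrypts the storage volume used to process the job. The sketch below adds both to an existing job request without modifying the original. The helper name, bucket URIs, role ARN, and key ARNs are placeholders.

```python
from typing import Any


def with_kms_keys(request: dict[str, Any], output_key_id: str, volume_key_id: str) -> dict[str, Any]:
    """Return a copy of an async-job request that uses customer-managed KMS keys."""
    updated = dict(request)
    # Encrypts the job output that Comprehend writes to S3.
    updated["OutputDataConfig"] = {**request.get("OutputDataConfig", {}), "KmsKeyId": output_key_id}
    # Encrypts the storage volume used while the job is processed.
    updated["VolumeKmsKeyId"] = volume_key_id
    return updated


base = {
    "InputDataConfig": {"S3Uri": "s3://example-input-bucket/docs/", "InputFormat": "ONE_DOC_PER_FILE"},
    "OutputDataConfig": {"S3Uri": "s3://example-output-bucket/results/"},
    "DataAccessRoleArn": "arn:aws:iam::123456789012:role/ExampleComprehendRole",
    "LanguageCode": "en",
}
secured = with_kms_keys(
    base,
    output_key_id="arn:aws:kms:us-east-1:123456789012:key/example-output-key",
    volume_key_id="arn:aws:kms:us-east-1:123456789012:key/example-volume-key",
)
print(secured["OutputDataConfig"])
```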

Amazon Textract is an ML service from Amazon that uses optical character recognition (OCR) to detect and extract printed text. For more, see the article "Solve the OCR issues using Amazon Textract" (abiabi0707.medium.com).

Written by Abimuktheeswaran Chidambaram