Program Associate

dubakurcompany

3 - 9 Years
6 Openings
0.8 - 2.8 Lac/Yr
Face-to-Face interview
Chennai

Key Skills

Program Associate, Program Officer, Program Coordinator


Job Description

One pain point with the first two types of PDF documents described (text-based PDFs and image-based PDFs) is that the information inside the PDF is not organized.

This means that even if we can extract the text, either by programmatically reading the lines of the PDF or by running OCR on the embedded image that contains the text, we still need to make sense of the extracted text.

Unless we can assign meaning to it, all that text is nothing more than words arranged in lines or sentences. Finding an invoice's total amount among lines of text that contain multiple numbers is no easy feat, and it requires a certain level of algorithmic intelligence.
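
To make the difficulty concrete, here is a minimal sketch of a keyword-based heuristic for picking the total out of already-extracted lines. It is an illustration under stated assumptions, not a robust solution: the keyword list, the amount format (two decimal places, optional thousands commas), and the function name `find_invoice_total` are all hypothetical choices made for this example.

```python
import re

# Assumed keywords, ordered from most to least specific. Real invoices vary widely.
TOTAL_KEYWORDS = ("grand total", "total due", "amount due", "total")

# Assumed amount format: digits with optional thousands commas and two decimals.
AMOUNT_RE = re.compile(r"\d{1,3}(?:,\d{3})+\.\d{2}|\d+\.\d{2}")


def find_invoice_total(lines):
    """Return the most likely invoice total as a float, or None if none is found."""
    for keyword in TOTAL_KEYWORDS:
        # Word boundaries keep "total" from matching inside "subtotal".
        keyword_re = re.compile(r"\b" + re.escape(keyword) + r"\b")
        for line in lines:
            lowered = line.lower()
            if not keyword_re.search(lowered):
                continue
            amounts = AMOUNT_RE.findall(lowered)
            if amounts:
                # Assume the total is the last (rightmost) amount on the line.
                return float(amounts[-1].replace(",", ""))
    return None


sample_lines = [
    "Invoice No. 4471   Date 2021-03-02",
    "Widget A    2 x 19.99    39.98",
    "Subtotal    39.98",
    "Tax 8.00%   3.20",
    "Total due   43.18",
]
print(find_invoice_total(sample_lines))
```

Even this small example has to decide which keywords matter, how to skip the subtotal, and which of several numbers on a line to trust. A different layout, language, or number format would break it, which is why the extracted text alone is not enough.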

So, the first step to automate the data acquisition process is to c