Retrieve text from a PDF

Extracts specific content from a PDF document and returns it as text.

Common use cases

  • Data Manipulation Wrkflows

Application

  • PDF

Inputs (what you have)

NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE
File name | The name of the document | Text | Yes | File-1
Type of object(s) to extract | The type of object(s) to extract: Pages, Paragraphs, Table cell, Bulleted list, Numbered list, Sentences, Words, Lines | Predefined Choice List | Yes | Page 1
Number of object(s) to extract | Specify the number of objects to be extracted | Integer | Yes | 1 cell, 2 rows
Search location keyword | The keyword to be used to locate the content to be extracted. | Text | No | Data
Page number | The document page where the content to be extracted is located | Integer | No | 1
Section heading name | The section where the content to be extracted is located | Text | No | Heading 1
Paragraph number | The paragraph where the content to be extracted is located | Integer | No | 5
Line number | The paragraph line where the content to be extracted is located | Integer | No | 3
Cell row | Row to extract, if object is a table | Integer | No | 5
Cell column | Column to extract, if the object is a table | Integer | No | 2

Note: The value of inputs can either be a set value in the configuration of the Wrk Action within the Wrkflow, or a variable from the Data Library. These variables in the Data Library are the outputs of previous Wrk Actions in the Wrkflow.

How it works

Extraction always begins at the start of the object being extracted. The object is located from the first instance of the search keyword, relative to the page number, paragraph number, or section heading provided.
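The behaviour described above can be illustrated with a minimal sketch. It is not the Wrk Action's implementation: the function name `extract_paragraphs` is hypothetical, the text is assumed to have been read from the PDF page already, each non-empty line is treated as one paragraph, and keyword matching is assumed to be case-sensitive.

```python
from typing import Optional


def extract_paragraphs(text: str, keyword: str, count: int = 1) -> Optional[str]:
    """Return `count` paragraphs, starting at the first one that contains `keyword`."""
    # Simplifying assumption: every non-empty line is one paragraph.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    for index, paragraph in enumerate(paragraphs):
        if keyword in paragraph:
            # Start at the beginning of the matching paragraph, not at the keyword.
            return "\n".join(paragraphs[index:index + count])
    return None  # the keyword does not occur in the text


page_text = "Intro line.\nThe Data section starts here.\nMore Data follows.\nClosing."
print(extract_paragraphs(page_text, "Data", count=2))
```

Only the first instance of the keyword is used, so the second paragraph is returned here because it follows the first match, not because it also contains "Data".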

Please note:

  • A paragraph is one or more sentences beginning on a new line.
Text to extract | Optional inputs to configure
Pages | Only page number
Paragraphs | Provide at least two of the following: page number, paragraph number, section heading, search keyword
Table cells | Row or column or both
Bulleted lists | Provide a search keyword with a page number or section heading
Sentences | Provide a search keyword, and at least one of the following: page number, paragraph number, line number
Lines | Provide a search keyword and any or all optional inputs except cell row and cell column
Words | Provide a search keyword and any or all optional inputs except cell row and cell column
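The combinations in the table above can be checked before a Wrkflow is configured. The sketch below is one possible reading of those rules, not part of the product: `check_inputs` and the snake_case input keys are hypothetical names, and "Only page number" is read here as "a page number must be provided". Numbered lists have no row in the table, so the sketch reports them as undocumented.

```python
from typing import Dict, List


def check_inputs(object_type: str, inputs: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the combination fits the table."""
    provided = {name for name, value in inputs.items() if value is not None}
    problems: List[str] = []

    def need_keyword() -> None:
        if "search_keyword" not in provided:
            problems.append("a search keyword is required")

    if object_type == "Pages":
        if "page_number" not in provided:
            problems.append("a page number is required")
    elif object_type == "Paragraphs":
        options = {"page_number", "paragraph_number", "section_heading", "search_keyword"}
        if len(provided & options) < 2:
            problems.append("provide at least two of: page number, paragraph number, "
                            "section heading, search keyword")
    elif object_type == "Table cells":
        if not provided & {"cell_row", "cell_column"}:
            problems.append("provide a cell row, a cell column, or both")
    elif object_type == "Bulleted lists":
        need_keyword()
        if not provided & {"page_number", "section_heading"}:
            problems.append("provide a page number or a section heading")
    elif object_type == "Sentences":
        need_keyword()
        if not provided & {"page_number", "paragraph_number", "line_number"}:
            problems.append("provide a page number, paragraph number, or line number")
    elif object_type in ("Lines", "Words"):
        need_keyword()
        if provided & {"cell_row", "cell_column"}:
            problems.append("cell row and cell column do not apply")
    else:
        problems.append(f"no rule documented for {object_type!r}")
    return problems


print(check_inputs("Paragraphs", {"page_number": 1, "search_keyword": "Data"}))
print(check_inputs("Sentences", {"search_keyword": "Data"}))
```

The first call prints an empty list because two of the four paragraph locators are set. The second reports one problem because a sentence needs a page, paragraph, or line number in addition to the keyword.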

Outputs (what you get)

NAME | DESCRIPTION | TYPE | REQUIRED | EXAMPLE
Extracted text | Text retrieved from the PDF document | Text | Yes | wrk technologies

Outcomes

NAME | DESCRIPTION
Success | This status is selected when the text was successfully retrieved from a PDF.
No Result | This status is selected in the event of the following scenarios: information cannot be found in the PDF document
Unsuccessful | This status is selected in the event of the following scenarios: the file cannot be opened; the file is not a PDF document
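
As a rough illustration of how the three outcomes relate, the sketch below maps each scenario to a status. It is an assumption-laden example rather than the Wrk Action's logic: `select_outcome` and the `extract` callable are hypothetical, and "is a PDF document" is approximated by checking that the file starts with the `%PDF-` header.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


def select_outcome(path: str,
                   extract: Callable[[bytes], Optional[str]]) -> Tuple[str, Optional[str]]:
    """Return (outcome, extracted text) for the file at `path`."""
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        return "Unsuccessful", None  # the file cannot be opened
    if not data.startswith(b"%PDF-"):
        return "Unsuccessful", None  # the file is not a PDF document (header check only)
    text = extract(data)  # hypothetical extraction step supplied by the caller
    if not text:
        return "No Result", None  # information cannot be found in the document
    return "Success", text


with tempfile.TemporaryDirectory() as folder:
    pdf_path = os.path.join(folder, "File-1.pdf")
    with open(pdf_path, "wb") as handle:
        handle.write(b"%PDF-1.7\n")
    txt_path = os.path.join(folder, "notes.txt")
    with open(txt_path, "wb") as handle:
        handle.write(b"plain text")

    found = select_outcome(pdf_path, lambda data: "wrk technologies")
    empty = select_outcome(pdf_path, lambda data: None)
    wrong_type = select_outcome(txt_path, lambda data: "ignored")
    missing = select_outcome(os.path.join(folder, "absent.pdf"), lambda data: "ignored")

print(found, empty, wrong_type, missing)
```

In a Wrkflow, each outcome would typically lead to a different next step, for example a fallback when the status is No Result.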

Requirements

  • N/A