UiPath Activities

The UiPath Activities Guide

Read PDF Files

The example below explains how to read a .pdf file, extract its data, and save the output in a .txt file by using the Read PDF Text or Read PDF With OCR activity. You can find these activities in the UiPath.PDF.Activities package.

This is how the automation process can be built:

  1. Open Studio and create a new Process.
  2. Drag a Flowchart container into the Workflow Designer.
    • Create the following variable:
      Variable Name    Variable Type    Default Value
      chooseOption     GenericValue     -

Note:

Add your .pdf files to the project directory so that you can run the entire process from the same place, or download this example to use the given file.

  1. Drag an Input Dialog activity and connect it to the Start Node.
    • In the Properties panel, add the expression "Choose one option below:" in the Label field.
    • Add the expression {"Read PDF Text", "Read PDF With OCR"} in the Options field.
    • Add the value "Options" in the Title field.
    • Add the variable chooseOption in the Result field.
  2. Place a Flow Decision activity below the Input Dialog activity and connect the two.
    • In the Properties panel, add the expression chooseOption = "Read PDF Text" in the Condition field.
  3. Drag a Sequence container and connect it to the True branch of the Flow Decision activity. Name the Sequence Read PDF Text. This sequence extracts information by splitting the extracted text with string expressions.
    • Create the following variables:
      Variable Name     Variable Type      Default Value
      extractedText     String             -
      arrayText         System.String[]    -
      address           GenericValue       -
      city              String             -
      phoneNumber       String             -
      invoiceNumber     String             -
      vendor            GenericValue       -
      bankName          String             -
      bankAccount       String             -
      ibanCode          String             -

  1. Open the Sequence container by double-clicking on it.
  2. Drag a Read PDF Text activity inside the sequence.
    • In the Properties panel, add the expression "NPO Invoice.pdf" in the FileName field.
    • Add the value "All" in the Range field.
    • Add the variable extractedText in the Text field.
  3. Place an Assign activity under the Read PDF Text activity.
    • Add the variable arrayText in the To field.
    • Add the expression extractedText.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries) in the Value field.
  4. Drag an If activity below the Assign activity.
    • Add the expression arrayText(0).Equals("Tiefland Glass AG") in the Condition field.
  5. Place a Sequence container in the Then field.
  6. Drag an Assign activity inside the Sequence container.
    • Add the variable address in the To field.
    • Add the expression arrayText(2) in the Value field.
  7. Drag another Assign activity and place it below the previous one.
    • Add the variable city in the To field.
    • Add the expression arrayText(3).Split(","c)(0) in the Value field.
  8. Drag another Assign activity and place it below the previous one.
    • Add the variable phoneNumber in the To field.
    • Add the expression arrayText(4).Split(":"c)(1).Split({"INVOICE"},StringSplitOptions.None)(0) in the Value field.
  9. Drag another Assign activity and place it below the previous one.
    • Add the variable invoiceNumber in the To field.
    • Add the expression arrayText(4).Split(":"c)(1).Split({"INVOICE"},StringSplitOptions.None)(1).Split("#"c)(1) in the Value field.
  10. Drag another Assign activity and place it below the previous one.
    • Add the variable vendor in the To field.
    • Add the expression arrayText(arrayText.Count-5) in the Value field.
  11. Place a Sequence container in the Else field.
  12. Drag an Assign activity inside the Sequence container.
    • Add the variable address in the To field.
    • Add the expression arrayText(1) in the Value field.
  13. Drag another Assign activity and place it below the previous one.
    • Add the variable city in the To field.
    • Add the expression arrayText(2).Split(","c)(0) in the Value field.
  14. Drag another Assign activity and place it below the previous one.
    • Add the variable phoneNumber in the To field.
    • Add the expression arrayText(3).Split(":"c)(1).Split({"INVOICE"},StringSplitOptions.None)(0) in the Value field.
  15. Drag another Assign activity and place it below the previous one.
    • Add the variable invoiceNumber in the To field.
    • Add the expression arrayText(3).Split(":"c)(1).Split({"INVOICE"},StringSplitOptions.None)(1).Split("#"c)(1) in the Value field.
  16. Drag another Assign activity and place it below the previous one.
    • Add the variable vendor in the To field.
    • Add the expression arrayText(arrayText.Count-5) in the Value field.
  17. Place a For Each activity below the If container.
  18. Double-click the For Each activity to open it.
    • Add the variable arrayText in the Value field.
  19. Drag an If activity inside the Body container of the For Each activity.
    • Add the expression item.Contains("Bank Name:") in the Condition field.
  20. Drag an Assign activity inside the Then field.
    • Add the variable bankName in the To field.
    • Add the expression item.Split(":"c)(1) in the Value field.
  21. Place an If activity below the previous one.
    • Add the expression item.Contains("Bank Account:") in the Condition field.
  22. Drag an Assign activity inside the Then field.
    • Add the variable bankAccount in the To field.
    • Add the expression item.Split(":"c)(1) in the Value field.
  23. Place an If activity below the previous one.
    • Add the expression item.Contains("IBAN Code:") in the Condition field.
  24. Drag an Assign activity inside the Then field.
    • Add the variable ibanCode in the To field.
    • Add the expression item.Split(":"c)(1) in the Value field.
  25. Drag a Write Text File activity and place it below the For Each activity.
    • In the Properties panel, add the value "InvoiceDetails.txt" in the FileName field.
    • Add the expression "Invoice details"+Environment.NewLine+Environment.NewLine+"Vendor: "+vendor+Environment.NewLine+"Vendor address: "+address+Environment.NewLine+"City: "+city+Environment.NewLine+"Phone number:"+phoneNumber+Environment.NewLine+"Invoice number:"+invoiceNumber+Environment.NewLine+"Bank name:"+bankName+Environment.NewLine+"Bank account:"+bankAccount+Environment.NewLine+"IBAN Code:"+ibanCode in the Text field.
  26. Return to the Main workflow.
  27. Drag a Sequence container and connect it to the False branch of the Flow Decision activity. The name of the Sequence should be Read PDF With OCR. This activity extracts information by using an OCR engine (Microsoft OCR and Tesseract OCR).
    • Create the following variables:
      Variable Name             Variable Type    Default Value
      extractedTextTesseract    String           -
      extractedTextMicrosoft    String           -
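
To check the Read PDF Text expressions outside Studio, the following is a rough Python sketch of the same splitting logic: the line split, the If activity that shifts the header fields by one line, the For Each scan for the bank labels, and the text written to InvoiceDetails.txt. It is not part of the UiPath workflow. The function names, the .strip() calls, the spaces after the report labels, and the sample text are assumptions made for illustration, and the real contents of NPO Invoice.pdf may differ.

```python
import re

def split_lines(text):
    # Rough equivalent of
    # extractedText.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries):
    # split on every CR or LF character and drop empty entries.
    return [part for part in re.split(r"[\r\n]", text) if part]

def parse_invoice(text):
    lines = split_lines(text)
    # If activity: when the first line is the company name,
    # the header fields sit one line lower than otherwise.
    offset = 1 if lines[0] == "Tiefland Glass AG" else 0
    # Text after the first colon, then split around the word INVOICE.
    parts = lines[3 + offset].split(":")[1].split("INVOICE")
    fields = {
        "address": lines[1 + offset],
        "city": lines[2 + offset].split(",")[0],
        # .strip() is an addition; the workflow expressions keep surrounding spaces.
        "phoneNumber": parts[0].strip(),
        "invoiceNumber": parts[1].split("#")[1].strip(),
        "vendor": lines[len(lines) - 5],
        "bankName": "",
        "bankAccount": "",
        "ibanCode": "",
    }
    # For Each activity: scan every line for the three bank labels.
    labels = {
        "Bank Name:": "bankName",
        "Bank Account:": "bankAccount",
        "IBAN Code:": "ibanCode",
    }
    for item in lines:
        for label, key in labels.items():
            if label in item:
                fields[key] = item.split(":")[1].strip()
    return fields

def format_report(fields):
    # Mirrors the Text expression of the Write Text File activity.
    return "\n".join([
        "Invoice details",
        "",
        "Vendor: " + fields["vendor"],
        "Vendor address: " + fields["address"],
        "City: " + fields["city"],
        "Phone number: " + fields["phoneNumber"],
        "Invoice number: " + fields["invoiceNumber"],
        "Bank name: " + fields["bankName"],
        "Bank account: " + fields["bankAccount"],
        "IBAN Code: " + fields["ibanCode"],
    ])

# Hypothetical invoice text, invented for this sketch only.
SAMPLE_TEXT = "\r\n".join([
    "Tiefland Glass AG",
    "Industrial Glazing",
    "12 Example Street",
    "Sampletown, ST 00000",
    "Phone: 555-0100 INVOICE #12345",
    "Example Vendor Ltd",
    "Bank Name: Example Bank",
    "Bank Account: 000111222",
    "IBAN Code: XX00 EXAMPLE",
    "Thank you",
])

if __name__ == "__main__":
    print(format_report(parse_invoice(SAMPLE_TEXT)))
```

The sketch makes one property of the workflow visible: every field is located by line position or by a fixed label, so a PDF with a different layout would need different indexes.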

  1. Open the Sequence container by double-clicking on it.
  2. Drag a Read PDF With OCR activity inside the sequence.
    • In the Properties panel, add the value "Invoice02.pdf" in the FileName field.
    • Add the value 1 in the DegreeOfParallelism field.
    • Select the value 150 from the ImageDpi drop-down list.
    • Add the value "All" in the Range field.
  3. Drag the Tesseract OCR engine inside the Read PDF With OCR activity.
    • In the Properties panel, add the value Image in the Image field.
    • Add the value "eng" in the Language field.
    • Select the None option from the Profile drop-down list.
    • Add the value 2 in the Scale field.
    • Add the variable extractedTextTesseract in the Text field.
  4. Drag another Read PDF With OCR activity inside the sequence and place it below the previous one.
    • In the Properties panel, add the value "Invoice02.pdf" in the FileName field.
    • Add the value 1 in the DegreeOfParallelism field.
    • Select the value 150 from the ImageDpi drop-down list.
    • Add the value "All" in the Range field.
  5. Drag the Microsoft OCR engine inside the Read PDF With OCR activity.
    • In the Properties panel, add the value Image in the Image field.
    • Add the value "en" in the Language field.
    • Select the None option from the Profile drop-down list.
    • Add the value 1 in the Scale field.
    • Add the variable extractedTextMicrosoft in the Text field.
  6. Drag a Write Text File activity below the Read PDF With OCR activity.
    • In the Properties panel, add the value "OCRMicrosoft.txt" in the FileName field.
    • Add the variable extractedTextMicrosoft in the Text field.
  7. Drag a Write Text File activity below the previous Write Text File activity.
    • In the Properties panel, add the value "OCRTesseract.txt" in the FileName field.
    • Add the variable extractedTextTesseract in the Text field.
  8. Run the process. The robot extracts the data using the specified process and saves the output in a .txt file.
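
Once the OCR branch has run, one possible way to see how the two engines differ is to compare OCRMicrosoft.txt and OCRTesseract.txt. The Python sketch below is an optional follow-up, not a step of the UiPath process; the function name compare_ocr_outputs is hypothetical, and it uses only the standard difflib module.

```python
import difflib
from pathlib import Path

def compare_ocr_outputs(text_a, text_b):
    """Return a similarity ratio (0.0 to 1.0) and a unified diff of two texts."""
    ratio = difflib.SequenceMatcher(None, text_a, text_b).ratio()
    diff = list(difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile="OCRMicrosoft.txt",
        tofile="OCRTesseract.txt",
        lineterm="",
    ))
    return ratio, diff

if __name__ == "__main__":
    microsoft = Path("OCRMicrosoft.txt")
    tesseract = Path("OCRTesseract.txt")
    # Only compare when both files written by the workflow are present.
    if microsoft.exists() and tesseract.exists():
        ratio, diff = compare_ocr_outputs(
            microsoft.read_text(encoding="utf-8", errors="replace"),
            tesseract.read_text(encoding="utf-8", errors="replace"),
        )
        print(f"Similarity: {ratio:.2%}")
        print("\n".join(diff))
```

Lines prefixed with - appear only in the Microsoft OCR output and lines prefixed with + only in the Tesseract output, which makes misread characters easy to spot.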
     
     
