RegEx Based Extractor

UiPath.IntelligentOCR.Activities.DataExtraction.RegexBasedExtractor

Enables you to build and test regular expression search criteria and to use them for data extraction.

Note:

All -Preview package versions have their documentation available here. All changes are listed by the -Preview version and activity type.

Properties

Common

  • DisplayName - The display name of the activity.

Input

  • Configuration - Specifies the configuration of the extractor as a JSON-escaped string. The configuration can be generated by using the extractor wizard. You can keep the configuration in the Properties panel as a string, or you can define it by using the wizard and bind it to a variable. It is advisable to edit the Configuration field by using the wizard and not the Properties panel.
  • Timeout - Specifies the timeout for each Regex search, in milliseconds. A value of 0 or a negative value is interpreted as an infinite timeout. The default value is 2000.

Misc

  • Private - If selected, the values of variables and arguments are no longer logged at Verbose level.

Note:

This activity cannot work with table, set or boolean fields.

Using the RegEx Based Extractor Wizard

  1. Add a RegEx Based Extractor activity to your workflow.
  2. Configure your regular expressions by clicking the Configure Expressions button.
    • The Wizard window opens.
  3. Expand the Document Type in the first row to see the available fields and to customize your regular expression and its options.
  4. Add your regular expression in the Expression field.

Note:

You can either write the whole RegEx in the Expression field or build it by using the Edit button.

  1. Click the drop-down list in the Regex Options column and choose from the following options:
    • CultureInvariant - Specifies that cultural differences in language are ignored.
    • ECMAScript - Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase and Multiline options.
    • ExplicitCapture - Specifies that the only valid captures are groups that are explicitly named or numbered, in the form (?<name>subexpression). Unnamed parentheses act as non-capturing groups.
    • IgnoreCase - Specifies that the search is not case sensitive.
    • IgnorePatternWhitespace - Eliminates the unescaped white space from the defined pattern and enables the comments marked with #. This option does not apply to character classes, numeric quantifiers, or tokens marking the beginning of an individual RegEx language element.
    • Singleline - Specifies single-line mode, in which the dot (.) matches every character, including \n, which it does not match by default.
    • Multiline - Specifies multiline mode, in which the special characters ^ and $ match the beginning and the end of every line, not only the beginning and the end of the whole input.
    • RightToLeft - Specifies that the search is performed from right to left.
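
The option names above match the .NET RegexOptions enumeration. As a rough illustration only, the sketch below uses Python's `re` module, whose `re.IGNORECASE`, `re.DOTALL`, `re.MULTILINE`, and `re.VERBOSE` flags behave approximately like IgnoreCase, Singleline, Multiline, and IgnorePatternWhitespace. The sample text and patterns are hypothetical, and the other options in the list have no direct equivalent shown here.

```python
import re

# Hypothetical sample text, not taken from any real document.
text = "Invoice: inv-1042\nTotal: 99.50\nDue: 2024-01-31"

# IgnoreCase ~ re.IGNORECASE: "INV" also matches "inv".
ignore_case = re.search(r"INV-\d+", text, re.IGNORECASE)

# Singleline ~ re.DOTALL: the dot also matches "\n", so a match can span lines.
without_singleline = re.search(r"Invoice.*Total", text)
with_singleline = re.search(r"Invoice.*Total", text, re.DOTALL)

# Multiline ~ re.MULTILINE: ^ matches at the start of every line.
without_multiline = re.findall(r"^\w+(?=:)", text)
with_multiline = re.findall(r"^\w+(?=:)", text, re.MULTILINE)

# IgnorePatternWhitespace ~ re.VERBOSE: unescaped whitespace in the pattern
# is ignored and "#" starts a comment.
verbose = re.search(
    r"""
    (\d{4})   # year
    -(\d{2})  # month
    -(\d{2})  # day
    """,
    text,
    re.VERBOSE,
)

print(ignore_case.group())
print(without_singleline, with_singleline.group() if with_singleline else None)
print(without_multiline, with_multiline)
print(verbose.groups())
```

Comparing each pair of results with and without a flag is a quick way to decide which options a field needs before selecting them in the wizard.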

Note:

More information about the Regular Expression Options can be found here.

  1. Click the Edit button to edit the options of that field and the format of the regular expression.
  2. Add text in the Test Text field to test your search criteria against the text that you want to apply the RegEx to.
  3. Select one of the RegEx formula types from the drop-down list. This sets the expression to match one of the following:
    • Literal - Matches the exact characters specified by you. This option is case sensitive.
    • Digit - Matches a digit.
    • One of - Matches a single character present in the set.
    • Not one of - Matches a single character not present in the set.
    • Anything - Matches any character, except for \n.
    • Any word character - Matches a single word character, such as a letter or a digit.
    • Whitespace - Matches a single whitespace character.
    • Starts with - Anchors the match at the start of the line.
    • Ends with - Anchors the match at the end of the line.
    • Advanced - Requires a custom expression.
    • Email - Matches an email address.
    • URL - Matches a URL.
    • US date - Matches the US date format.
    • US phone number - Matches the US phone number format.
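
To make the simpler formula types concrete, the sketch below pairs some of them with the standard regex fragments they conventionally correspond to. This mapping is an assumption based on common regex syntax, not the wizard's documented output, and the `FRAGMENTS` table and `literal` helper are hypothetical names used only for illustration.

```python
import re

# Assumed fragments for some formula types (standard regex syntax; the
# expression that the wizard actually generates may differ).
FRAGMENTS = {
    "Digit": r"\d",
    "One of (abc)": r"[abc]",
    "Not one of (abc)": r"[^abc]",
    "Anything": r".",
    "Any word character": r"\w",
    "Whitespace": r"\s",
    "Starts with": r"^",
    "Ends with": r"$",
}


def literal(value: str) -> str:
    """Escape a value so that it is matched character for character."""
    return re.escape(value)


# Literal "Total:" + Whitespace + one or more Digit + Literal "." + two Digit
pattern = (
    literal("Total:")
    + FRAGMENTS["Whitespace"]
    + FRAGMENTS["Digit"] + "+"
    + literal(".")
    + FRAGMENTS["Digit"] + "{2}"
)

match = re.search(pattern, "Total: 99.50")
# A literal is case sensitive, so a lowercase "total:" does not match.
case_mismatch = re.search(literal("total:"), "Total: 99.50")

print(match.group())
print(case_mismatch)
```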

Note:

More information about the Regular Expression Quantifiers can be found here.

  1. Use the Value field to enter the value of the regular expression.
  2. Select a quantifier from the Quantifiers drop-down list.
    • Exactly - Matches the preceding element exactly the specified number of times. The default is 1.
    • Any (0 or more) - Matches the preceding element zero or more times, but as few times as possible.
    • At least one (1 or more) - Matches the preceding element one or more times.
    • Zero or one - Matches the preceding element zero or one time, but as few times as possible.
    • Between x and y times - Matches the preceding element between x and y times, where x and y are integers, but as few times as possible.
  3. Use the plus button to add an extra RegEx field. Move fields up and down in the hierarchy by using the up and down buttons. Use the delete button to delete a field.
  4. Select the Capture check box if you want to extract that specific field.
  5. The Full Expression field shows the entire expression exactly as you customized it.
  6. Select one or more options from the Regex Options drop-down list.
  7. Click the Save button once all your configurations are done to exit the Edit mode, and then click Save once again to close the wizard.
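
Several quantifiers above are described as matching "as few times as possible", which in standard regex syntax corresponds to lazy quantifiers such as `*?` and `{x,y}?`. The sketch below, written in Python for illustration, contrasts greedy and lazy matching and shows how a capture group isolates the value to extract. It assumes that the Capture check box corresponds to wrapping a field in a capture group; the sample strings are hypothetical.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy "zero or more": the dot consumes as much as possible.
greedy = re.search(r"<b>.*</b>", text).group()
# Lazy "zero or more, as few times as possible": stops at the first "</b>".
lazy = re.search(r"<b>.*?</b>", text).group()

# "Between x and y times", greedy versus lazy, on a five-digit string.
greedy_between = re.search(r"\d{2,4}", "98765").group()
lazy_between = re.search(r"\d{2,4}?", "98765").group()

# "Exactly" four digits inside a capture group: only the parenthesized part
# is returned as the extracted value.
invoice = re.search(r"INV-(\d{4})", "Ref INV-1042 issued").group(1)

print(greedy)
print(lazy)
print(greedy_between, lazy_between)
print(invoice)
```

Testing both variants against the Test Text is a practical way to confirm which behavior a given quantifier produces before saving the expression.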
