Extracts data from an indicated web page. You can specify what information to extract by providing an XML string in the ExtractMetadata field, in the Properties panel. This can easily be generated with all the properties set by using the Data Scraping wizard.
- ExtractMetadata - An XML string that enables you to define what data to extract from the indicated web page.
- Target.Selector - Text property used to find a particular UI element when the activity is executed. It is actually a XML fragment specifying attributes of the GUI element you are looking for and of some of its parents.
- Target.TimeoutMS - Specifies the amount of time (in milliseconds) to wait for the activity to run before the
SelectorNotFoundExceptionerror is thrown. The default value is 30000 milliseconds (30 seconds).
- Target.WaitForReady - Before performing the actions, wait for the target to become ready. The following options are available:
- None - Does not wait for anything except the target UI element to exist before executing the action. For example, you can use this option if you want to retrieve just text from a web page or click a particular button, without having to wait for all UI elements to load. Note that this may have unwanted consequences if the button relies on elements which are not yet loaded, such as scripts.
- Interactive/Complete - Waits all of the UI elements in the target app to exist before actually executing the action.
To assess if an application is in the Interactive or Complete state, the following tags are verified:
- Desktop applications - A
wm_nullmessage is sent to check the existence of the
<uia>tags. If they exist, the activity is executed.
- Web applications:
a. Internet Explorer - The
<webctrl>tag is used to check if the Ready state of the HTML document is set to Complete. Additionally, the Busy state has to be set to "False".
b. Others - The
<webctrl>tag is used to check if the Ready state of the HTML document is Complete.
- SAP applications - First the presence of the
<wnd>tag verified, after which a SAP specific API is used to detect if the session is busy or not.
- Target.Element - Use the UiElement variable returned by another activity. This property cannot be used alongside the Selector property. This field supports only UiElement variables.
- Target.ClippingRegion - Defines the clipping rectangle, in pixels, relative to the UiElement, in the following directions: left, top, right, bottom. It supports both positive and negative numbers.
- DelayBetweenPagesMS - The amount of time, in milliseconds, to wait until the next page is loaded. (If the loading time of the page is longer, this value should be higher.)
- MaxNumberOfResults - The maximum number of results to be extracted. If the value is 0 all the identified elements are added to the output. The default value is 100.
- NextLinkSelector - The selector that identifies the link/button used to navigate to the next page. Should be relative to the ExistingUiElement property.
- SendWindowMessages - If selected, in the case where the data that is to be extracted spans multiple pages, the click that changes the page is executed by sending a specific message to the target application. This input method can work in the background, is compatible with most desktop apps, but it is not the fastest of the methods. By default, this check box is not selected. If neither this nor the SimulateClick check boxes are selected, the default method simulates the click by using the hardware driver. The default method is the slowest, it cannot work in the background, but it is compatible with all desktop apps.
- SimulateClick - If selected, in the case where the data that is to be extracted spans multiple pages, it simulates the click that changes the page by using the technology of the target application. This input method is the fastest of the three and works in the background. By default, this check box is not selected. If neither this nor the SendWindowMessages check boxes are selected, the default method performs the click using the hardware driver. The default method is the slowest, it cannot work in the background, but it is compatible with all desktop apps.
- DataTable - The information extracted from the indicated web page. This field supports only DataTable variables.
- DisplayName - The display name of the activity.
- ContinueOnError - Specifies if the automation should continue even when the activity throws an error. This field only supports Boolean values (True, False). The default value is False. As a result, if the field is blank and an error is thrown, the execution of the project stops. If the value is set to True, the execution of the project continues regardless of any error.
If this activity is included in Try Catch and the value of the ContinueOnError property is True, no error is caught when the project is executed.
- Private - If selected, the values of variables and arguments are no longer logged at Verbose level.
Here you can see how the Extract Structured Data activity is used in an example that incorporates multiple activities.
Updated 4 months ago