Process Mining
Extractors - Automation Cloud latest
Last updated Nov 28, 2023

Extractors

Introduction

Data can be extracted from the source system using several different methods.

Each extraction method must be described in its own subdirectory of the extractors folder. That subdirectory must contain an instructions.md file with instructions for:

  • Installation: how to set up the extractor.
  • Configuration: how to configure the extractor to get the desired data.
  • Execution: how to run the data extraction.

See ProcessMining-devkit-connector/extractors as an example.
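
The layout described above can be checked automatically. The following is a minimal sketch, assuming that each subdirectory of the extractors folder is one extraction method and that the three topics appear as words somewhere in instructions.md. The function name check_extractors and the substring matching are illustrative choices, not part of the devkit.

```python
from pathlib import Path

# Topic names taken from the list above. Matching is a simple
# case-insensitive substring check, which is an assumption of this sketch.
REQUIRED_TOPICS = ("installation", "configuration", "execution")


def check_extractors(extractors_dir):
    """Return a dict mapping each extraction method to a list of problems."""
    problems = {}
    for method_dir in sorted(Path(extractors_dir).iterdir()):
        if not method_dir.is_dir():
            continue
        issues = []
        instructions = method_dir / "instructions.md"
        if not instructions.is_file():
            issues.append("missing instructions.md")
        else:
            text = instructions.read_text(encoding="utf-8").lower()
            for topic in REQUIRED_TOPICS:
                if topic not in text:
                    issues.append(f"no '{topic}' instructions found")
        problems[method_dir.name] = issues
    return problems
```

An empty list for a method means that its folder passed these basic checks. It does not say anything about the quality of the instructions themselves.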

The preferred data extraction tool is CData Sync; see the official CData website for details. If CData Sync is not suitable for extracting data from your source system, an alternative tool can be used.

Load-from-file

The load-from-file extraction method is available in all connectors. It loads data from .csv files (which are generally exported from a source system) and stores the data in a SQL Server database using CData Sync.

The instructions.md file must contain the custom query that loads the tables and fields for this connector.

The Connector & App framework provides a default version of the load-from-file extraction method. It is recommended to use this default version as the basis for each connector. See Loading Data Using CData Sync for more information on how to use the load-from-file extractor.

Load-from-source

In addition to load-from-file, a load-from-source extractor must be available. This extractor loads data directly from the source system and stores it in a SQL Server database.

CData Sync

If CData Sync can connect to the source system directly, then instructions.md must describe how to set up the CData Sync job. It should state which source system is used (for example, “Salesforce”) and contain the custom query that loads the required tables and fields.

Other Extraction Methods

If CData Sync cannot be used to connect to the desired source system, then a custom extraction method must be provided. Typically, this is a script that extracts the data and loads it into a SQL Server database. It should be possible to schedule the data extraction to run automatically at regular intervals. See Scripts.
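
As an illustration of such a script, the sketch below copies selected fields from a source into a database table. It makes several assumptions that are not part of the connector framework: fetch_rows is a hypothetical callable that yields records from the source system as dictionaries, and Python's built-in sqlite3 module stands in for SQL Server so that the example runs without a server. A real script would use a SQL Server driver and declare column types.

```python
import sqlite3


def extract_to_database(fetch_rows, connection, table, fields):
    """Copy the selected fields of each source record into `table`.

    fetch_rows: hypothetical callable yielding dicts from the source system.
    connection: a DB-API connection (sqlite3 here as a stand-in).
    Returns the number of rows written.
    """
    columns = ", ".join(f'"{name}"' for name in fields)
    placeholders = ", ".join("?" for _ in fields)
    # Replace the table on every run so the load is repeatable.
    connection.execute(f'DROP TABLE IF EXISTS "{table}"')
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    # Only the listed fields are kept; the data itself is not transformed.
    rows = [tuple(record[name] for name in fields) for record in fetch_rows()]
    connection.executemany(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})', rows
    )
    connection.commit()
    return len(rows)
```

Because the function takes no interactive input, a script built around it can be run by an operating system scheduler at regular intervals.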

Data Minimalization

To prevent unnecessary load on the source system and to minimize throughput time, the extractor should apply data minimalization where possible. This means that no unnecessary data should be retrieved: fields and tables that are not used in the transformations should not be loaded.

Additionally, instructions or configuration options for filtering the retrieved data are expected to be available, for example by setting a date range on the extraction.
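
To make the idea concrete, the sketch below builds a query that names only the required fields and restricts the rows to a date range. It produces generic SQL, not CData Sync custom query syntax, and the function name and the table and field names in the usage example are hypothetical.

```python
from datetime import date


def build_extraction_query(table, fields, date_field, start, end):
    """Build a SELECT restricted to the given fields and date range.

    The range is half-open: start <= date_field < end.
    Identifiers are interpolated as-is, so they must come from trusted
    configuration, never from user input.
    """
    if not fields:
        raise ValueError("list the fields explicitly; do not select everything")
    if start >= end:
        raise ValueError("start must be before end")
    columns = ", ".join(f"[{name}]" for name in fields)
    return (
        f"SELECT {columns} FROM [{table}] "
        f"WHERE [{date_field}] >= '{start.isoformat()}' "
        f"AND [{date_field}] < '{end.isoformat()}'"
    )
```

For example, build_extraction_query("Invoices", ["Invoice_ID", "Created"], "Created", date(2023, 1, 1), date(2023, 7, 1)) selects two fields for the first half of 2023. Requiring an explicit field list makes it harder to fall back on selecting every column.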

Guidelines for Extractors

For each extraction method, make sure that:

  • Tables and fields that are not needed for the transformations are not extracted.
  • Data is minimized as far as possible during extraction, for example by filtering the input data.
  • There is at least a configuration instruction to limit the extracted date range.
  • The extractor only extracts data, and does not perform any transformations.
  • The configuration instructions do not contain credentials or other secrets.
  • The configuration instructions describe how to reduce data size.
  • All files created to support an extraction method are placed in the folder that describes that extraction method.
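
The guideline about credentials can be partly checked with a simple scan of the instruction text. The sketch below is a heuristic only: the pattern list and the function name find_possible_secrets are assumptions made for illustration, and a clean result does not prove that the instructions are free of secrets.

```python
import re

# Heuristic patterns only (an assumption of this sketch); they will miss
# some secrets and may flag harmless lines.
SECRET_PATTERN = re.compile(
    r"(password|passwd|pwd|secret|api[_-]?key|token)\s*[:=]\s*\S+",
    re.IGNORECASE,
)


def find_possible_secrets(text):
    """Return (line_number, line) pairs that look like they hold a secret."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]
```

Running this over the contents of each instructions.md before publishing a connector gives a quick first check. Any flagged line still needs a manual review.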
© 2005-2023 UiPath. All rights reserved.