Loading Data Using CData Sync
Introduction
CData Sync is a tool used to extract data from source systems into Process Mining. The supported source systems are listed on the Sources page of the CData Sync website. Refer to the official CData Sync website for more information on CData Sync.
This page describes how to use CData Sync to load data from your source system into a process app in Process Mining.
Prerequisites
It is assumed that you have:
- a valid CData Sync license.
Note: The license is provided via UiPath for the purpose of Process Mining.
- installed CData Sync. See Installing CData Sync.
Pseudonymizing Data
Using CData Sync, data fields can be pseudonymized during extraction. This way, sensitive data fields are hashed at the source before they reach Process Mining. Follow these steps to pseudonymize data.
- Go to the desired task of a job in CData Sync.
- Edit the query.
- Add the hashbytes() function to each field that needs to be pseudonymized. For example:
REPLICATE [SOURCE_TABLE] SELECT hashbytes('SHA2_256',[SOURCE_FIELD]) AS [DESTINATION_FIELD] FROM [SOURCE_TABLE];
This way, all values will be hashed with a SHA-256 hash. For more information, see SHA-2. If desired, the hashed values can be transformed into values like "Person 1", "Person 2", "Person 3" by writing SQL queries for this in the data transformations, as in the sketch below.
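For instance, a readable alias can be derived from each hash with a window function. The following is a minimal sketch, not part of the product documentation: it assumes a SQL dialect that supports dense_rank and || concatenation, and the table and field names (Event_log, User_hash) are hypothetical.

```sql
-- Hypothetical transformation: assign "Person 1", "Person 2", ... to each distinct hash.
select
    Event_log.*,
    'Person ' || cast(dense_rank() over (order by User_hash) as varchar) as User_name
from Event_log
```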
Loading Data Using CData Sync
Setting up a data load using CData Sync requires the following steps:
- Create a source connection
- Create a destination connection
- Create a job
- Run the job
The steps are described in detail below.
Create a Source Connection
Follow these steps to set up the source connection.
Step | Action |
---|---|
1 | In the CData Sync Admin console, create a new source connection for your source system. Note: If your source system is not in the list, you can click + Add More to display a list of all available CData Sync source connectors. Select the connector for your source system and click Download & Install. |
2 | Enter a descriptive name for the source connection in the Name field. |
3 | Enter the required properties to set up a connection with your source system. |
4 | Create and test the connection. |
Setting up a source connection for .csv or .tsv files
If you want to set up a source connection to load data from .csv or .tsv files, make sure to:
- Select CSV from the list as the source system to which you want to create a connection.
- Set the URI to the path where the .csv or .tsv files are stored.
- Set the correct ConnectionType. Use Local when the files are stored on the same server as CData Sync.
- Set FMT to the appropriate value used in the data files.
Additionally, make sure to define the following settings:
Section | Parameter | Value |
---|---|---|
Other | Exclude File Extensions | True |
Other | Include Files | Add ,TSV to the setting if you want to upload .tsv files |
Schema | Type Detection Scheme | None |
Data Formatting | Push Empty Values As Null | True |
Create a Destination Connection
Set up credentials for AzureBlob
To set up an AzureBlob destination connection, you need the following credentials:
- container
- azure shared access signature (SAS URI)
- account
These credentials are located in the URL of the Azure Blob storage to which the extracted data needs to be uploaded.
See also Retrieving the Credentials for the Azure Blob Storage.
Determine the setup parameters from the upload URL as described below.
Example
The table below displays the parameters retrieved from the example upload URL:
https://pmdoctestcd.blob.core.windows.net/c7045b18-be6f-4534-2fba-0b1f42c1e6d3?sv=2020-06-10&si=sap-c7044b18-be8f-4534-8fba-0b2f42c1e7c7&sr=c&sig=cjknPMhIeHvaKtpMgNFmUVp7KHQirhR1m1WCMkCiZUg%3D
Parameter | Description | Example |
---|---|---|
azure shared access signature | everything after the question mark | sv=2020-06-10&si=sap-c7044b18-be8f-4534-8fba-0b2f42c1e7c7&sr=c&sig=cjknPMhIeHvaKtpMgNFmUVp7KHQirhR1m1WCMkCiZUg%3D |
account | the first part of the URL | pmdoctestcd |
container | the app ID, or the first GUID in the URL | c7045b18-be6f-4534-2fba-0b1f42c1e6d3 |
Set up the AzureBlob destination connection
Follow these steps to create the AzureBlob destination connection.
Step | Action |
---|---|
1 | Define a new connection of type AzureBlob. |
2 | Check if the CData AzureBlob Destination Provider version is 22.0.8348.0 or higher. See also Prerequisites. |
3 | Enter a descriptive name for the destination connection. For example, AzureBlob_IM. |
4 | Enter the AzureBlob credential parameters retrieved from the upload URL. |
5 | Go to the Advanced tab and locate the Other section. |
6 | Create and test the connection. |
See the illustration below for an example.
Create a Job
The input data must meet the format required for the app template you are using to create your process app. See App Templates.
Follow these steps to create the extraction job.
- Create a new CData Sync job.
- Enter a descriptive name for the job in the Job Name field. For example, ServiceNow_to_AzureBlob or ServiceNow_to_SQLServer.
- Select the source connection created in Create a Source Connection from the Source list.
- Select the destination connection created in Create a destination connection from the Destination list.
- Make sure the option Standard is selected as the Replication Type, and click + Create.
- Create a Custom Query using the replication queries for your app template (see the example after these steps).
Tip: You can copy the replication queries from the documentation for your app template, which can be accessed from App Templates.
- Go to the Advanced tab in the Job Settings panel.
- Select the Drop Table option to prevent the data from being appended to the table.
- Enable the Enable Parallel Processing checkbox and enter 8 in the Worker Pool field to improve loading speed.
- Make sure the Replicate Interval and Replicate Interval Unit are set so that the resulting period is equal to or larger than the extraction period. For example, if you want to extract data for a period of a year, the Replicate Interval and Replicate Interval Unit must reflect at least a year.
- Save your changes.
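The replication queries follow the REPLICATE syntax shown in Pseudonymizing Data. The sketch below is a hedged illustration only: the actual queries must be copied from your app template documentation, and the table and field names used here are hypothetical.

```sql
-- Hypothetical replication queries for a ServiceNow-like source.
-- Replicate a full table as-is:
REPLICATE [incident];
-- Replicate selected fields into a named destination table:
REPLICATE [Incident_history_raw] SELECT [sys_id], [state], [sys_updated_on] FROM [incident_history];
```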
Call the Data Ingestion API
Follow these steps to edit the post-job event to call the data ingestion API.
- Go to the Events tab in the Job Settings panel.
- Edit the Post-Job Event section and add the ingestion API call from the code block below, making sure to replace the value of "http.url" with the End of upload API. See Retrieving the Credentials for the Azure Blob Storage.
<api:set attr="http.url" value="https://my-uipath-server.com/default/defaulttenant/processMining_/api/v4.0/apps/98dfd1d5-9e42-4f0e-9a0a-8945629f01b3/transform/unauthenticated"/>
<api:call op="httpPost" in="http"/>
- Save your changes.
- Click on JOBS in the menu bar and locate the destination job.
- Click on the Run All Queries icon to check if the job runs correctly.
Logging information
You can add additional statements to obtain logging information on the API call. Follow these steps.
- Add the following statements between the lines previously added in the Post-Job Event (a combined sketch follows these steps).
<api:set attr="http.verbosity" value="5"/>
<api:set attr="http.logfile" value="D:/mydir/cdata_log.txt"/>
See the illustration below.
- Go to the Logging & History tab in the Job Settings panel.
- Select Verbose from the Logfile Verbosity list box.
- Run the job and check the log file that is created.
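Put together, the complete Post-Job Event with logging enabled could look like the sketch below; the URL and log file path are the placeholder values from the examples above.

```xml
<!-- Ingestion API call with verbose HTTP logging statements added between the lines. -->
<api:set attr="http.url" value="https://my-uipath-server.com/default/defaulttenant/processMining_/api/v4.0/apps/98dfd1d5-9e42-4f0e-9a0a-8945629f01b3/transform/unauthenticated"/>
<api:set attr="http.verbosity" value="5"/>
<api:set attr="http.logfile" value="D:/mydir/cdata_log.txt"/>
<api:call op="httpPost" in="http"/>
```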
Run the Job
Follow these steps to run the extraction job.
- Click on JOBS in the menu bar and locate the extraction job you created.
- Click on the Run All Queries icon. See the illustration below.
- Wait until the job has finished. Depending on the amount of data, this can take several minutes.
- Once the extraction job has completed, the status "Transforming data" is displayed in the app overview in the Process Mining portal.
Note: You can monitor the progress in the log. See Viewing Logs.
Scheduling jobs
If you want to run the extraction job at a regular interval, you can use the CData Sync Scheduler to define a schedule.
Follow these steps to schedule an extraction job.
Step | Action |
---|---|
1 | Open the CData Sync extraction job created in Create a Job. |
2 | Go to the Schedule tab in the Job Settings panel. |
Refer to the official CData Sync documentation for more information on how to schedule jobs.
Loading Data From Multiple Source Systems
Using CData Sync, you can load data from multiple source systems into a single process app. To do this, create multiple extraction jobs, each with its own source connection. Each extraction job has to call the next one in its post-job events, so that all jobs are executed one by one. The final job has to make the API call after it has finished (see the example at the end of this section). See the illustration below for an overview.
Linking jobs
To make sure a job calls the next job, go to the Events tab in the Job Settings panel and edit the Post-Job Event section of the first job to add the code displayed below.
<!-- Start Executing different Job -->
<api:set attr="job.JobName" value="Extraction_job_2"/>
<api:set attr="job.ExecutionType" value="Run"/>
<api:set attr="job.WaitForResults" value="true"/>
<api:call op="syncExecuteJob" in="job"/>
See the illustration below.
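The last job in the chain does not link to a next job. Instead, its Post-Job Event contains the ingestion API call described in Call the Data Ingestion API, for example:

```xml
<!-- Final job in the chain: trigger the data ingestion API after extraction. -->
<api:set attr="http.url" value="https://my-uipath-server.com/default/defaulttenant/processMining_/api/v4.0/apps/98dfd1d5-9e42-4f0e-9a0a-8945629f01b3/transform/unauthenticated"/>
<api:call op="httpPost" in="http"/>
```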