Process Mining
latest
false
Loading Data Using CData Sync - Automation Cloud latest
logo
Process Mining
Last updated Nov 28, 2023

Loading Data Using CData Sync

Note: The information in this page is based on CData Sync 2022 - 22.0.8342.0. If you use an other version of CData Sync, field names or functions may be different.

Introduction

CData Sync is a tool that is used to extract data from source systems into Process Mining. The supported source systems can be found on the Sources page on the CData Sync website. Refer to the official CData Sync website for more information on CData Sync.

This page describes how to use CData Sync to load data from your source system into a process app in Process Mining.



Prerequisites

It is assumed that you have:

  • a valid CData Sync license.
    Note: The license is provided via UiPath for the purpose of Process Mining.
  • installed CData Sync. See Installing CData Sync

Pseudonymizing Data

Using CData Sync, data fields can be pseudonymized during extraction. This way, sensitive data fields can be hashed at the source before it reaches the Process Mining product. Follow these steps to pseudonymize data.

  1. Go to the desired task of a job in CData.
  2. Edit the query.
  3. Add the hashbytes() function to each field that needs to be pseudonymized. For example

    REPLICATE [SOURCE_TABLE] SELECT hashbytes('SHA2_256',[SOURCE_FIELD]) AS [DESTINATION_FIELD] FROM [SOURCE_TABLE];

    This way, all values will be hashed with a SHA2 256 hash. For more information, see SHA-2. If desired, the hashed values can be transformed into values like "Person 1", "Person 2", "Person 3" by writing SQL queries for this in the data transformations.

Loading Data using CData Sync

Setting up data load using CData Sync requires several steps to be performed.

  1. Create a source connection
  2. Create a destination connection
  3. Create a job
  4. Run the job

The steps are described in detail below.

Create a Source Connection

Note: Refer to the Configuring CData Sync page for your app template for specific settings for setting up the source connection.
Note: Check the CData Sync documentation for any specific settings or requirements for your source system.

Follow these steps to set up the source connection.

Step

Action

1 In the CData Sync Admin console, create a new Source connection for your source system.
Note: If your source system is not in the list you can click on + Add More to display a list of all available source CData Sync Connectors. Select the connector for your source system and click on Download & Install.
2 Enter a descriptive name for the source connection in the Name field.
3 Enter the required properties to set up a connection with your source system.
4 Create and test the connection.

Setting up a source connection for .csv or .tsv files

If you want to set up a source connection to load data from .csv or .tsv files make sure to:

  • Select CSV as the source system to which you want to create a connection from the list.
  • Set the URI to the path where the .csv or .tsv files are stored.
  • Set the correct ConnectionType. Use Local when the files are stored on the same server as CData Sync.
  • Set FMT to the appropriate value used in the data files.
Define the following settings in the Advanced tab in the Connection Settings panel to make sure the data is properly loaded into Process Mining.
Section Parameter Value
Other Exclude File Extensions True
Other Include Files Add ,TSV to the setting if you want to upload .tsv files
Schema Type Detection Scheme None
Data Formatting Push Empty Values As Null True

Create a Destination Connection

Setup credentials for AzureBlob

To set up an AzureBlob destination connection, you need the following setup credentials for the AzureBlob.

Determine the setup parameters from the upload url as described below.

Example

The table below displays the parameters retrieved from the example download uri:

https://pmdoctestcd.blob.core.windows.net/c7045b18-be6f-4534-2fba-0b1f42c1e6d3?sv=2020-06-10&si=sap-c7044b18-be8f-4534-8fba-0b2f42c1e7c7&sr=c&sig=cjknPMhIeHvaKtpMgNFmUVp7KHQirhR1m1WCMkCiZUg%3D

Parameter Description Example
azure access signature everything after the question mark sv=2020-06-10&si=sap-c7044b18-be8f-4534-8fba-0b2f42c1e7c7&sr=c&sig=cjknPMhIeHvaKtpMgNFmUVp7KHQirhR1m1WCMkCiZUg%3D
account the first part of the url pmdoctestcd
container the app id, or the first guid in the url c7045b18-be6f-4534-2fba-0b1f42c1e6d3

Setup the AzureBlob destination connection

Follow these steps to create the AzureBlob destination connection.

Step Action
1 Define a new connection of type AzureBlob.
2 Check if CData AzureBlobDestination Provider CData AzureBlobDestination Provider is 22.0.8348.0 or higher. See also Prerequisites.
3 Enter a descriptive name for the destination connection. For example, AzureBlob_IM.
4 Enter the AzureBlob credential parameters retrieved from the upload url.
  • Select SAS as the Auth Scheme.
  • Enter the Azure Shared Access Signature (SAS uri).
5 Go to the Advanced tab and locate the Other section.
  • Set Insert Mode to SingleFile.
  • Set Include Column Headers to True.
6 Create and test the connection.

See the illustration below for an example.



Create a Job

Important:

The input data must meet the format as required for the app template you are using to create your process app. See App Templates.

Follow these steps to create the extraction job.

  1. Create a new CData Sync job.
  2. Enter a descriptive name for the job in the Job Name field. For example, ServiceNow_to_AzureBlob or ServiceNow_to_SQLServer.
  3. Select the source connection created in Create a source connection the source connection from the Source list.
  4. Select the destination connection created in Create a destination connection from the Destination list.
  5. Make sure the option Standard is selected as the Replication Type and click on +Create.
  6. Create a Custom Query using the replication queries for your app template.
    Tip:

    You can copy the replication queries from the documentation for you app template that can be accessed from App Templates.

  7. Go to the Advanced tab in the Job Settings panel.
    • Select the Drop Table option to prevent the data to be appended to the table.
    • Enable the checkbox Enable Parallel Processing and enter 8 in the Worker Pool field to improve loading speed.
    • Make sure the Replicate Interval and Replicate Interval Unit are set so the resulting period is equal or larger than the extraction period. For example, if you want to extract data for a period of a year, the Replicate Interval and Replicate Interval Unit must reflect at least a year period.
  8. Save your changes.

Call the Data Ingestion API

Follow these steps to edit the post-job event to call the data ingestion API.

  1. Go to the Events tab in the Job Settings panel.
  2. Edit the Post-Job Event section and add the ingestion API call from the code block below ensuring to replace the value of the "http.url" with the End of upload API. See Retrieving the Credentials for the Azure Blob Storage.

    <api:set attr="http.url" value="https://my-uipath-server.com/default/defaulttenant/processMining_/api/v4.0/apps/98dfd1d5-9e42-4f0e-9a0a-8945629f01b3/transform/unauthenticated"/> 
    <api:call op="httpPost" in="http"/><api:set attr="http.url" value="https://my-uipath-server.com/default/defaulttenant/processMining_/api/v4.0/apps/98dfd1d5-9e42-4f0e-9a0a-8945629f01b3/transform/unauthenticated"/> 
    <api:call op="httpPost" in="http"/>


  3. Save your changes.
  4. Click on JOBS in the menu bar and locate the destination job.
  5. Click on the Run All Queries icon to check if the job runs correctly.

Logging information

You can add additional statements to obtain logging information on the API call. Follow these steps.

  1. Add the following statements between the lines previously added in the Post-Job event.

    <api:set attr="http.verbosity" value="5"/>

    <api:set attr="http.logfile" value="D:/mydir/cdata_log.txt"/>

    See the illustration below.



  2. Go to the Logging & History tab in the Job Settings panel.
  3. Select Verbose from the Logfile Verbosity list box.
  4. Run the job and check the log file that is created.

Run the Job

Follow these steps to run the extraction job.

  1. Click on JOBS in the menu bar and locate the extraction job created.
  2. Click on the Run all queries icon. See the illustration below.



  3. Wait until the job has finished. Depending on the amount of data, this can take several minutes.
  4. Once the extraction job has completed, the status "Transforming data" is displayed in the app overview the Process Mining Portal.
    Note:

    You can monitor the progress in the log. See Viewing Logs.

Scheduling jobs

If you want to run the extraction job on a regular interval, you can use the CData Sync Scheduler to define a schedule.

Follow these steps to schedule an extraction job.

Step Action
1 Open the CData Sync extraction job created in Creating a job.
2 Go to the Schedule tab in the Job Settings panel.


Refer to the official CData Sync documentation for more information on how to schedule jobs.

Loading Data From Multiple Source Systems

Using CData Sync, it is possible to load data from multiple different source systems into a single process app. To do that, multiple extraction jobs have to be created, each having a corresponding source connection. Each extraction job has to call the next one in its post-job events, such that all jobs are executed one-by-one. The final job has to do the API call after it has finished. See the illustration below for an overview.



Linking jobs

In CData Sync, to make sure a job calls the next job, go to the Events tab in the Job Settings panel and edit the Post-Job Event section of the first job to add the code displayed below.

<!-- Start Executing different Job -->
<api:set attr="job.JobName"        value="Extraction_job_2"/> 
<api:set attr="job.ExecutionType"  value="Run"/> 
<api:set attr="job.WaitForResults" value="true"/> 
<api:call op="syncExecuteJob" in="job"/><!-- Start Executing different Job -->
<api:set attr="job.JobName"        value="Extraction_job_2"/> 
<api:set attr="job.ExecutionType"  value="Run"/> 
<api:set attr="job.WaitForResults" value="true"/> 
<api:call op="syncExecuteJob" in="job"/>

See the illustration below.



logo
Get The Help You Need
logo
Learning RPA - Automation Courses
logo
UiPath Community Forum
Uipath Logo White
Trust and Security
© 2005-2023 UiPath. All rights reserved.