UiPath Process Mining

The UiPath Process Mining Guide

Editing transformations

Introduction

Data transformations are used to transform input data into data suitable for Process Mining. The transformations in UiPath Process Mining are written as dbt projects.

This page gives an introduction to dbt. For more detailed information, see the official dbt documentation.

Folder structure

The transformations of a process app consist of a dbt project. Below is a description of the contents of a dbt project folder.

Folder/file        Contains
dbt_packages\      the pm_utils package and its macros.
logs\              logs created when running dbt.
macros\            custom macros.
models\            .sql files that define the transformations.
models\schema\     .yml files that define tests on the data.
seeds\             .csv files with configuration settings.
dbt_project.yml    the settings of the dbt project.

See the illustration below.

[Illustration: the folder structure of a dbt project.]

Data transformations

The data transformations are defined in .sql files in the models\ directory. They are organized in a standard set of subdirectories:

  • 1_input,
  • 2_entities,
  • 3_events,
  • 4_event_logs,
  • 5_business_logic.

See Structure of transformations.

The .sql files are written in Jinja SQL, which allows you to insert Jinja statements inside plain SQL queries. When dbt runs all .sql files, each .sql file results in a new view or table in the database.
Typically, the .sql files have the following structure:

  1. With statements: One or more with statements to include the required sub tables.
    • {{ ref('My_table') }} refers to a table defined by another .sql file.
    • {{ source(var("schema_sources"), 'My_table') }} refers to an input table.
  2. Main query: The query that defines the new table.
  3. Final query: Typically a query like Select * from table is used at the end. This makes it easy to make sub-selections while debugging.
with Table_A as ( 
  select * from {{ ref('Table_A') }} 
), 
Table_B as ( 
  select 
    Table_A."Field_1" as "Alias_of_1", 
    Table_A."Field_2", 
    Table_A."Field_3" 
  from Table_A 
) 
select * from Table_B

For more tips on how to write transformations effectively, see Tips for writing SQL.

Adding source tables

To add a new source table to the dbt project, it must be listed in models\schema\sources.yml. This way, other models can refer to it by using {{ source(var("schema_sources"), 'My_table_raw') }}. See the illustration below for an example.

[Illustration: a source table listed in sources.yml.]

Note: Each new source table must be listed in sources.yml.

Note: The suffix _raw is added to the names of source tables when data is loaded. For example, a table called my_table should be referred to as my_table_raw.

For more detailed information, see the official dbt documentation on Sources.
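
As a minimal sketch of how a model might read a source table once it is listed in sources.yml, the query below selects from a source whose name carries the _raw suffix. The file name, the table name My_table_raw, and the field names are hypothetical and only serve as an illustration.

```sql
-- models/1_input/My_table_input.sql (hypothetical file name)
with My_table_raw as (
    -- The _raw suffix is added to source table names when data is loaded.
    select * from {{ source(var("schema_sources"), 'My_table_raw') }}
),

My_table_input as (
    select
        -- Hypothetical field names.
        My_table_raw."Field_1",
        My_table_raw."Field_2"
    from My_table_raw
)

select * from My_table_input
```

If the table is not listed in sources.yml, dbt cannot resolve the source() call and the model will not compile.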

Data output

The data transformations must output the data model that is required by the corresponding app; each expected table and field must be present.
In practice, this means that the tables in models\5_business_logic\ should not be deleted, and the output fields in the corresponding queries should not be removed.

If you want to add new fields to your process app, you can use the custom fields that are available for the process app. Map the fields in the transformations to the custom fields to have them available in the output. Make sure the custom fields are named in the output as described in the data model of the process app.
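
The mapping described above could look roughly like the sketch below. All table and field names are assumptions made for illustration. In particular, "Custom_case_text_1" is a placeholder: use the exact custom field names listed in the data model of your process app.

```sql
-- Sketch of a model in models/5_business_logic/ (hypothetical names throughout).
with Cases_base as (
    select * from {{ ref('Cases_base') }}
),

Cases as (
    select
        Cases_base."Case_ID",
        -- Map a field from your own data to a custom field of the process app.
        -- "Custom_case_text_1" is a placeholder for the name in the data model.
        Cases_base."Region" as "Custom_case_text_1"
    from Cases_base
)

select * from Cases
```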

Macros

Macros make it easy to reuse common SQL constructions. For detailed information, see the official dbt documentation on Jinja macros.
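
As a rough illustration of what a custom macro in the macros\ folder could look like, the sketch below defines a macro and then calls it from a model. The macro name trim_to_null, the file name, and the field names are hypothetical. The two parts belong in separate files, as the comments indicate.

```sql
{# macros/trim_to_null.sql: a hypothetical custom macro. #}
{# It trims a text field and turns empty strings into null. #}
{% macro trim_to_null(field) %}
    nullif(trim({{ field }}), '')
{% endmacro %}

{# Usage inside a model in models/: #}
with Table_A as (
    select * from {{ ref('Table_A') }}
)

select
    {{ trim_to_null('Table_A."Field_1"') }} as "Field_1"
from Table_A
```

When dbt compiles the model, the macro call is replaced by the SQL expression nullif(trim(Table_A."Field_1"), '').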

pm_utils

The UiPath Process Mining app templates come with a dbt package called pm_utils. This contains a set of macros that are typically used in Process Mining transformations.
For more info about the pm_utils macros, see ProcessMining-pm-utils.
Below is an example of Jinja code calling the pm_utils.optional() macro.

[Illustration: Jinja code calling the pm_utils.optional() macro.]
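
A call to pm_utils.optional() could look like the sketch below. It assumes that the macro takes a source relation and a quoted column name, and that it yields a null-valued column when the column is missing from the source table. Both the argument order and that behavior are assumptions, so verify them against the pm_utils documentation. The table and field names are hypothetical.

```sql
with Purchase_orders_raw as (
    select * from {{ source(var("schema_sources"), 'Purchase_orders_raw') }}
),

Purchase_orders_input as (
    select
        Purchase_orders_raw."Order_ID",
        -- Assumed signature: optional(relation, column_name).
        -- "Creator" is treated as an optional column in the input data.
        {{ pm_utils.optional(source(var("schema_sources"), 'Purchase_orders_raw'), '"Creator"') }} as "Creator"
    from Purchase_orders_raw
)

select * from Purchase_orders_input
```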

Seeds

Seeds are CSV files that are used to add data tables to your transformations. For detailed information, see the official dbt documentation on seeds.
In Process Mining, seeds are typically used to make it easy to configure mappings in your transformations.
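
As a sketch of how such a mapping might be used, the model below joins a seed onto another table. In dbt, a seed is referenced with ref(), in the same way as a model. The seed file seeds\Status_mapping.csv, its columns "Source_status" and "Status", and the Orders table are hypothetical.

```sql
with Orders as (
    select * from {{ ref('Orders') }}
),

Status_mapping as (
    -- Seeds are referenced with ref(), just like models.
    select * from {{ ref('Status_mapping') }}
),

Orders_with_status as (
    select
        Orders."Order_ID",
        -- Replace the raw status value by the value configured in the seed file.
        Status_mapping."Status"
    from Orders
    left join Status_mapping
        on Orders."Source_status" = Status_mapping."Source_status"
)

select * from Orders_with_status
```

A left join keeps orders whose status has no entry in the seed file, with "Status" set to null for those rows.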

After you edit seed files, the changes are not automatically loaded into the database. To instruct dbt to load the new seed file contents into the database, run either

  • dbt seed - which will only update the seed file tables, or
  • dbt build - which will also run all models and tests.

Note: If the seed file had no data records initially, the data types in the database might not have been set correctly. To fix this, run dbt seed --full-refresh. This will also update the set of columns in the database.

Activity configuration

The activity_configuration.csv file is used to set additional fields related to activities. For example, activity_order is used as a tie breaker when two events have the same timestamp. See the illustration below for an example.

[Illustration: an example activity_configuration.csv file.]
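
To illustrate what a tie breaker means in practice, the sketch below orders the events of each case by timestamp and uses the activity order to decide between events that share a timestamp. This is not necessarily how the app templates implement it. The table Event_log and the fields "Case_ID", "Event_end", "Activity_order", and "Event_order" are assumptions.

```sql
with Event_log as (
    select * from {{ ref('Event_log') }}
),

Event_log_ordered as (
    select
        Event_log."Case_ID",
        Event_log."Activity",
        Event_log."Event_end",
        -- Order by timestamp first. Events with the same timestamp
        -- are then ordered by the configured activity order.
        row_number() over (
            partition by Event_log."Case_ID"
            order by Event_log."Event_end", Event_log."Activity_order"
        ) as "Event_order"
    from Event_log
)

select * from Event_log_ordered
```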

Tests

The models\schema\ folder contains a set of .yml files that define tests. These validate the structure and contents of the expected data. For detailed information, see the official dbt documentation on tests.

When the transformations are run in Process Mining, only the tests in sources.yml are run on each data ingestion. This is done to check if the input data is properly formatted.

Note: When you edit transformations, make sure to update the tests accordingly. The tests can be removed if desired.
