Process Mining
Editing transformations
Data transformations are used to transform input data into data suitable for Process Mining. The transformations in Process Mining are written as dbt projects.
This page gives an introduction to dbt. For more detailed information, see the official dbt documentation.
pm_utils. This pm-utils package contains utility functions and macros for Process Mining dbt projects. For more information about pm_utils, see ProcessMining-pm-utils.
pm-utils package by adding new functions.
When a new version of the pm-utils package is released, you are advised to update the version used in your transformations, so that you can use the latest functions and macros of the pm-utils package.
pm-utils package in the Releases panel of the ProcessMining-pm-utils.
pm-utils version in your transformations.
- Download the source code (zip) from the release of pm-utils.
- Extract the zip file and rename the folder to pm_utils.
- Export transformations from the inline Data transformations editor and extract the files.
- Replace the pm_utils folder from the exported transformations with the new pm_utils folder.
- Zip the contents of the transformations again and import them in the Data transformations editor.
The transformations of a process app consist of a dbt project. Below is a description of the contents of a dbt project folder.
| Folder/file | Contains |
|---|---|
| | the pm_utils package and its macros. |
| | logs created when running dbt. |
| | custom macros. |
| | .sql files that define the transformations. |
| | .yml files that define tests on the data. |
| | .csv files with configuration settings. |
| | the settings of the dbt project. |
See the illustration below.
The data transformations are defined in .sql files in the models\ directory. The data transformations are organized in a standard set of subdirectories: 1_input, 2_entities, 3_events, 4_event_logs, 5_business_logic.
The .sql files are written in Jinja SQL, which allows you to insert Jinja statements inside plain SQL queries. When dbt runs all .sql files, each .sql file results in a new view or table in the database.
The .sql files have the following structure:
- With statements: One or more with statements to include the required sub tables.
  - {{ ref('My_table') }} refers to a table defined by another .sql file.
  - {{ source(var("schema_sources"), 'My_table') }} refers to an input table.
- Main query: The query that defines the new table.
- Final query: Typically a query like Select * from table is used at the end. This makes it easy to make sub-selections while debugging.
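The three-part structure described above could look like the following sketch. The model name Invoices, the referenced model Invoices_input, and the column names are hypothetical and only serve as an illustration; they are not part of any app template.

```sql
-- Hypothetical model file, for example models\2_entities\Invoices.sql.

-- With statements: include the required sub tables.
with Invoices_input as (
    select * from {{ ref('Invoices_input') }}
),

-- Main query: defines the new table.
Invoices as (
    select
        Invoices_input.Invoice_ID,
        Invoices_input.Amount
    from Invoices_input
)

-- Final query: swap "Invoices" for "Invoices_input" here to inspect
-- an intermediate result while debugging.
select * from Invoices
```

When dbt runs, this file would result in a view or table named after the file, which other models could then include with {{ ref('Invoices') }}.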
For more tips on how to write transformations effectively, see Tips for writing SQL.
models\schema\sources.yml. This way, other models can refer to it by using {{ source(var("schema_sources"), 'My_table_raw') }}. See the illustration below for an example.
sources.yml.
The suffix _raw is added to the names of source tables when loading data. For example, a table called my_table should be referred to as my_table_raw.
For more detailed information, see the official dbt documentation on Sources.
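As a minimal sketch, an input model that reads a source table could look as follows. It assumes that a table called my_table was loaded (and therefore exists as my_table_raw) and that my_table_raw is listed in sources.yml; the model name My_table_input is hypothetical.

```sql
-- Hypothetical model file, for example models\1_input\My_table_input.sql.
-- Assumes my_table_raw is listed in models\schema\sources.yml.
with My_table_input as (
    -- Note the _raw suffix that is added when the data is loaded.
    select * from {{ source(var("schema_sources"), 'my_table_raw') }}
)

select * from My_table_input
```

Models further down the pipeline would then refer to this model with {{ ref('My_table_input') }} rather than reading the source table directly.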
The data transformations must output the data model that is required by the corresponding app; each expected table and field must be present.
The models in models\5_business_logic should not be deleted. Also, the output fields in the corresponding queries should not be removed.
If you want to add new fields to your process app, you can use the custom fields that are available for the process app. Map the fields in the transformations to the custom fields to have them available in the output. Make sure the custom fields are named in the output as described in the data model of the process app.
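The sketch below illustrates the idea of mapping an extra field to a custom field. All names are assumptions made for the example: Cases_base, Sales_region, and in particular custom_case_text_1 are placeholders, so check the data model of your process app for the actual custom field names.

```sql
-- Hypothetical model in models\5_business_logic, shown only as a sketch.
with Cases_base as (
    select * from {{ ref('Cases_base') }}
),

Cases as (
    select
        Cases_base.Case_ID,
        -- Map an extra attribute to a custom field of the process app.
        -- "custom_case_text_1" is a placeholder; use the field name
        -- given in the data model of your process app.
        Cases_base.Sales_region as custom_case_text_1
    from Cases_base
)

select * from Cases
```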
You can use the dbt docs commands to generate a documentation site for your dbt project and open it in your default browser. The documentation site also contains a Lineage Graph that provides an entity relationship diagram with a graphical representation of the linkage between each data table in your project.
dbt docs.
Macros make it easy to reuse common SQL constructions. For detailed information, see the official dbt documentation on Jinja macros.
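As an illustration, a custom macro could be defined as in the sketch below. The macro name days_between and its parameters are hypothetical, and the datediff(day, ...) syntax is an assumption about the SQL dialect of your database, so adapt it where needed.

```sql
{# Hypothetical file in the macros folder, for example days_between.sql. #}
{# Wraps a date difference so that the same expression is not repeated in every model. #}
{% macro days_between(start_field, end_field) %}
    datediff(day, {{ start_field }}, {{ end_field }})
{% endmacro %}
```

A model could then call the macro inside a query, for example {{ days_between('Created_at', 'Paid_at') }} as Days_to_payment, where Created_at and Paid_at are placeholder column names.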
The pm-utils package contains a set of macros that are typically used in Process Mining transformations. For more information about the pm_utils macros, see ProcessMining-pm-utils.
pm_utils.optional() macro.
Seeds are csv files that are used to add data tables to your transformations. For detailed information, see the official dbt documentation on jinja seeds.
In Process Mining, this is typically used to make it easy to configure mappings in your transformations.
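A mapping of this kind could be applied as in the following sketch. It assumes a hypothetical seed file Activity_mapping.csv with the columns Activity and Activity_display_name, and a hypothetical model Events_base; in dbt, a seed is referenced with ref() in the same way as a model.

```sql
-- Sketch only: Events_base and the Activity_mapping seed are hypothetical.
with Events_base as (
    select * from {{ ref('Events_base') }}
),

-- The seed file Activity_mapping.csv is available as a table via ref().
Activity_mapping as (
    select * from {{ ref('Activity_mapping') }}
),

Events as (
    select
        Events_base.Case_ID,
        -- Use the configured display name; fall back to the original
        -- activity name when the seed file has no mapping for it.
        coalesce(Activity_mapping.Activity_display_name, Events_base.Activity) as Activity
    from Events_base
    left join Activity_mapping
        on Events_base.Activity = Activity_mapping.Activity
)

select * from Events
```

With this setup, changing a mapping would only require editing the csv file, not the SQL.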
After editing seed files, these files are not immediately updated in the database. To instruct dbt to load the new seed file contents into the database, run either
- dbt seed, which will only update the seed file tables, or
- dbt build, which will also run all models and tests.
Note: If the seed file had no data records initially, the data types in the database might not have been set correctly. To fix this, run dbt seed --full-refresh. This will also update the set of columns in the database.
The models\schema\ folder contains a set of .yml files that define tests. These validate the structure and contents of the expected data. For detailed information, see the official dbt documentation on tests.
The tests in sources.yml are run on each data ingestion. This is done to check if the input data is properly formatted.