Process Mining - Tips for writing SQL

process-mining

latest

false

Process Mining user guide

Tips for writing SQL

Best practices and tips for writing SQL transformations in Process Mining, covering union field alignment, dbt usage, and common patterns.

Introduction

This page contains tips and tricks for writing SQL transformations.

General

In unions, the names, and order of fields must exactly match. It may be necessary to create empty fields on parts of the union to get all the fields aligned, with the select statement null as "Field_X".
When writing dbt for SQL Server, all Jinja statements are compiled into SQL code, regardless if they are inside SQL comments or not. For example, in the following -- {{ ref('Table1') }}, the Jinja will be compiled into SQL code.
Do not round numbers, this can lead to inconsistencies. The platform will do this automatically whenever a display value requires this.

Transformation Structure

In SQL, not all transformations can be computed in the same table. The reason for this may be aggregates or properties that cannot be expressed in a single select statement. You can create supporting tables for this purpose. Creating a supporting table in the database allows you to use the model in multiple transformations. If it is not needed to reuse the model, the supporting transformation can also be added as a preprocessing query in the existing table.

To make a distinction between supporting transformations and the other transformations, you can group them in a separate sub directory.

Performance

To make your query execution faster:

Avoid select distinct where it is also possible to build an aggregate and just take one record using a where clause.
Use union all instead of union. Using union all, records from tables are concatenated, while union removes duplicates.
If you are working on a large dataset, you can limit the data you are working with during development.
All models (except for the models in the 1_input directory) are materialized as a table by dbt. This is the default setting for all Process Mining Connectors. For more information, refer to the dbt documentation on Materializations.
When generating an id field, use the id() macro available in pm-utils. Examples can be found in the devkit-connector.

SQL style guide

The following style guide was used to develop the Process Mining app templates. It is recommended to follow these guidelines for consistency and readability.

Write SQL commands and functions in lower case.
Use the same level of indentation for select, from, where, join, etc., to understand the structure of the model more easily.
- Use indentation for the usage of functions if this improves readability. For example, use indentation for each statement in a case function.
Use consistent naming conventions for tables and fields to prevent SQL errors indicating that tables or fields do not exist in the database. Adhere to the following guidelines:
- Tables and fields start with a capital.
- Use an underscore _ between separate words in tables and fields.
- All fields have quotes.
- Tables do not have quotes. This improves readability in combination with fields having quotes.
All fields are prefixed with the table they originate from, to make the model easier to understand and extend.
Commas used for the separation of fields are placed at the end of the line.
Use inline comments when certain constructions need to be explained. When this is done, make sure the comment adds value.
For readability and maintainability, use explicit select statements in the transformation part of the query and do not use select *. Especially for unions, using the select * can yield errors.

On this page

Introduction
General
Transformation Structure
Performance
SQL style guide

Was this page helpful?

PREVIOUSStructure of transformations

NEXTExporting and importing transformations

Introduction​

General​

Transformation Structure​

Performance​