Use sharding in your applications
Sharding improves the performance of your process mining applications by dividing the data in your event log into smaller parts called “shards”. The smaller each shard is, the faster the application that loads it responds.
With sharding, end users only work with the part of the data that is applicable to them. When a user logs in to the application, only the applicable data shard is loaded.
Shards can be divided into two different types:
- Regular shards, which contain parts of your data at the detailed level.
- Benchmark shards, which contain an aggregated, high-level view of all your data.
Multiple techniques exist for creating regular shards as well as benchmark shards. Regular shards are created by splitting your data based on case attributes. Benchmark shards combine the data of all shards; typically, the level of detail is reduced using pre-aggregation, filtering, or sampling.
An example attribute for sharding is Company code, where each shard contains all cases belonging to a single company code. If your dataset contains 10 company codes, each shard holds roughly one tenth of the data, so it will be approximately 10 times faster than the original (assuming the cases are evenly distributed).
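As a minimal sketch of this split, expressed in SQL for illustration; the table name Cases_base and the column Company_code are assumptions, and the classic connector uses its own expression syntax rather than plain SQL:

```sql
-- Illustrative only: one regular shard per company code.
-- Each shard keeps full event detail for its own subset of cases.
SELECT *
FROM Cases_base
WHERE Company_code = '1000';   -- shard for company code 1000

SELECT *
FROM Cases_base
WHERE Company_code = '2000';   -- shard for company code 2000, and so on
```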
Besides splitting your data into separate shards, it is useful to have an overview shard containing a higher-level view of all data, a ‘benchmark shard’.
You can set this up in multiple ways:
- By pre-aggregating values or attributes: this prevents detailed analysis, but still allows you to compare differences across shards.
- By lowering the level of detail: filter out fine-grained events so that processes can be compared at a coarse level.
- By filtering: remove all event data and keep only the tags and their respective cases. This way you can compare tags across multiple shards.
- By sampling: keep only a representative sample of the cases in your dataset as your benchmark dataset.
You can also set up multiple benchmark shards using different methods.
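As a sketch of two of these options in SQL, for illustration only; the table and column names Events_base, Cases_base, Case_ID, Case_type, and Event_end are assumptions, and the hash and date functions vary per database:

```sql
-- Pre-aggregated benchmark shard: one row per case type per month
-- instead of individual events.
SELECT
    Case_type,
    DATE_TRUNC('month', Event_end)  AS Event_month,
    COUNT(DISTINCT Case_ID)         AS Number_of_cases,
    COUNT(*)                        AS Number_of_events
FROM Events_base
GROUP BY Case_type, DATE_TRUNC('month', Event_end);

-- Sampled benchmark shard: keep roughly 10% of the cases.
-- The hash function used for a stable sample differs per database.
SELECT *
FROM Cases_base
WHERE MOD(ABS(HASH(Case_ID)), 10) = 0;
```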
You can use a single connector for your ETL, even when using sharding. You do this by setting up application modules, using one module per shard you want to create.
In your connector, add a system table with table scope set to “current user” to get the ActiveApplicationCode, which indicates the module that is currently active. You can use this attribute from the system table to create conditions for your data loading.
Example
When applying sharding using case types, set up an expression Case_Type_Shard based on the ActiveApplicationCode attribute that determines which case type belongs to which application code. Then, in the cases_base table, set the join condition to:
Cases_preprocessing where Case_Type_Shard = Case_type
This ensures that only the cases whose case type belongs to the current shard are passed through to your final output.
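A sketch of what the Case_Type_Shard mapping could look like, written as a SQL CASE expression for illustration; the module codes ShardA and ShardB, the case type values, and the System_current_user table name are all hypothetical, and the connector's own expression syntax may differ:

```sql
SELECT
    c.*,
    CASE s.ActiveApplicationCode             -- from the "current user" system table
        WHEN 'ShardA' THEN 'Purchase order'  -- hypothetical module-to-case-type mapping
        WHEN 'ShardB' THEN 'Framework order'
    END AS Case_Type_Shard                   -- NULL for unknown modules: no cases pass
FROM Cases_preprocessing AS c
CROSS JOIN System_current_user AS s;         -- hypothetical name for the system table

-- cases_base then keeps only rows where Case_Type_Shard = Case_type.
```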
In the events_preprocessing table, create a lookup expression to the cases_base table which checks whether cases are in the selected shard.
Use this expression attribute in the join condition of the events_base table with the expression:
Events_preprocessing where Case_in_shard
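An illustrative SQL equivalent of this lookup-and-join, assuming a Case_ID key column shared by both tables:

```sql
-- Keep only events whose case survived the shard filter in cases_base;
-- the EXISTS check plays the role of the Case_in_shard lookup expression.
SELECT e.*
FROM Events_preprocessing AS e
WHERE EXISTS (
    SELECT 1
    FROM Cases_base AS c
    WHERE c.Case_ID = e.Case_ID
);
```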
To set up your application for sharding, you also need one module per shard. These modules must have the same module codes as the ones in your connector.
Furthermore, depending on the type of benchmark shard you use, the data structure may differ between the regular shards and the benchmark shard. If this is the case, you need a separate application for the benchmark shard.
Since you are using multiple modules, you need to reload the data using a script to make sure the data of all connector modules ends up in the same dataset. This way, the application knows, based on the opened module, which part of the data to consider. See Set up Automated Data Refreshes for the script for reloading your data.