Use Sharding in Your Applications
Sharding is a technique for improving the performance of your process mining applications. In short, sharding divides the data in your event log into smaller parts called “shards”. The smaller each shard is, the faster it loads.
With sharding, end users only work with the part of the data they are interested in. When a user logs in to the application, only the applicable shard is loaded.
Types of Shards
There are two types of shards:
- Regular shards, which contain parts of your data at the detailed level.
- Benchmark shards, which contain an aggregated, high-level view of all your data.
Multiple techniques exist for creating both regular shards and benchmark shards. Regular shards can be created by splitting your data based on case attributes. Benchmark shards combine the data of all shards, typically with the level of detail reduced through pre-aggregation, filtering, or sampling.
An example attribute for sharding is Company code, where each shard contains all cases belonging to a single company code. If you have 10 company codes in your dataset, each shard contains roughly a tenth of the data and will be approximately 10 times faster than the original (assuming the cases are split equally).
See illustration below.
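The split by company code can be sketched as follows. This is a minimal illustration, not the connector's implementation: the case IDs, the company codes, and the helper `split_into_shards` are hypothetical, and the data is an in-memory list standing in for the cases input table.

```python
from collections import defaultdict

# Hypothetical case data: 1,000 cases spread evenly over 10 company codes.
# A real connector reads cases from an input table; this list only stands in for one.
cases = [(f"C{i:04d}", f"CC{i % 10:02d}") for i in range(1000)]


def split_into_shards(cases):
    """Group (case_id, company_code) pairs into one regular shard per company code."""
    shards = defaultdict(list)
    for case_id, company_code in cases:
        shards[company_code].append(case_id)
    return dict(shards)


shards = split_into_shards(cases)
for code in sorted(shards):
    share = len(shards[code]) / len(cases)
    print(f"{code}: {len(shards[code])} cases ({share:.0%} of the data)")
```

With an even distribution, every shard holds 10% of the cases. If the company codes are unevenly distributed, the shards differ in size and the speed-up per shard varies accordingly.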
Besides splitting your data into separate shards, it is useful to have an overview shard, a ‘benchmark shard’, that contains a higher-level view of all data.
You can set this up in multiple ways:
- By pre-aggregating values or attributes: this prevents detailed analysis, but still allows you to compare differences between shards.
- By lowering the level of detail by filtering out fine-grained events: this enables you to compare processes on a coarse level.
- By filtering: you can remove all event data and only keep the cases and their tags. This way, you can compare tags across multiple shards.
- By sampling: you can keep only a representative sample of the cases in your dataset as your benchmark dataset.
You can also set up multiple benchmark shards using different methods.
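The four benchmark options listed above can be sketched as plain functions over in-memory data. This is an illustration under assumptions: the `Case` record, the tag and activity names, and all function names are hypothetical, and the sampling shown is simple random sampling (a stratified sample, for example per company code, may be more representative for skewed data).

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Case:
    case_id: str
    company_code: str
    tags: list = field(default_factory=list)


# Hypothetical sample data; field, tag, and activity names are illustrative only.
cases = [
    Case("C1", "CC01", ["Late payment"]),
    Case("C2", "CC02", []),
    Case("C3", "CC01", ["Late payment", "Rework"]),
    Case("C4", "CC02", ["Rework"]),
]
events = [
    ("C1", "Create PO"), ("C1", "Change price"), ("C1", "Pay invoice"),
    ("C2", "Create PO"), ("C2", "Pay invoice"),
]


def benchmark_by_preaggregation(cases):
    """Count tags per (company code, tag); no case-level detail is kept."""
    counts = Counter()
    for case in cases:
        for tag in case.tags:
            counts[(case.company_code, tag)] += 1
    return dict(counts)


def benchmark_by_coarse_events(events, coarse_activities):
    """Drop fine-grained events, keeping only a coarse set of activities."""
    return [e for e in events if e[1] in coarse_activities]


def benchmark_by_filtering(cases):
    """Remove all event data; keep only the cases and their tags."""
    return {case.case_id: list(case.tags) for case in cases}


def benchmark_by_sampling(cases, fraction, seed=0):
    """Keep a seeded, reproducible random sample of the cases."""
    k = max(1, round(len(cases) * fraction))
    return random.Random(seed).sample(cases, k)


print(benchmark_by_preaggregation(cases))
print(benchmark_by_coarse_events(events, {"Create PO", "Pay invoice"}))
print(len(benchmark_by_sampling(cases, 0.5)))
```

Each function produces a smaller, higher-level dataset, so several of them can feed separate benchmark shards.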
Setting up Your Connector
You can use a single connector for your ETL, even when using sharding. To do this, set up application modules in the connector, one module per shard you want to create.
In your connector, add a system table with table scope set to “current user” to get the ActiveApplicationCode, which indicates the module that is currently active. You can use this attribute from the system table to create conditions for your data loading.
When applying sharding using case types, set up an expression Case_Type_Shard based on the ActiveApplicationCode attribute to determine which case type belongs to which application code. Then, in the cases_base table, set the join condition to:
Cases_preprocessing where Case_Type_Shard = Case_type
This ensures that only the cases which have a case type belonging to the current shard are passed through in your final output.
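The effect of the Case_Type_Shard expression and the cases_base join condition can be emulated in Python to reason about which cases end up in a shard. This is a sketch, not connector syntax: the module codes, case types, the `Case_ID` field, and the helper functions are hypothetical.

```python
# Hypothetical module codes and case types; in the connector this mapping lives
# in the Case_Type_Shard expression, evaluated against ActiveApplicationCode.
CASE_TYPE_BY_MODULE = {
    "P2P_STANDARD": "Standard",
    "P2P_SERVICE": "Service",
}


def case_type_shard(active_application_code):
    """Return the case type that belongs to the active module (None if unknown)."""
    return CASE_TYPE_BY_MODULE.get(active_application_code)


def cases_base(cases_preprocessing, active_application_code):
    """Emulate the join condition 'Cases_preprocessing where Case_Type_Shard = Case_type'."""
    shard_type = case_type_shard(active_application_code)
    return [c for c in cases_preprocessing if c["Case_type"] == shard_type]


cases_preprocessing = [
    {"Case_ID": "C1", "Case_type": "Standard"},
    {"Case_ID": "C2", "Case_type": "Service"},
    {"Case_ID": "C3", "Case_type": "Standard"},
]

print([c["Case_ID"] for c in cases_base(cases_preprocessing, "P2P_STANDARD")])
```

In this sketch, an unknown module code maps to no case type, so no cases pass the condition.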
You also need to make sure that only events belonging to cases in the current shard are in the output. Therefore, in the events_preprocessing table, create a lookup expression to the cases_base table that checks whether the case of each event is in the selected shard.
See illustration below.
Use this expression attribute, Case_in_shard, in the join condition of the events_base table:
Events_preprocessing where Case_in_shard
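The Case_in_shard lookup and the events_base join condition can be emulated the same way. This is a hedged sketch: the rows, the `Case_ID` and `Activity` fields, and the function names are hypothetical, and a Python set stands in for the lookup against the cases_base table.

```python
# Hypothetical rows standing in for the cases_base and events_preprocessing
# tables of the currently active module.
cases_base = [{"Case_ID": "C1"}, {"Case_ID": "C3"}]
events_preprocessing = [
    {"Case_ID": "C1", "Activity": "Create PO"},
    {"Case_ID": "C2", "Activity": "Create PO"},
    {"Case_ID": "C3", "Activity": "Pay invoice"},
]


def add_case_in_shard(events, cases):
    """Lookup expression: flag each event whose case exists in cases_base."""
    shard_case_ids = {c["Case_ID"] for c in cases}  # set for constant-time lookups
    return [{**e, "Case_in_shard": e["Case_ID"] in shard_case_ids} for e in events]


def events_base(events, cases):
    """Emulate the join condition 'Events_preprocessing where Case_in_shard'."""
    return [e for e in add_case_in_shard(events, cases) if e["Case_in_shard"]]


print([e["Case_ID"] for e in events_base(events_preprocessing, cases_base)])
```

The event of case C2 is dropped because C2 is not in the current shard.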
The benchmark shard is also set up using the ActiveApplicationCode attribute. The filtering depends on the type of benchmark shard you want to use, and is similar to what is described above for the regular shards.
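Keying the benchmark shard on ActiveApplicationCode can be sketched as a single selection function that branches on the active module. This assumes a sampling-based benchmark shard (one of the options described earlier); the module codes, the `Case_ID` and `Case_type` fields, and `select_cases` are hypothetical.

```python
import random

# Hypothetical module codes: one per regular shard plus one benchmark module.
BENCHMARK_CODE = "P2P_BENCHMARK"
CASE_TYPE_BY_MODULE = {"P2P_STANDARD": "Standard", "P2P_SERVICE": "Service"}


def select_cases(cases_preprocessing, active_application_code, sample_fraction=0.1, seed=42):
    """Pick the cases for the active module.

    Regular modules filter on case type; the benchmark module keeps a seeded
    random sample of cases across all case types.
    """
    if active_application_code == BENCHMARK_CODE:
        k = max(1, round(len(cases_preprocessing) * sample_fraction))
        return random.Random(seed).sample(cases_preprocessing, k)
    shard_type = CASE_TYPE_BY_MODULE.get(active_application_code)
    return [c for c in cases_preprocessing if c["Case_type"] == shard_type]


cases_preprocessing = [
    {"Case_ID": f"C{i}", "Case_type": "Standard" if i % 2 else "Service"}
    for i in range(100)
]

print(len(select_cases(cases_preprocessing, "P2P_STANDARD")))
print(len(select_cases(cases_preprocessing, BENCHMARK_CODE)))
```

A pre-aggregated or tag-only benchmark would replace the sampling branch with the corresponding reduction.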
Setting up Your Application
To set up your application for sharding, you also need one module per shard. These modules must have the same module codes as the ones in your connector.
Furthermore, depending on what type of benchmark shard you use, the data structure may be different for the regular shards and the benchmark shard. If this is the case, you need a separate application for the benchmark shard.
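Because the application modules must use the same module codes as the connector modules, a quick consistency check can catch typos before reloading. This is a minimal sketch with hypothetical module codes and a hypothetical helper; it assumes you have listed the codes of the connector and of each application yourself.

```python
def unmatched_module_codes(connector_modules, application_modules):
    """Return connector module codes that no application module matches."""
    return sorted(set(connector_modules) - set(application_modules))


# Hypothetical module codes. Here the benchmark module lives in a separate
# application, as needed when its data structure differs from the regular shards.
connector_modules = ["P2P_STANDARD", "P2P_SERVICE", "P2P_BENCHMARK"]
regular_app_modules = ["P2P_STANDARD", "P2P_SERVICE"]
benchmark_app_modules = ["P2P_BENCHMARK"]

missing = unmatched_module_codes(connector_modules, regular_app_modules + benchmark_app_modules)
print(missing or "All connector modules have a matching application module.")
```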
Reloading Your Data
Since you are using multiple modules, you need to reload the data using a script to make sure the data of all connector modules ends up in the same dataset. Based on the opened module, the application then knows which part of the data to consider. See Set up Automated Data Refreshes for the script for reloading your data.
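The idea behind the reload, one connector run per module with all results collected in a single dataset, can be simulated in Python. This is only a model of the data flow, not the reload script from Set up Automated Data Refreshes: `reload_all_modules`, `load_for_module`, and `fake_connector` are hypothetical, and the dataset is a dictionary keyed by module code.

```python
from typing import Callable, Dict, List


def reload_all_modules(module_codes: List[str],
                       run_connector: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run the connector once per module and store every output in one dataset."""
    dataset: Dict[str, List[str]] = {}
    for code in module_codes:
        dataset[code] = run_connector(code)
    return dataset


def load_for_module(dataset: Dict[str, List[str]], active_application_code: str) -> List[str]:
    """The application only considers the part that belongs to the opened module."""
    return dataset[active_application_code]


def fake_connector(code: str) -> List[str]:
    """Hypothetical stand-in for a connector run; returns placeholder case IDs."""
    return [f"{code}-C{i}" for i in range(3)]


dataset = reload_all_modules(["P2P_STANDARD", "P2P_SERVICE"], fake_connector)
print(load_for_module(dataset, "P2P_SERVICE"))
```

Reloading a single module on its own would leave the other modules' parts missing from the dataset, which is why all modules are reloaded together.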