Process Mining

2021.10

Last updated Apr 2, 2024

Data Volume

Introduction

The amount of data is always in a direct trade-off with performance. Process mining inherently depends on detail, such as the individual timestamp of every event, to construct the process graphs.

However, all these unique timestamps impact performance. In general, there are theoretical limits that all process mining tools and all in-memory tools approach.

Types of Users

We make a clear distinction between the performance of the data used for an Application and for the Connector. Although they use the same platform, there are some differences, for example in what is acceptable for the users (developers versus end users) and in the type of actions performed.

Large amounts of data can impact both the Connector and the Application, but all of these issues can be addressed in the Connector.

Data Volume

The performance end users experience is directly related to the data volume. The data volume is determined by the number of rows in the biggest tables. In general, only the number of rows determines the performance end users experience. The number of columns is only a factor when the data is loaded from the database.

Processes with about 5,000,000 (5M) cases and up to about 50,000,000 (50M) events per process would be ideal. With more cases and events, parsing the data and showing the visualization will take longer.

The UiPath Process Mining platform will continue to work with larger volumes, but the response speed may drop when large amounts of data are loaded. It is recommended to check the data volume beforehand. If it exceeds the above numbers, consider optimizing or limiting the dataset.
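The check against the guideline figures can be sketched in a few lines. The example below is illustrative Python, not part of the UiPath Process Mining platform. The function name and the way the counts are obtained are assumptions. Only the two threshold values come from the text above.

```python
# Guideline figures from the text above; everything else is hypothetical.
MAX_CASES = 5_000_000
MAX_EVENTS = 50_000_000


def check_data_volume(num_cases, num_events):
    """Return a list of warnings for counts that exceed the guideline volumes."""
    warnings = []
    if num_cases > MAX_CASES:
        warnings.append(
            f"{num_cases:,} cases exceeds the guideline of {MAX_CASES:,}"
        )
    if num_events > MAX_EVENTS:
        warnings.append(
            f"{num_events:,} events exceeds the guideline of {MAX_EVENTS:,}"
        )
    return warnings


# Example: row counts obtained beforehand, e.g. from the source database.
for message in check_data_volume(num_cases=7_200_000, num_events=41_000_000):
    print(message)
```

An empty result means the process is within the guideline range. A non-empty result is a signal to look at the optimization and data minimization options.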

Level of Detail

A higher level of detail increases the response time, which impacts the performance.

The exact tradeoff between the amount of data, the level of detail, and the waiting time needs to be discussed with the end users. Sometimes historical data can be very important, but often only the last few years are needed.

Another factor is the number of unique values in your columns. UiPath Process Mining uses a proprietary method to reduce the size of the *.mvn files to a minimum. This works well for values that are similar. An attribute with a lot of unique values, e.g. event detail, will also impact performance.
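To find attributes with many unique values before loading the data, you could count the distinct values per column. The sketch below is illustrative Python that runs on rows held in memory. The attribute names are hypothetical, and it says nothing about how the *.mvn format itself stores values.

```python
def unique_value_counts(rows):
    """Count the distinct values of every attribute in a list of row dicts."""
    seen = {}
    for row in rows:
        for attribute, value in row.items():
            seen.setdefault(attribute, set()).add(value)
    return {attribute: len(values) for attribute, values in seen.items()}


# Hypothetical event rows: "event_detail" is unique per row, "activity" is not.
rows = [
    {"activity": "Create order", "event_detail": "Order 1001 created by jdoe"},
    {"activity": "Create order", "event_detail": "Order 1002 created by asmith"},
    {"activity": "Ship order", "event_detail": "Order 1001 shipped via carrier X"},
]
counts = unique_value_counts(rows)
# Attributes whose count is close to the number of rows are candidates for removal.
print(sorted(counts.items(), key=lambda item: item[1], reverse=True))
```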

Solutions

There are two main solution directions for dealing with large data volumes:

optimization;
data minimization.

Optimization involves the adjustments Superadmins can make to make the dashboards render faster, which can be achieved by tailoring the application settings to the specific dataset (see Application Design for more information).

This section describes data minimization: the different techniques you can employ to reduce the data visible to the end user, tailored to the specific business question.

The techniques described here can exist alongside each other or can even be combined to leverage the benefits of multiple techniques. In addition, you may keep an application without data minimization alongside minimized applications because the level of detail might sometimes be required for specific analyses where slower performance is acceptable.

Data Scoping

Limiting the number of records that show up in your dataset will not only improve the performance of the application, it will also make the process easier to comprehend and, in turn, improve acceptance by the business.

The scoping of the data can be done in the Connector.

One of the options for scoping is to limit the time frame to look at by filtering out dates or periods. For example, you could limit the time frame from 10 years to one year, or from one year to one month. See the illustration below.

A limited number of activities is advised, especially at the start of any process mining effort. From there you can build up as the expertise grows.
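Time-frame scoping can be sketched as follows. This is illustrative Python, not Connector logic, and the field names are hypothetical. Selecting whole cases by the timestamp of their last event is an assumption made here so that cases are not cut in half. Your business question may call for a different rule, such as the case start date.

```python
from datetime import datetime


def scope_cases_to_period(events, start, end):
    """Keep all events of the cases whose last event falls within [start, end)."""
    case_end = {}
    for event in events:
        case_id, timestamp = event["case_id"], event["timestamp"]
        if case_id not in case_end or timestamp > case_end[case_id]:
            case_end[case_id] = timestamp
    kept_cases = {c for c, t in case_end.items() if start <= t < end}
    return [event for event in events if event["case_id"] in kept_cases]


events = [
    {"case_id": "A", "activity": "Create order", "timestamp": datetime(2015, 3, 1)},
    {"case_id": "B", "activity": "Create order", "timestamp": datetime(2023, 6, 12)},
    {"case_id": "B", "activity": "Ship order", "timestamp": datetime(2023, 6, 20)},
    {"case_id": "C", "activity": "Create order", "timestamp": datetime(2022, 12, 28)},
    {"case_id": "C", "activity": "Ship order", "timestamp": datetime(2023, 1, 5)},
]
# Limit the dataset to cases that ended in 2023.
scoped = scope_cases_to_period(events, datetime(2023, 1, 1), datetime(2024, 1, 1))
```

Case C is kept in full even though its first event lies before the period, because the selection is made per case rather than per event.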

Below is a guideline for the range of activities:

Range (nr. of activities)	Description
5-20	Preferred range when starting with process mining. A simple process that gives insight.
20-50	Expert range. Expanding with clear variants.
50-100	Most useful if there are clear variants, i.e. processes that are somewhat related but primarily stand on their own.
100+	It is advised to split up into subprocesses.

Note: Filtering out activities will simplify your process and make it more comprehensible. Be aware that you also may lose information or details.

Below are some suggestions for filtering data:

Unrelated activities: activities that do not directly impact the process could be filtered out.
Secondary activities: some activities, e.g. a change activity, can happen anywhere in the process. These significantly increase the number of variants.
Minimally occurring events: events that occur only a few times in your dataset could be filtered out.
Smaller process: only analyze a subprocess.
Grouping activities: some activities in your dataset may be more like small tasks, which together represent an activity that makes more sense to the business. Grouping them will require some logic in the Connector and may result in overlapping activities.

If possible, within the performance limits of the Connector, use the Connector to filter out activities. In this way, any changes can be reverted easily, or activities can be added back. Avoid filtering out activities in the data extraction or data loading.
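As an illustration of the "minimally occurring events" suggestion, the sketch below drops activities that occur fewer than a chosen number of times. It is plain Python with hypothetical field names and a hypothetical threshold. In practice, this logic would live in the Connector.

```python
from collections import Counter


def drop_rare_activities(events, min_occurrences):
    """Remove events whose activity occurs fewer than min_occurrences times."""
    counts = Counter(event["activity"] for event in events)
    return [e for e in events if counts[e["activity"]] >= min_occurrences]


events = [
    {"case_id": "A", "activity": "Create order"},
    {"case_id": "A", "activity": "Ship order"},
    {"case_id": "B", "activity": "Create order"},
    {"case_id": "B", "activity": "Manual price override"},
    {"case_id": "B", "activity": "Ship order"},
]
filtered = drop_rare_activities(events, min_occurrences=2)
```

Because the filter is a single function applied to the full event list, lowering the threshold brings the removed activities back without touching the extraction.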

Remove Outliers

If there is one case with a lot of events (an outlier), it will impact some expressions that calculate aggregates on the event level. The from/to dashboard item filter is impacted by this and can be time-consuming to calculate if you have these outliers. It is recommended to filter out these cases in the Connector to take them out of the dataset.

Note: This does impact metrics. You should only remove outliers in agreement with the business user.
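A minimal sketch of removing cases with an unusually high number of events is shown below. It is illustrative Python. The field names and the fixed threshold are assumptions, and the appropriate cut-off should be agreed with the business user.

```python
from collections import Counter


def remove_outlier_cases(events, max_events_per_case):
    """Drop every event that belongs to a case with more events than the limit."""
    events_per_case = Counter(event["case_id"] for event in events)
    return [
        event for event in events
        if events_per_case[event["case_id"]] <= max_events_per_case
    ]


# Hypothetical data: case "X" has 500 events, cases "A" and "B" have 3 each.
events = (
    [{"case_id": "X", "activity": "Change order"} for _ in range(500)]
    + [{"case_id": "A", "activity": "Step"} for _ in range(3)]
    + [{"case_id": "B", "activity": "Step"} for _ in range(3)]
)
cleaned = remove_outlier_cases(events, max_events_per_case=100)
```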

Focus on Outliers

In other instances, the outliers may be the key area to focus on. If your process is going well or you adopt Six Sigma methodologies, you want to focus on the things going wrong. Instead of showing all the cases going right, you only show the cases going wrong.

See the illustration below.

Reducing the Size of the Dataset

In the Connector, you can remove attributes that have a lot of detail, for example long strings in the Event Detail attribute.

When you have finished developing, a lot of unused attributes may end up in your dataset. It is recommended to set the availability to public only for the attributes that are used in the output dataset of the Connector. Set the availability of other attributes to private.

Pre-aggregation

Pre-aggregation is a technique employed by many BI tools to gain insights into large data volumes. It involves aggregating data over specific attributes to reduce the number of records in a dataset. In BI this would typically mean summing the value per supplier, so that there is only one record for each supplier.

See the illustration below.

Process mining requires more configuration, but a starting point is to only aggregate on process variants. For each variant you would have one case record and a related number of events. This can significantly reduce the data volumes.

To show correct results you would also have to show how many records each variant represents. For the event ends you could use the median duration of each event. Aggregating only on variants might be too coarse, so it would be wise to check the most commonly used filters, e.g. a combination of variants, case type, and month of the case end (to show trends over time).

However, adding attributes has a quadratic effect on the number of records, so this requires a careful balance between performance and use case.

Pre-aggregation is most applicable for an overview of your process and spotting general trends.
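Aggregating on process variants can be sketched as follows. This is illustrative Python under simplifying assumptions: each case is given as an ordered list of (activity, hours since case start) pairs, a variant is the sequence of activities, and the median offset per position stands in for the individual event times. The data structure and names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_variant(cases):
    """Collapse cases into one record per variant with a case count and median offsets."""
    offsets_per_variant = defaultdict(list)
    for trace in cases.values():
        variant = tuple(activity for activity, _ in trace)
        offsets_per_variant[variant].append([offset for _, offset in trace])
    return [
        {
            "variant": variant,
            # How many original cases this single record represents.
            "case_count": len(offsets),
            # Median offset per position in the variant.
            "median_offsets": [median(column) for column in zip(*offsets)],
        }
        for variant, offsets in offsets_per_variant.items()
    ]


cases = {
    "c1": [("Create order", 0), ("Ship order", 4)],
    "c2": [("Create order", 0), ("Ship order", 8)],
    "c3": [("Create order", 0), ("Cancel order", 2)],
}
aggregated = aggregate_by_variant(cases)
```

Three cases are reduced to two variant records here. Adding further grouping attributes, such as case type or month of the case end, would extend the grouping key and increase the number of records again.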

Sampling

Sampling is a technique where you take a percentage of the cases and their events happening in a specific period. You can, for instance, configure that only 10% of all cases and their events are shown. In this way you still have exceptions or outliers, since each case has a similar chance of showing up in the dataset.

See the illustration below.
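One way to sample whole cases is to decide per case ID whether the case is in the sample, so that all events of a selected case stay together. The sketch below is illustrative Python that uses a hash of the case ID for a repeatable selection. This hashing approach and the field names are assumptions, not the platform's own sampling mechanism.

```python
import hashlib


def in_sample(case_id, percentage):
    """Decide repeatably whether a case belongs to a sample of the given percentage."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percentage * 100


def sample_events(events, percentage):
    """Keep all events of the sampled cases and drop the rest."""
    return [e for e in events if in_sample(e["case_id"], percentage)]


# Hypothetical dataset: 10,000 cases with two events each.
events = [
    {"case_id": f"case-{i}", "activity": activity}
    for i in range(10_000)
    for activity in ("Create order", "Ship order")
]
sampled = sample_events(events, percentage=10)
```

Because the decision depends only on the case ID, repeated data refreshes select the same cases, and exceptional cases have the same chance of being selected as any other case.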

Cascaded Sampling

Cascaded sampling is a technique where the sampling percentage drops over time by a certain percentage. For example, you could show 100% of last week's data, 90% of the data from two weeks ago, 80% of the data from three weeks ago, and so on.
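The sampling percentage for a given age of the data can be expressed as a small function. The sketch below is illustrative Python that follows the example above with a step of 10 percentage points per week. The function name, the linear decrease, and the optional lower bound are assumptions.

```python
def cascaded_rate(weeks_ago, step=10.0, floor=0.0):
    """Return the sampling percentage for data that is weeks_ago weeks old.

    weeks_ago=1 means last week, which is kept in full.
    """
    if weeks_ago < 1:
        raise ValueError("weeks_ago must be 1 or greater")
    return max(floor, 100.0 - step * (weeks_ago - 1))


for weeks_ago in (1, 2, 3):
    print(weeks_ago, cascaded_rate(weeks_ago))
```

A non-zero `floor` keeps a small share of older data in the dataset instead of dropping it entirely, which can be useful for long-term trends.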

Data Sharding

Data sharding is a data scoping technique that allows organizations to split up the data into multiple datasets, rather than just slicing off one part. This setup does require additional configuration, since the application needs to be split up by using modules and multiple smaller datasets need to be exported from the Connector.

With data sharding, the original dataset is divided into multiple shards. The smaller each shard is, the faster it will be. When a user logs in to the application, only the applicable data shard will be loaded.

A typical unit for sharding would be “Company code” or “Department”. For example, in the case of 50 company codes, each shard will contain one company code and, assuming the data is spread evenly, be roughly 50 times faster than the original dataset.

See the illustration below for an overview of sharding.
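Splitting a dataset into shards by a sharding unit can be sketched as follows. This is illustrative Python with hypothetical field names. It only shows the partitioning step, not the module setup or the export from the Connector.

```python
from collections import defaultdict


def shard_by(rows, key):
    """Split rows into one list per distinct value of the sharding attribute."""
    shards = defaultdict(list)
    for row in rows:
        shards[row[key]].append(row)
    return dict(shards)


cases = [
    {"case_id": "A", "company_code": "1000"},
    {"case_id": "B", "company_code": "2000"},
    {"case_id": "C", "company_code": "1000"},
]
shards = shard_by(cases, key="company_code")
# A user who belongs to company code 1000 would only load this shard.
shard_for_user = shards["1000"]
```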