Process Mining
2021.10
Last updated Apr 2, 2024

Data Loading

Introduction

Data Loading Into the Connector

Data loading refers to the time required to load new data into the Connector. When reading from a database, this time is largely determined by the number of columns.

Some types of data are faster to load than others. In broad terms, the order from fastest to slowest is the following.

  1. ODBC: performance also depends on the driver and the database.
  2. Flat files: CSV files.
  3. Excel: these files contain overhead for use in Excel, which makes them slower to read. If possible, use text files instead of Excel files; text files are much faster.

A multi-file script is quite slow, because all the different flat files must be parsed together, and should be avoided if possible. Also avoid APIs for loading massive amounts of data.
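One way to avoid a multi-file script is to concatenate the parts into a single flat file before the Connector reads them. The sketch below does this with the Python standard library; the function name and the assumption that all parts share the same header are illustrative, not part of UiPath Process Mining.

```python
import csv
from pathlib import Path

def combine_csvs(parts: list[Path], out_path: Path) -> int:
    """Concatenate several CSV files with identical headers into one file,
    so the Connector reads a single flat file instead of many parts.
    Returns the number of data rows written."""
    written = 0
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        header = None
        for part in parts:
            with part.open(newline="") as f:
                reader = csv.reader(f)
                part_header = next(reader)
                if header is None:
                    # Write the header once, from the first part.
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"header mismatch in {part}")
                for row in reader:
                    writer.writerow(row)
                    written += 1
    return written
```

Running such a step once, outside the Connector, keeps the load itself a plain single-file CSV read.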

Data Loading Into the Application

Data can be loaded in the following ways:

  • when the application is started (live data);
  • as the result of a scheduled data run (cached data);
  • as a combination of live and cached data (incremental load).

Live Data

In general, live data is much slower, especially when there is a lot of data. Live data also requires continuous access to the source system, which can be a problem during production hours.

As a general guideline, it is recommended to keep live data below 100,000 events. Actual performance depends heavily on the data and on the data sources used.

It is possible to retrieve live data based on the value of a filter: when the filter changes, the new data is requested. Performance must be considered seriously for this kind of use case.

Live tables are loaded when the user logs in and/or changes a filter control. Live tables often lead to performance problems. It is recommended to use cached tables whenever possible.

Cached Data

For cached data, the startup time of the application is independent of the number of columns. When data is pre-calculated and cached, it can be loaded directly from the cache when it is requested. Extracting data from source systems can be time-consuming, so it is recommended to schedule the cache updates, for example, outside production hours.

Besides extracting the data, the load also transforms it to the UiPath Process Mining internal format, and all calculations that do not depend on user input are cached.

For calculations that depend on user input, the initial state is cached. When the user changes a control or filter that affects the calculation, the calculation is performed again. Keeping these recalculations to a minimum is an important part of good application design.
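The split described above can be sketched in ordinary Python terms: one part depends only on the data and is computed once at cache time, the other depends on user input, with its initial (unfiltered) state cached and later states recomputed on demand. The data and names below are illustrative, not UiPath API.

```python
from functools import lru_cache

# Pre-calculated part: depends only on the data, so it is computed
# once when the cache is built (here, total duration per case).
EVENTS = [("c1", 5), ("c1", 7), ("c2", 3)]
TOTALS_PER_CASE: dict[str, int] = {}
for case, duration in EVENTS:
    TOTALS_PER_CASE[case] = TOTALS_PER_CASE.get(case, 0) + duration

# User-input-dependent part: the default (no-filter) result is cached
# up front; each new filter value triggers a recalculation, which is
# why such calculations should be kept cheap.
@lru_cache(maxsize=None)
def total_duration(min_duration: int = 0) -> int:
    return sum(t for t in TOTALS_PER_CASE.values() if t >= min_duration)
```

Calling `total_duration()` returns the cached initial state; calling it with a new filter value, such as `total_duration(10)`, forces a recalculation over the pre-computed totals.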

Incremental Load

By default, UiPath Process Mining does not load data incrementally. Because items in ERP systems are often mutated after the fact, archiving the data is usually not a desirable approach. Therefore, all data is loaded from the source system to ensure the data model contains the latest changes.

Incremental data loading can, in theory, be set up by application developers. This requires sufficient information in the database to determine which data is new and which needs to be queried. Performance needs to be considered carefully. We recommend incremental data loading only when it is absolutely necessary.
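The "sufficient information" mentioned above is typically a reliable change timestamp used as a watermark: each run fetches only rows modified since the last run. A minimal sketch, using SQLite as a stand-in for the source database; the table and column names are assumptions for illustration.

```python
import sqlite3

def incremental_extract(
    conn: sqlite3.Connection, last_seen: str
) -> tuple[list, str]:
    """Fetch only rows changed since the stored watermark.

    Assumes the source table has a trustworthy 'modified_at' column;
    without such a column, incremental loading cannot tell new data
    from old, which is exactly the prerequisite noted in the text."""
    cur = conn.execute(
        "SELECT id, modified_at FROM items "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_seen,),
    )
    rows = cur.fetchall()
    # Advance the watermark to the newest row seen, or keep it unchanged.
    new_watermark = rows[-1][1] if rows else last_seen
    return rows, new_watermark
```

The caller persists the returned watermark between scheduled runs; note that this pattern misses rows whose `modified_at` is not updated on mutation, which is one reason the full reload remains the default.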

A more suitable alternative is to run incremental loads from the source system into a data lake/warehouse using specialized tools, and then query the data lake/warehouse from UiPath Process Mining. This keeps the impact on the source system low and shares the gains of incremental loading with the entire organisation, rather than only with UiPath Process Mining.

External Scripts

In UiPath Process Mining you can load data via scripts, using, for example, Python or R. Such a script calls an external program, and its output can be read back in. UiPath Process Mining provides support for the interface between the platform and the script; it does not provide support for issues within the script itself, which may, for example, cause a long runtime of the external tool.
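As an illustration of the kind of script involved, the sketch below reads event rows as CSV from one stream, adds a derived column, and writes CSV to another. The CSV-over-streams interface and the column layout are assumptions for illustration; the actual exchange format is defined by your Connector setup, not by this sketch.

```python
import csv

def enrich(reader, writer) -> None:
    """Read event rows as CSV, append a derived column, write them back.

    'reader' and 'writer' are text streams; in a real external-script
    setup they could be wired to the process's standard input and
    output (an assumption about the exchange mechanism)."""
    rows = csv.reader(reader)
    out = csv.writer(writer)
    header = next(rows)
    out.writerow(header + ["activity_upper"])
    for row in rows:
        # Derived column: upper-cased activity name (column 2 assumed).
        out.writerow(row + [row[1].upper()])
```

Keeping the script this simple also keeps its runtime short, which matters because a long-running external tool stalls the whole data load.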

Solutions

Drivers

Always make sure that you have installed the latest version of the MSSQL ODBC drivers for Windows Server 2016.

Debug Module

Sometimes it is not possible to reduce the amount of data to be read in, for example, when the input data cannot be filtered yet. With a large input, the reaction times of your Connector may be slow. To speed up development, you can add modules to your application.

You can use the module code to ensure that the data is actually read in by only one module, while the other modules do not load data and can be used to make changes to your data model. This way, changes can be made without having to wait for the data to initialize.
