Process Mining
Last updated Nov 28, 2023

Datasets

Introduction

Along with a connector, several datasets must be delivered. These datasets are used to run the transformations and to validate the connector. The following datasets are required:

  • Sample dataset
  • Performance dataset

Sample Dataset

A sample dataset must be provided for use when the connector is run for the first time. It serves as an example that demonstrates the capabilities of the connector and helps verify that the connector has been set up correctly. The sample dataset is also used to validate the transformations during the validation process.

General Requirements

  • The sample data contains a separate .csv file for each table of the source system used in the connector.
  • Each file is named <tablename>.csv.
  • Each file has a header line.
  • Fields are separated by a tab delimiter.
  • Fields are either unquoted or enclosed in double quotes.
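The general requirements can be checked locally before the data is delivered. The following Python sketch is one possible way to do this and is not part of any UiPath tooling. It assumes UTF-8 encoded files; the function name check_general_requirements and the list of table names you pass in are hypothetical. A wrong delimiter usually shows up as rows whose field count differs from the header.

```python
import csv
from pathlib import Path


def check_general_requirements(data_dir, table_names):
    """Check sample-data files against the general requirements.

    Returns a list of human-readable problems; an empty list means every
    check passed. `table_names` is the list of source-system tables used by
    the connector (hypothetical input, supplied by you).
    """
    problems = []
    for table in table_names:
        path = Path(data_dir) / f"{table}.csv"  # file must be named <tablename>.csv
        if not path.is_file():
            problems.append(f"{path.name}: file is missing")
            continue
        # Assumption: files are UTF-8 encoded.
        with path.open(newline="", encoding="utf-8") as f:
            # Tab delimiter; quoted fields are expected to use double quotes.
            reader = csv.reader(f, delimiter="\t", quotechar='"')
            header = next(reader, None)
            if not header or all(not name.strip() for name in header):
                problems.append(f"{path.name}: header line is missing or empty")
                continue
            for row in reader:
                # A field-count mismatch usually means a wrong delimiter or bad quoting.
                if len(row) != len(header):
                    problems.append(
                        f"{path.name}:{reader.line_num}: expected "
                        f"{len(header)} fields, got {len(row)}"
                    )
    return problems
```

For example, check_general_requirements("sample_data", ["Cases", "Events"]) returns an empty list when both files exist and are consistently tab-delimited.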

Quality Requirements

  • Field names match the field names used when the data is exported from the source system.
  • All available fields contain data.
  • All entities and events occur in the data.
  • All tags and due dates occur in the data.

    Note: This only applies to TemplateOne connectors.
  • The data should not contain any ‘Cases’ without ‘Events’ and vice versa.
  • The data should not contain any ‘Tags’ without ‘Cases’.

    Note: This only applies to TemplateOne connectors.
  • The data should not contain any ‘Due dates’ without ‘Events’.

    Note: This only applies to TemplateOne connectors.
  • The contents of the dataset represent a realistic process.
  • If real data is used, the data is anonymized and you consent to UiPath using this data.
  • The datasets must pass the dbt tests defined for the transformations as described in Tests.
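The requirements about cases, events, tags, and due dates are referential checks. The following Python sketch finds orphan records in rows that are already loaded (for example with csv.DictReader). It is a quick local check, not a replacement for the dbt tests. The column names Case_ID and Event_ID, and the function name check_quality_links, are hypothetical; map them to the fields of your source tables.

```python
def check_quality_links(cases, events, tags=(), due_dates=()):
    """Find orphan records across cases, events, tags and due dates.

    Each argument is a list of dicts, one dict per row. Column names
    Case_ID and Event_ID are hypothetical placeholders.
    Tags and due dates only apply to TemplateOne connectors.
    """
    case_ids = {row["Case_ID"] for row in cases}
    event_case_ids = {row["Case_ID"] for row in events}
    event_ids = {row["Event_ID"] for row in events}
    return {
        # Cases that have no events at all.
        "cases_without_events": sorted(case_ids - event_case_ids),
        # Case IDs referenced by events but missing from the cases data.
        "events_without_cases": sorted(event_case_ids - case_ids),
        # Tags that point to a case that does not exist.
        "tags_without_cases": sorted({r["Case_ID"] for r in tags} - case_ids),
        # Due dates that point to an event that does not exist.
        "due_dates_without_events": sorted(
            {r["Event_ID"] for r in due_dates} - event_ids
        ),
    }
```

Every list in the result should be empty for a sample dataset that meets the quality requirements.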

Size Requirements

  • The data size is less than 1 MB.

    Note: There is no minimum data size. For example, if five cases are enough to meet the quality requirements, that is sufficient.
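The size limit can be checked by adding up the sizes of the .csv files in the sample data folder, as in the following Python sketch. The function names are hypothetical, and reading "1 MB" as 1,000,000 bytes is an assumption; pass 1024 * 1024 instead if the stricter binary reading is intended.

```python
from pathlib import Path

# Assumption: 1 MB is read as 10**6 bytes.
MAX_BYTES = 1_000_000


def sample_data_size(data_dir):
    """Total size in bytes of all .csv files directly inside data_dir."""
    return sum(p.stat().st_size for p in Path(data_dir).glob("*.csv") if p.is_file())


def within_size_limit(data_dir, max_bytes=MAX_BYTES):
    """True if the sample data is strictly smaller than max_bytes."""
    return sample_data_size(data_dir) < max_bytes
```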

Performance Datasets

In Process Mining, transformations are often applied to large volumes of data. To validate the performance of the transformations in the connector, two larger datasets are needed, in which the largest table has the following number of records:

  • A dataset that contains 50 million records.
  • A dataset that contains 500 million records.

The transformation that is expected to result in the largest table depends on the type of connector.

  • For a connector for a Discovery Accelerator, the number of records in the dataset is determined by the number of events in the Events_base table.
  • For a connector for TemplateOne, the number of records in the dataset is determined by the number of events in the Event_log_base table.
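The following Python sketch shows one way to check which size a performance dataset reaches. It assumes the rows of the table that defines the dataset size are available as a tab-delimited file with a header line (for example, exported after running the transformations). The names SIZE_TABLE, TARGETS, count_records, and reached_targets are hypothetical, and treating "contains 50 million records" as "at least 50 million records" is an assumption.

```python
import csv

# Table whose record count defines the dataset size, per connector type.
SIZE_TABLE = {
    "Discovery Accelerator": "Events_base",
    "TemplateOne": "Event_log_base",
}

# Required number of records in the largest table.
TARGETS = (50_000_000, 500_000_000)


def count_records(path):
    """Count data rows in a tab-delimited file with a header line.

    Rows are streamed one at a time, so the file never has to fit in memory.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header line
        return sum(1 for _ in reader)


def reached_targets(record_count):
    """Return the targets this count reaches (assumption: 'contains N' means at least N)."""
    return [t for t in TARGETS if record_count >= t]
```

For a TemplateOne connector, for example, reached_targets(count_records(f"{SIZE_TABLE['TemplateOne']}.csv")) lists the sizes the exported Event_log_base data reaches.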

Performance datasets do not have to meet the same quality requirements as the sample data, although they should resemble real-life data as closely as possible. For example, the dataset should contain multiple cases, each with several events. Additionally, the following requirements must be met:

  • The dataset meets the same General Requirements as stated for the sample dataset, apart from the name.
  • The datasets must pass the dbt tests defined for the transformations as described in Tests.

    Note: Performance datasets are not part of the repository that contains the connector files. UiPath provides the storage account details. The data must be stored as a single .zip file named according to the convention <repository_name>_<dataset_size>.zip.
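If no suitable large dataset exists, synthetic data with a realistic shape (multiple cases, each with a few ordered events) can be generated and then packaged as a single .zip file. The following Python sketch illustrates both steps under stated assumptions: the column names Case_ID, Activity, and Event_end, the activity names, and the function names are all hypothetical, and a dataset_size label such as "50M" is only an example value for the naming convention. Pure Python is slow at 500 million rows, so treat this as an outline rather than a production generator.

```python
import csv
import random
import zipfile
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical activity names; replace with activities from your source system.
ACTIVITIES = ["Create order", "Approve order", "Ship goods", "Send invoice", "Receive payment"]


def write_synthetic_events(path, n_events, seed=0):
    """Write n_events tab-delimited events spread over multiple cases.

    Each case gets 2 to 5 events in a fixed order with increasing timestamps
    (the last case may get fewer). Returns the number of cases written.
    """
    rng = random.Random(seed)
    start = datetime(2023, 1, 1)
    written = 0
    case_id = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["Case_ID", "Activity", "Event_end"])  # header line
        while written < n_events:
            case_id += 1
            ts = start + timedelta(minutes=rng.randrange(525_600))  # within one year
            n = min(rng.randint(2, len(ACTIVITIES)), n_events - written)
            for activity in ACTIVITIES[:n]:
                ts += timedelta(hours=rng.randint(1, 48))
                writer.writerow([case_id, activity, ts.isoformat(sep=" ")])
                written += 1
    return case_id


def package(data_dir, repository_name, dataset_size, out_dir="."):
    """Zip all .csv files in data_dir as <repository_name>_<dataset_size>.zip."""
    zip_path = Path(out_dir) / f"{repository_name}_{dataset_size}.zip"
    # allowZip64 is needed for archives or members larger than 4 GiB.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for csv_path in sorted(Path(data_dir).glob("*.csv")):
            zf.write(csv_path, arcname=csv_path.name)
    return zip_path
```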
© 2005-2023 UiPath. All rights reserved.