Process Mining
2021.10
Last updated Apr 2, 2024

Cleaning input data

When data is loaded into the Basic Connector, the dataset may contain incorrect or irrelevant cases and events. The Basic Connector contains two filters that can be used to remove them: the Cases filter and the Events filter.

See the illustration below.



Cases filter

The Cases filter applies to all cases in the Cases_input table and is often used to remove duplicate cases or to leave out certain case types. In the example below, cases with a negative amount are filtered out. The result panel shows that 15 cases will be filtered out based on this definition.
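The idea behind the Cases filter can be sketched in plain Python. This is only an illustration, not the connector's expression language: the sample rows, the field names case_id and amount, and the function cases_filter are assumptions made for the example. The filter yields true for cases to keep and false for cases to filter out.

```python
# Hypothetical sample rows standing in for the Cases_input table.
cases_input = [
    {"case_id": "C1", "amount": 120.0},
    {"case_id": "C2", "amount": -35.5},
    {"case_id": "C3", "amount": 0.0},
    {"case_id": "C4", "amount": -1.0},
]


def cases_filter(case):
    """Return True to keep the case, False to filter it out."""
    return case["amount"] >= 0


# One boolean per case, comparable to the values of the filter expression.
filter_values = [cases_filter(case) for case in cases_input]

# Cases that pass the filter, and the number of cases that do not.
kept_cases = [case for case, keep in zip(cases_input, filter_values) if keep]
filtered_out_count = filter_values.count(False)
```

In this sketch, the number of false values equals the number of cases that are filtered out, which mirrors the count shown in the result panel.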


Events filter

The Events filter applies to all events in the Events_input table and is often used to leave out certain activities or to filter out events before a specific date. The Events filter always references the Cases filter, so that events are also removed when their case has been filtered out by the Cases filter. In the example below, events happening before 01/01/2016 are removed. The result panel shows that this results in 72 191 events being removed.
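The two conditions combined in the Events filter can be sketched in plain Python as follows. The sample events, the field names, the set kept_case_ids, and the function events_filter are hypothetical and only serve the illustration. The sketch also assumes that an event dated exactly 01/01/2016 is kept.

```python
from datetime import date

# Events before this date are removed (assumption: the cutoff day itself is kept).
CUTOFF = date(2016, 1, 1)

# Hypothetical set of case IDs that passed the Cases filter.
kept_case_ids = {"C1", "C3"}

# Hypothetical sample rows standing in for the Events_input table.
events_input = [
    {"case_id": "C1", "activity": "Create", "event_end": date(2015, 12, 30)},
    {"case_id": "C1", "activity": "Approve", "event_end": date(2016, 1, 4)},
    {"case_id": "C2", "activity": "Create", "event_end": date(2016, 2, 1)},
    {"case_id": "C3", "activity": "Create", "event_end": date(2016, 1, 1)},
]


def events_filter(event):
    """Keep an event only if its case was kept and it is not before the cutoff."""
    case_is_kept = event["case_id"] in kept_case_ids
    on_or_after_cutoff = event["event_end"] >= CUTOFF
    return case_is_kept and on_or_after_cutoff


kept_events = [event for event in events_input if events_filter(event)]
removed_count = len(events_input) - len(kept_events)
```

Here the first event is removed because of its date, and the third because its case did not pass the Cases filter.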


Applying the filter

By default, the Cases filter and the Events filter are applied in the joins of the Cases_preprocessing and Events_preprocessing tables. Because of that, it is sufficient to alter only the filters themselves. This ensures that the preprocessing tables only contain data that matches the filter definitions.
Double-click on the Cases_preprocessing or Events_preprocessing table to inspect how the filter is applied.

Cases_preprocessing

The join of the Cases_preprocessing table applies the Cases filter in its where-condition. As a result, the table holds all data contained in the Cases_input table except for the records filtered out by the Cases filter. The example below shows that 15 records are excluded, which corresponds to the 15 false values in the Cases filter itself.


Events_preprocessing

The join of the Events_preprocessing table applies the Events filter in its where-condition. As a result, the table holds all data contained in the Events_input table except for the records filtered out by the Events filter. Because the Events filter always references the Cases filter, events whose case has been filtered out by the Cases filter are excluded as well. In the example below, events happening before 01/01/2016 are removed. The result panel shows that this results in 72 191 events being removed.


Replacing attributes

Besides attributes in your dataset that do not exist in the Basic Connector, there may also be fields defined in AppOne that do not directly correspond to one of the fields in your input data file. In this case, create an expression for this field in the Basic Connector.

In some cases you might not want to remove the entire record, but simply correct the values of the incorrect attribute.

To correct such an attribute in UiPath Process Mining, it is necessary to first make an expression that calculates the correct values and then replace the incorrect attribute with the new expression.

Correcting the attribute value

To correct the attribute, create a new expression which calculates the correct values. Create this expression in the same table the incorrect attribute originates in.

For example, the Case ID attribute is available in the Cases_preprocessing and Cases_base tables, but it originates in Cases_input. Therefore, the new expression that corrects it should also be created in Cases_input.
Note: It is recommended to give the new expression the same name as the original attribute.
See the illustration below for an example of how to remove the prefix CORE_ from the Case ID in the Cases_input table.
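The logic of such a correcting expression can be sketched in plain Python. This is not the connector's expression syntax, and the function name corrected_case_id and the sample values are hypothetical. The sketch only strips CORE_ when it appears at the start of the value and leaves all other values unchanged.

```python
PREFIX = "CORE_"


def corrected_case_id(raw_case_id):
    """Return the Case ID without a leading CORE_ prefix."""
    if raw_case_id.startswith(PREFIX):
        return raw_case_id[len(PREFIX):]
    return raw_case_id


# Hypothetical raw values from the input data.
raw_ids = ["CORE_10023", "10024", "X_CORE_1"]
corrected_ids = [corrected_case_id(value) for value in raw_ids]
```

Checking for the prefix first avoids accidentally truncating values that do not carry it.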


Replacing the attribute

The attributes of the tables in the Basic Connector are used in different expressions throughout the connector. Therefore, the incorrect attribute cannot simply be deleted; it must be replaced by the new expression. The steps below explain how to replace an attribute.

Note: It is important to take these steps in the table where the incorrect attribute and the new expression originate from.

Step 1: Set the availability of the new expression

To replace an attribute, the availability of both attributes needs to be the same. The two Case ID attributes in the figure below have different availabilities.

Right-click on the second Case ID expression and select Availability - Public from the context menu to change the availability to Public.



Step 2: Swap UIDs

To replace the incorrect attribute with the new expression in all the places where it is used in the Connector, the UIDs of both attributes need to be swapped. By swapping the UIDs, the software replaces all references to the original attribute with references to the new expression and vice versa. To swap UIDs, select both attributes, right-click, and select Advanced - Swap UIDs from the context menu. See the illustration below.



Note:
  • The UID is an internal software ID and not the ID shown in the expression editor. After swapping the UIDs the name and ID of the attribute or expression will not have changed.
  • If the swapping of the UIDs is not done in the table where the original attribute and the new expression originate from, a warning is displayed and the swap is not executed in the original table. You can undo the changes by using CTRL + Z and replace the attribute in the correct table.
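Why swapping UIDs redirects existing references can be sketched with a small conceptual model in Python. This is a simplified illustration and not how the software is implemented: the Attribute class, the UID values, and the helper functions are all hypothetical. The sketch does not model the reference from the new expression back to the original attribute.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    uid: int         # internal ID that references point to (hypothetical values)
    name: str        # name shown in the editor
    definition: str  # short description of what the attribute holds


def swap_uids(first, second):
    """Exchange the internal UIDs; names and definitions stay as they are."""
    first.uid, second.uid = second.uid, first.uid


def resolve(uid, attributes):
    """Return the attribute that a reference with this UID points to."""
    return next(attr for attr in attributes if attr.uid == uid)


original = Attribute(uid=101, name="Case ID", definition="raw input value")
corrected = Attribute(uid=202, name="Case ID", definition="value without CORE_ prefix")
attributes = [original, corrected]

# A downstream expression stores the UID of the attribute it uses.
downstream_reference = 101

before = resolve(downstream_reference, attributes).definition
swap_uids(original, corrected)
after = resolve(downstream_reference, attributes).definition
```

In this model the downstream reference itself never changes. It resolves to the corrected expression after the swap because that expression now carries the UID the reference points to, while the names of both attributes remain the same.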

Step 3: Check references

To check whether the swap was successful, check the references of each of the attributes. All references that used to point to the original attribute should now point to the new expression (see example below). The incorrect attribute should only be referenced by the new expression itself. To check the references, select an attribute, right-click, and select Advanced - Show references from the context menu.



Ghosts
A ghost is an attribute that became unavailable even though it is still being used in the Connector. A warning is displayed when a ghost is created. A ghost is indicated by the icon. Never delete a ghost that still has references pointing to it. Undo the changes by using CTRL+Z until the ghost is replaced by the actual attribute. Evaluate which steps went wrong during the replacement of the attribute and repeat if necessary.

Step 4: Set the availability of original attribute

If the swap was successful and the references point to the correct attributes, it is recommended to set the availability of the original attribute to Private. This way, it cannot be used in other tables such as the Preprocessing and Base tables. See the illustration below for the two Case ID attributes after the swap, with the original attribute set to Private.

