Communications Mining User Guide
Last updated Nov 27, 2024

Taxonomy design best practice

Key taxonomy elements

  • Number of labels: Typical datasets have around 50-100 labels, but this varies with the objectives of the dataset (an effective use case can have far fewer than 50). Each dataset is limited to 200 labels, because beyond this the taxonomy becomes very difficult for model trainers to manage and train, and performance is reduced.

  • Label names: Label names should be concise and descriptive, as the Generative Annotation feature uses them as a training input to speed up and improve the training process. They can always be edited, but to ensure they display properly in the platform UI, each label name is limited to 64 characters (including its levels of hierarchy).
  • Label descriptions: Add natural language descriptions to your labels. The Generative Annotation feature also uses them as a training input for automatic training, and they help model trainers annotate consistently. They can also provide helpful context to others viewing a dataset for analytical purposes.
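
The limits above can be checked before a taxonomy is loaded into the platform. The following is a minimal sketch, not a platform API: the function name check_taxonomy and the dictionary layout (full label name mapped to its description) are hypothetical. It also assumes the 64-character limit is measured on the full name string, including any hierarchy separators, which this page does not specify.

```python
# Limits quoted in "Key taxonomy elements".
MAX_LABELS = 200
MAX_NAME_CHARS = 64


def check_taxonomy(labels):
    """Return a list of human-readable problems found in `labels`.

    `labels` maps a full label name to its description
    (a hypothetical layout chosen for this sketch).
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(
            f"{len(labels)} labels exceeds the limit of {MAX_LABELS}"
        )
    for name, description in labels.items():
        # Assumption: the limit applies to the whole name, hierarchy included.
        if len(name) > MAX_NAME_CHARS:
            problems.append(
                f"'{name}' is {len(name)} characters (limit {MAX_NAME_CHARS})"
            )
        # Descriptions are recommended rather than required, so this is a
        # reminder, not a hard error.
        if not description.strip():
            problems.append(f"'{name}' has no description")
    return problems
```

Calling check_taxonomy on a draft taxonomy returns an empty list when no limit is exceeded and every label has a description.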

Structuring your taxonomy

We recommend following these best practices to structure your taxonomy properly and ensure high model performance:

  • Objectives alignment: Make sure each label serves a specific business purpose and is aligned to your defined objectives (see here). If your dataset is intended for automation, many of your labels should align to the specific requests required for downstream processing. If your dataset is intended for analytics (or both), you should also have additional labels that cover concepts like issue types, root causes, and quality of service issues such as chaser messages, escalations and disputes.
  • Distinct: Each label should be specific in what it is trying to capture and should not overlap with other labels.
  • Specific: Avoid broad, vague, or confusing concepts, as they are more likely to perform badly and less likely to provide business value. Where possible, split broad labels into multiple distinct labels. It’s better to be too specific with labels initially (i.e. use more levels of hierarchy) and merge them later if needed than to have to break down very broad labels manually.
  • Identifiable: Ensure each label is clearly identifiable from the text of the messages that it’s applied to.
  • Parent label: Use a parent label if you expect to have a significant number of other similar concepts related to this broader topic.
  • Child label: Make sure that every label nested under another label is a subset of that label.
  • Hierarchy levels: In general, try not to add more than four levels of hierarchy, as the model becomes increasingly complex to train.
  • Uninformative: Create some labels for non-value-adding content, e.g. thank-you emails, so you can tell the platform what is and isn’t important to analyse.
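
The hierarchy guidance above can be partly checked mechanically. The sketch below assumes hierarchy levels are written into the label name with a ">" separator (for example "Request > Address change"). That convention is an assumption, not something stated on this page, and review_hierarchy is a hypothetical helper. It flags labels deeper than four levels and groups child labels under their parent, so a reviewer can confirm by eye that each child really is a subset of its parent.

```python
from collections import defaultdict

MAX_LEVELS = 4  # "try not to add more than four levels of hierarchy"


def split_levels(name):
    """Split a full label name into its hierarchy levels.

    Assumption: levels are separated by ">" in the label name.
    """
    return [part.strip() for part in name.split(">")]


def review_hierarchy(names):
    """Return (too_deep, children) for a list of full label names.

    too_deep: labels with more than MAX_LEVELS levels.
    children: maps each parent path to the child labels directly under it.
    """
    too_deep = []
    children = defaultdict(list)
    for name in names:
        levels = split_levels(name)
        if len(levels) > MAX_LEVELS:
            too_deep.append(name)
        if len(levels) > 1:
            parent = " > ".join(levels[:-1])
            children[parent].append(levels[-1])
    return too_deep, dict(children)
```

The grouping only supports a manual review: whether a child is a true subset of its parent, or whether two labels overlap, is a judgement about meaning that this check cannot make.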

© 2005-2024 UiPath. All rights reserved.