Communications Mining
Communications Mining User Guide
Last updated Apr 18, 2024

Uploading a CSV file into a source

User permissions required: 'Sources admin' AND 'Edit messages'.

Note: This article demonstrates how to upload data from a CSV file into an existing data source. To understand how to first create a data source via the GUI, see here.
Key steps
Note: When updating existing messages in a source, changing any message properties other than user properties (e.g. message text, sent_at timestamp, and 'to' or 'from') will cause entity annotations in associated datasets to be lost. We highly recommend pinning the latest model version in associated datasets before doing so.

To upload data from a CSV file into a data source, navigate to the Sources page (in the admin console, which you can open via the cog in the top right of the page) and locate the source you'd like to upload data into.

Click the upload icon in the top right-hand corner of the data source card (as shown below).

Data source card

Then click 'Select file' and choose the CSV file you wish to upload.

The selected file must meet the following criteria:

  • The file should contain headers on the first line and be delimited by commas or tabs
  • A minimum of three columns is required: the message text, a timestamp, and a unique ID that identifies the message
  • All text fields in your CSV file should be surrounded by double quotes
  • The file must be encoded as UTF-8, UTF-16 or UTF-32 (the platform detects which one automatically)
  • The CSV file must be 64 MiB or less. If you have a larger file, you can still upload it by splitting it into multiple files, each 64 MiB or less
CSV upload page - Step 1
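If you generate or export CSV files with a script, you can check some of the criteria above before uploading. The Python sketch below is not part of the platform. `check_csv`, `split_csv` and `guess_encoding` are hypothetical helper names. The encoding check is a simple byte-order-mark guess; this is an assumption, and the platform's own detection may behave differently, for example with UTF-16 files that have no BOM. The comma/tab choice is a heuristic based on the header line.

```python
import csv
import io

MAX_BYTES = 64 * 1024 * 1024  # 64 MiB upload limit from the criteria above

# Byte-order marks used to guess the encoding. This is an assumption: the
# platform's own detection logic is not documented here.
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32"),
    (b"\x00\x00\xfe\xff", "utf-32"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xff\xfe", "utf-16"),
    (b"\xfe\xff", "utf-16"),
]


def guess_encoding(raw: bytes) -> str:
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return "utf-8"  # no BOM: assume UTF-8


def check_csv(raw: bytes) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, over the 64 MiB limit")
    encoding = guess_encoding(raw)
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return problems + [f"not valid {encoding}"]
    first_line = text.splitlines()[0] if text else ""
    # Heuristic: treat the file as tab-delimited if the header has more tabs than commas
    delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    if len(header) < 3:
        problems.append(f"only {len(header)} column(s); at least 3 are needed")
    return problems


def _render(rows: list[list[str]], delimiter: str) -> str:
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, as the criteria ask for text fields
    csv.writer(buf, delimiter=delimiter, quoting=csv.QUOTE_ALL).writerows(rows)
    return buf.getvalue()


def split_csv(text: str, max_bytes: int = MAX_BYTES, delimiter: str = ",") -> list[str]:
    """Split CSV text (header row required) into parts of at most max_bytes
    as UTF-8, repeating the header in each part and never splitting a record."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    header_size = len(_render([header], delimiter).encode("utf-8"))
    parts, current, size = [], [], header_size
    for row in body:
        row_size = len(_render([row], delimiter).encode("utf-8"))
        if current and size + row_size > max_bytes:
            parts.append(_render([header] + current, delimiter))
            current, size = [], header_size
        current.append(row)
        size += row_size
    if current:
        parts.append(_render([header] + current, delimiter))
    return parts
```

Each string returned by `split_csv` can be written to a separate UTF-8 file and uploaded in turn. A single record that is larger than the limit on its own still produces an oversized part, so the sketch does not handle that case.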

If your file meets the above criteria, you can then configure and upload the messages in the next step:

CSV upload page - Step 2

Select the required columns from each of the dropdown lists containing the column headers detected within the CSV file:

  • ID column:
    • This must be a column containing a unique ID that can identify the message
    • The message IDs can only contain ASCII alphanumeric characters (A-Z a-z 0-9) and punctuation (except /)
    • Note: If there are existing messages in the source with the same ID, they will be updated to match the contents of the new file
  • Message column:
    • This is simply the column that contains the message text that you want to analyse in the platform
  • Timestamp column:
    • This is the column containing the date and time the message was recorded
    • The timestamp format is flexible and will be inferred automatically by the platform
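You can check the ID rules above locally with a short script, together with the 1024-byte limit given in the troubleshooting section. This is a sketch, not the platform's validator. `id_problems` is a hypothetical helper, and it assumes that Python's `string.punctuation` (the 32 ASCII punctuation characters) matches what the platform counts as punctuation. It also flags IDs that are repeated within a single file. The guide only describes what happens when an ID matches an existing message, so a duplicate within one file is treated here as something to review, not as a known error.

```python
import string
from collections import Counter

# ASCII letters, digits and ASCII punctuation, minus "/". Assumption: Python's
# string.punctuation matches the platform's notion of punctuation.
ALLOWED_ID_CHARS = set(string.ascii_letters + string.digits + string.punctuation) - {"/"}
MAX_ID_BYTES = 1024  # from the "Id length" error message


def id_problems(ids: list[str]) -> dict[str, str]:
    """Map each problematic message ID to a short reason."""
    problems = {}
    for msg_id, count in Counter(ids).items():
        bad = set(msg_id) - ALLOWED_ID_CHARS
        if bad:
            problems[msg_id] = f"invalid characters: {''.join(sorted(bad))!r}"
        elif len(msg_id.encode("utf-8")) > MAX_ID_BYTES:
            problems[msg_id] = f"longer than {MAX_ID_BYTES} bytes"
        elif count > 1:
            # Not a documented error; repeated IDs within one file are flagged for review
            problems[msg_id] = f"appears {count} times in this file"
    return problems
```

For example, `id_problems(["msg-001", "msg/002", "msg-001"])` reports the "/" in the second ID and the repeated first ID. Underscores, hyphens and other ASCII punctuation pass.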

If you have data containing subject lines, threads, or participants (typically seen in cases or email threads), you can also upload these additional columns within your CSV file:

  • Subject Column
    • Choose which column contains the message Subject
  • Sender Column
    • Choose which column contains the Sender
  • To Column
    • Choose which column contains the Recipient(s). Multiple recipients should be semicolon separated.
  • Cc Column
    • Choose which column contains the Cc'd Recipient(s). Multiple recipients should be semicolon separated.
  • Thread ID Column
    • Choose the column that contains the message Thread ID
    • A thread ID is what links together messages that belong to the same thread

Sender/To/CC format:

  • The following conditions in the sender/to/cc fields will trigger errors:
    • Exceeds maximum number of recipients (max 2048 recipients per thread)
    • Sender or recipient exceeds maximum character limit (max 512 characters per recipient)
    • Two or more semicolons are found in a row (e.g. the following is incorrectly formatted: john@email.com ;; beth@email.com)
  • Although the platform strips out any whitespace before or after a recipient, it does not do any additional data cleansing.
    • Example formats you may want your data in (not an exhaustive list):
      • Example 1 - Robert Bog <rob.bog@gmail.com>; John Smith <john.smith@gmail.com>
      • Example 2 - rob.bog@gmail.com ;john.smith@gmail.com
      • Example 3 - rob.bog@gmail.com ; john.smith@gmail.com
  • The platform splits recipients on the semicolon (;)
  • Before uploading your data, please ensure the email addresses are in an appropriate format
  • Please note that in a typical threaded use case (e.g. emails), each 'sender' cell should contain only one sender
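The Sender/To/Cc rules above can be mirrored in a small helper so that you can clean cells before export. This Python sketch is an assumption-laden approximation, not the platform's parser. `parse_recipients` and `format_recipients` are hypothetical names. Only literal consecutive semicolons (`;;`) are rejected, and empty pieces such as a trailing semicolon are silently dropped, because the guide doesn't say how the platform handles them. The 2048 limit is stated per thread, so the per-cell check here only catches a single cell that exceeds it on its own.

```python
MAX_RECIPIENT_CHARS = 512  # per sender/recipient, from the list above
MAX_RECIPIENTS = 2048      # stated per thread; only checked per cell here


def parse_recipients(cell: str) -> list[str]:
    """Split a Sender/To/Cc cell on semicolons and strip surrounding whitespace,
    raising ValueError for the error conditions listed above."""
    # Only literal ";;" is rejected; whether "; ;" is also an error is not stated
    if ";;" in cell:
        raise ValueError(f"two or more semicolons in a row: {cell!r}")
    # Empty pieces (e.g. from a trailing semicolon) are dropped: an assumption
    recipients = [part.strip() for part in cell.split(";") if part.strip()]
    if len(recipients) > MAX_RECIPIENTS:
        raise ValueError(f"{len(recipients)} recipients, more than {MAX_RECIPIENTS}")
    for recipient in recipients:
        if len(recipient) > MAX_RECIPIENT_CHARS:
            raise ValueError(f"recipient longer than {MAX_RECIPIENT_CHARS} characters")
    return recipients


def format_recipients(recipients: list[str]) -> str:
    """Build a cell value in the 'a; b' style of the examples above."""
    cleaned = [r.strip() for r in recipients if r.strip()]
    if any(";" in r for r in cleaned):
        raise ValueError("a recipient must not itself contain a semicolon")
    return "; ".join(cleaned)
```

All three example formats above parse to the same list of recipients, because whitespace around each semicolon is stripped.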

Timestamp format:

  • If your timestamps are ambiguous about the order of day, month and year (e.g. 01/02/03 10:10), you can suggest the correct interpretation:
    • 2nd of January 2003 - None
    • 1st of February 2003 - Day first
    • 3rd of February 2001 - Year first
    • 2nd of March 2001 - Day first + Year first
  • To avoid ambiguity, we recommend supplying timestamps in RFC 3339 format where possible (e.g. 2020-01-31T12:34:56Z for UTC, or 2020-08-31T11:20:00-08:00 with a timezone offset)
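If you control the export, you can convert ambiguous timestamps to RFC 3339 yourself and avoid relying on the day-first/year-first options. The Python sketch below shows the four interpretations listed above as explicit `strptime` patterns and converts the result. `to_rfc3339` and `INTERPRETATIONS` are hypothetical names. The source times are assumed to be UTC unless you pass another offset, and that choice is yours to make, not something the platform infers.

```python
from datetime import datetime, timedelta, timezone

# strptime patterns for each interpretation of "01/02/03 10:10" listed above
INTERPRETATIONS = {
    "None": "%m/%d/%y %H:%M",
    "Day first": "%d/%m/%y %H:%M",
    "Year first": "%y/%m/%d %H:%M",
    "Day first + Year first": "%y/%d/%m %H:%M",
}


def to_rfc3339(raw: str, fmt: str, tz: timezone = timezone.utc) -> str:
    """Parse raw with an explicit pattern and return an RFC 3339 string.
    The source times are assumed to be in tz (UTC unless stated otherwise)."""
    parsed = datetime.strptime(raw, fmt).replace(tzinfo=tz)
    return parsed.isoformat().replace("+00:00", "Z")


for label, fmt in INTERPRETATIONS.items():
    print(f"{label:<24}{to_rfc3339('01/02/03 10:10', fmt)}")

# A local time eight hours behind UTC keeps its offset:
print(to_rfc3339("2020-08-31 11:20", "%Y-%m-%d %H:%M", timezone(timedelta(hours=-8))))
```

This prints 2003-01-02T10:10:00Z, 2003-02-01T10:10:00Z, 2001-02-03T10:10:00Z and 2001-03-02T10:10:00Z for the four interpretations, which matches the list above. The last line prints 2020-08-31T11:20:00-08:00.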

Then select the additional user properties you want to upload with the messages. User properties are contextual metadata associated with each message that can be used to filter messages in the platform. They may also be used by the platform's machine learning models. There are two types, string and number:

  • String user properties are categorical metadata (typical examples include IDs, countries, counterparties, etc.)
  • Number user properties are numeric metadata (typical examples include NPS, email statistics, amounts, etc.)
Note:

If your file contains an NPS score as a user property, it must be included as a number property named exactly 'NPS' for the platform's native NPS charts to load.

Once you've selected all of the user properties, click 'Upload'.

You'll then be prompted to inspect the uploaded messages in a dataset that contains the source you uploaded data into. If the source is not associated with any datasets yet, you can create a new one to check that the upload is as expected.

Note:

If you made a mistake when selecting the user properties, you can re-upload the same file. The platform uses the ID column as the identifier to overwrite the existing messages and their properties (this will not affect any labels applied to existing messages).

Troubleshooting

Hopefully your upload will run smoothly, but you may encounter an issue during the upload process and see an error message. We've outlined some of these errors below, along with why they occur, to help you resolve or avoid them.

In the error messages below, {something} maps to contextual information about where the error occurred. Additionally, the way we refer to a position in the file is standardised as:

{position} expands to: record {row-number} on line {line-number} column {column-number} (byte {byte-number})
The title of the error message is displayed along with a description.


Here are some possible error messages users may encounter when uploading CSV files:

  • Not Enough Columns
    • Error message: The CSV file only contains {number-columns} column(s), but at least 3 are needed (text, timestamp and id)
    • Description: The uploaded CSV doesn't contain at least 3 columns, or the platform has misdetected the encoding of the file.
  • Invalid Encoding
    • Error message: The file contains invalid characters (encoding detected as {detected-encoding})
    • Description: The file is not correctly encoded as UTF-8, UTF-16 or UTF-32 (the platform automatically detects the encoding of the file).
  • Invalid Header
    • Error message: 'string:ti:er' does not match '(^delimiter|id|message|timestamp |timestamp_default_utc_offset |timestamp_day_first|timestamp_year_first\\Z)|(^(?P<property_type>number|string):(?P<name>\\w(?:[\\w]{0,30}\\w)?)\\Z)'
    • Description: If a column header is not a valid user property name, the platform returns its default message for an invalid request schema. Check that each column header is in a valid format for its purpose. The maximum length for a column header is 32 alphanumeric characters.
  • Unequal Row Lengths
    • Error message: The CSV contains unequal row lengths. Message {position} has {number} fields, but the previous record has {number} fields.
    • Description: The CSV contains rows with different numbers of cells, or rows whose number of cells doesn't match the number of headers.
  • Id Format
    • Error message: Invalid message id for {record}. Ids can only consist of ASCII alphanumeric characters and punctuation (except '/'). Cell value: {cell-value}
    • Description: This error occurs when an ID field contains invalid characters, as described in the error message.
  • Id Length
    • Error message: Id is too long for message {record}. It has {number} bytes, expected at most 1024
    • Description: This error occurs when an ID field is longer than the maximum allowed length (1024 bytes).
  • Timestamp Format
    • Error message: Incorrectly formatted timestamp in message {position}: {timestamp-error-message}. Cell value: {cell-value}
    • Description: This error occurs when a timestamp field could not be parsed.
  • Message Length
    • Error message: Message is too long for message {position}. It has {number} bytes, expected at most 65536
    • Description: This error occurs when a message field is longer than the maximum allowed length (65536 bytes). The limit is measured in bytes, so text containing non-ASCII characters reaches it with fewer characters.
  • Number Property Format
    • Error message: Incorrectly formatted number in message {position}: {number-error-message}. Cell value: {cell-value}
    • Description: This error occurs when a number user property field could not be parsed. The platform should accept any format that can reasonably be decoded as a number.
  • Property Length
    • Error message: Property is too long for message {position}. It has {number} bytes, expected at most 4096
    • Description: This error occurs when a user property field is longer than the maximum allowed length (4096 bytes).
  • Unknown Error
    • Error message: Unknown CSV error: {underlying-error-message}
    • Description: The list above is not exhaustive. If an unknown error occurs, retry the upload.
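To catch several of the row-level errors above before uploading, you can run a rough local check. The Python sketch below uses a hypothetical helper, `find_row_errors`, which is not a platform function. It compares each row with the header instead of the previous record and measures the limits in UTF-8 bytes, which is an assumption about how the platform counts. It also uses Python's `float()`, which is likely stricter than the platform's number parsing, and skips empty number cells. Records are numbered from 1 after the header, so positions may not match the platform's {position} exactly. The `line_num` attribute of `csv.reader` shows why a record number and a line number can differ when a quoted field spans several lines.

```python
import csv
import io

# Limits quoted in the error messages above. They are byte counts; this sketch
# measures UTF-8 bytes, which is an assumption about how the platform counts.
MAX_MESSAGE_BYTES = 65536
MAX_PROPERTY_BYTES = 4096


def find_row_errors(
    text: str,
    message_col: str,
    property_cols: tuple[str, ...] = (),
    number_cols: tuple[str, ...] = (),
    delimiter: str = ",",
) -> list[tuple[int, int, str]]:
    """Return (record, line, problem) tuples for some of the errors listed above.
    Records are counted from 1 after the header; line is the physical line
    on which the record ends (quoted fields may span several lines)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    col = {name: i for i, name in enumerate(header)}
    errors = []
    for record, row in enumerate(reader, start=1):
        line = reader.line_num
        if len(row) != len(header):
            errors.append((record, line, f"has {len(row)} fields, header has {len(header)}"))
            continue  # the remaining checks need a complete row
        size = len(row[col[message_col]].encode("utf-8"))
        if size > MAX_MESSAGE_BYTES:
            errors.append((record, line, f"message is {size} bytes (max {MAX_MESSAGE_BYTES})"))
        for name in property_cols:
            size = len(row[col[name]].encode("utf-8"))
            if size > MAX_PROPERTY_BYTES:
                errors.append((record, line, f"{name} is {size} bytes (max {MAX_PROPERTY_BYTES})"))
        for name in number_cols:
            value = row[col[name]].strip()
            if not value:
                continue  # empty cells skipped: an assumption, not a documented rule
            try:
                float(value)
            except ValueError:
                errors.append((record, line, f"{name}: {value!r} is not a number"))
    return errors
```

For example, suppose the second record has a message spanning two lines and an NPS value of "ten". That record is reported as record 2 on line 4, because its quoted message field occupies lines 3 and 4.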

© 2005-2024 UiPath. All rights reserved.