Using general fields
A guide to setting up and training General Fields in the platform.
It is important to define the key data points (i.e. fields) that you want to extract from your Communications Mining data. These typically facilitate downstream automation, but can also be useful for analytics - particularly in assessing the potential success rate and benefit of automation opportunities.
- General fields are fields that you may want to extract, that can be found across multiple different topics/labels in a dataset.
- Extraction fields are fields that are created on, and conditioned by, a specific label. In other words, each extraction field is tied to a specific label that you want to automate.
See the official documentation to find out more about Generative extraction and general versus extraction fields. If Generative Extraction is not available in your region, continue to use general fields as normal. The rest of this section provides guidance on how to use general fields.
Ultimately, general field predictions, combined with labels, can facilitate automation by providing the structured data points needed to complete a specific task or process. It’s much more time-efficient to train general fields in your dataset in conjunction with labels, rather than focusing on one and then the other (i.e., training general fields after training a full taxonomy of labels).
What are general fields?
General Fields are additional elements of structured data which can be extracted from within the messages in your dataset. General Fields include data points such as monetary quantities, dates, currency codes, email addresses, URLs, as well as many other industry specific categories (see below for an example).
The platform is able to predict most general fields (except those trained from scratch) as soon as they are enabled, as it can identify them based on their typical, or in some instances very specific, format and a training set of similar general fields.
Like labels, users are able to accept or reject general fields that are correctly or incorrectly predicted, enhancing the model’s ability to identify them in future.
Types of general fields
There are currently two main types of general fields:
- Pre-trained general fields that are typically based on a set of standard or custom-defined rules - e.g. Monetary Quantity, URL, and Date
- General fields trained from scratch by a user (like they would train labels) that are machine learning based
Trainable versus non-trainable general fields
All general fields are either 'trainable' by nature (general fields trained from scratch), or can be made trainable when they're enabled (all other general field kinds).
Trainable general fields are those that will update live in the platform based on training provided by users. For more detail on training general fields, see here.
If you enable training on a pre-trained general field that is typically based on a set of standard or custom-defined rules, you can refine the platform's understanding of that general field within the parameters of those rules. Essentially, further training on these will reduce the scope of what the platform can consider that general field, but not increase it.
This is because many of these general fields, like dates (e.g. 'tomorrow') and monetary quantities (e.g. £20), need to be normalised into a structured data format for downstream systems. Similarly, general fields like ISINs or CUSIPs must have a set format, so the platform should not be taught to predict anything that does not conform to their defined formats.
When any trainable general fields are assigned, the platform looks at both the text of the general field, as well as the context of the general field within the rest of the communication, i.e. what is happening before and after the general field value (in the same paragraph, and the one above and below). It learns to better predict the general field based on the values themselves, as well as how the value appears within the context of the communication.
If a pre-trained general field is not set as trainable, you can still accept or reject the general field predictions you see in your dataset. These are updated and refined offline using this in-platform feedback provided by users. It is helpful for you to accept or reject these general fields when reviewing messages. To learn more on how to enable general fields on a dataset, check the Enabling, disabling, updating, and creating general fields page.
When configuring general field types, you can select from one of the following pre-built options, via the template option when selecting the data type for the field type.
General Field Type | Description |
---|---|
Email | An email address. |
Currency | A currency code, e.g. GBP, CHF, or USD. |
URL | A uniform resource locator (i.e. web address). |
SEDOL | A financial security identifier, short for Stock Exchange Daily Official List, which is 7 characters in length. |
BIC Code | A Business Identifier Code (BIC) is an international standard under ISO 9362 for routing business transactions and identifying business parties. The BIC code is 8 or 11 characters in length. |
LEI | A Legal Entity Identifier (LEI) is a unique global identifier of legal entities participating in financial transactions. LEI is formatted as a 20-character alpha-numeric code. |
ISIN | An International Securities Identification Number (ISIN) uniquely identifies a financial security. ISIN is a 12-character alpha-numeric code. |
Mark-to-market (MTM or M2M) | Mark-to-market refers to the fair value of an asset or liability. Mark-to-market is based on the current market price, the price of similar assets and liabilities, or on another objectively assessed fair value. |
CUSIP | A CUSIP is a 9-digit number or a 9-character alpha-numeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement of trades. |
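As a rough illustration of the fixed formats listed in the table above, the following Python sketch checks only the length and an assumed alphanumeric character set for each identifier. The lengths come from the table; the character sets for SEDOL and BIC, the `FORMATS` mapping and the `has_expected_shape` helper are assumptions made for this example. It does not verify check digits and is not how the platform itself recognises these fields.

```python
import re

# Lengths are taken from the table above. The alphanumeric character set
# for SEDOL and BIC is an assumption, and no check digits are verified.
FORMATS = {
    "SEDOL": re.compile(r"[A-Z0-9]{7}"),
    "BIC": re.compile(r"[A-Z0-9]{8}(?:[A-Z0-9]{3})?"),  # 8 or 11 characters
    "LEI": re.compile(r"[A-Z0-9]{20}"),
    "ISIN": re.compile(r"[A-Z0-9]{12}"),
    "CUSIP": re.compile(r"[A-Z0-9]{9}"),
}


def has_expected_shape(field_type: str, value: str) -> bool:
    """Return True if value has the length and characters listed for field_type."""
    pattern = FORMATS[field_type]
    return pattern.fullmatch(value.strip().upper()) is not None
```

A shape check like this could act as a last guard in a downstream automation, where a value that does not conform would otherwise cause a failure or exception.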
User permissions required: 'View Sources' AND 'Modify Datasets' OR 'Datasets Admin'.
Enabling general fields on a new dataset
To enable general fields on a new dataset that you want to create, you simply need to select them during the setup process.
Click the + button in the box shown below and you will be presented with a drop-down menu of all of the general fields that you are able to enable for that dataset. Simply click all of the general fields you want to enable before creating the dataset. If you add any in error, you can click the ‘X’ icon next to the general field name to remove it.
To understand more about how to create a new dataset, see here.
Enabling, updating, and disabling general fields on an existing dataset
If you want to enable, update or disable general fields for an existing dataset, you can do so by opening the Settings tab on the top navigation bar and then selecting the Labels and extraction fields tab.
Enabling general fields:
To enable existing general fields, click inside the General Fields box, and select the general fields you want to enable from the drop down menu. Once you're happy with your selections, select Update General Fields (as shown below).
These general fields will have their settings pre-selected for you. You can then update them, including making them trainable, as shown below.
Updating general fields:
To update an enabled general field, click the general field in the general field box as shown in the above images and the 'Edit general field' modal (below) will appear.
Here you can update the base general field, the title of the general field and the API name (these concepts are described in detail below), as well as making the general field 'trainable'.
If you have previously reviewed general fields for a general field kind that was not set to 'trainable', this information is still stored.
Disabling general fields:
To remove any selected general fields, simply click the 'X' icon next to the general field name, and then click Update General Fields.
If you remove a general field and click Update General Fields, this will also remove the training data for that general field for this dataset. If you choose to re-enable the general field, you will need to train it again.
If you make a mistake while updating the general fields, click 'Reset' before you click Update General fields and your changes will not be applied.
Creating new general fields
The above sections covered how to enable and update existing pre-trained general fields for both new and existing datasets. In each instance, for either a new or existing dataset, you can also create new general fields.
Newly created general fields can be based on an existing pre-trained general field or can be trained from scratch (like a new label).
You can do this by clicking the '+' icon in the general field box, either in the 'Create dataset' flow or in the dataset settings page (as shown above).
This will bring up the 'Add a new general field' modal as shown below.
Here you can set the field types, title, and API name, as well as selecting whether the general field is trainable or not (these can be updated later as shown above).
When you've filled in each of the fields (explained below), simply click 'Create'.
Field types
- This will serve as the initial state for your new general field, and the dropdown will contain a list of all the pre-trained general fields available to you
- For example, if you select 'Date' as your base general field, all of the general fields predicted for this kind will be dates, and you could then train the platform to only recognise specific dates
- If you want to train a general field entirely from scratch, you can select 'None - Train from scratch', and then you essentially start with a blank canvas when training the general field. The platform's predictions for this general field will be entirely based on the training examples you provide
General field title
- The general field title is the name of the general field that will appear in the UI of the platform
API name
- The API name of the general field is what will be returned via the API when it provides predictions for messages
- The API name cannot contain any spaces or punctuation except for dash ( - ) and underscore ( _ )
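The naming rule above can be sketched as a simple check. This is a minimal Python illustration, assuming that letters and digits are allowed and that "punctuation" means any other symbol; `API_NAME_PATTERN` and `is_valid_api_name` are hypothetical names, not part of the platform.

```python
import re

# Assumption: letters and digits are allowed. The only permitted punctuation
# is dash and underscore, and spaces are rejected, per the rule above.
API_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_api_name(name: str) -> bool:
    """Return True if name contains no spaces or punctuation other than - and _."""
    return API_NAME_PATTERN.fullmatch(name) is not None
```

For example, `invoice-id` and `policy_number` pass this check, while `policy number` and `amount.due` do not.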
User permissions required: View Sources AND View general fields.
Just as you can for labels, you can filter messages by whether they have general fields predicted or assigned, both in Explore and Reports.
You can apply any combination of AND, ANY OF and NOT when applying more than one general field filter. These filters can give you much greater flexibility when training and interpreting your data, and can provide much deeper insights on what's happening in your communication channels.
Here's some of the things you can now do when filtering by general field predictions:
- Apply multiple general field filters at once, in both Explore and Reports
- Filter to messages that have any one of a number of selected general fields predicted (i.e., ANY OF general field X OR general field Y OR ...)
- Filter to messages that have multiple different general fields predicted (i.e., general field X AND general field Y AND ...)
- Filter to messages that do not have certain general fields predicted (i.e., NOT General field Y)
- Search for general fields containing specific search terms, whilst having general field filters applied
All of the general fields you have enabled on your dataset will appear as shown below in the filter bar. Assigning general fields is covered in detail in the Reviewing and applying general fields section.
There are now two ways to apply general field filters, and you can use them in combination with each other to create the right type of query.
The default state is shown above, whereby no filter is applied and all messages will be shown (unless another filter is applied).
To update the general field filter, use the buttons explained below. They change colour when selected:
- Show messages containing any annotated general fields
- Show messages predicted to contain a general field
If you want to filter to messages that have any annotated general fields, or that are predicted to contain a general field, use the buttons at the top (as shown above). If you want to filter to messages with specific annotated or predicted general fields, hover over the general field in question and the same two buttons will appear to the right.
If you want to filter to either an assigned or predicted general field, select the name of the general field, and it shows messages with either one of them.
To remove your selection, select the button again, and to remove multiple selections, select All. You can also select Clear All at the top of the filter bar, but this will clear every filter you have selected, not just general field filters.
The taxonomy of general fields functions as a normal filter bar, and allows you to select multiple general fields at once with a single click for each.
Selecting multiple general fields from the list creates an ANY OF type query.
If you selected General field A, General field B and General field C in the General field bar, this creates a Show me messages with General field A, General field B, or General field C predicted query.
When filtering to specific general fields, you can make multiple selections. For instance, you could filter to see messages that have an address line general field assigned OR a city general field predicted (as shown below).
The second filter option is the + Add General field filter button above the general field bar.
This enables a dropdown general field bar that allows you to select more complex filters, such as excluding certain general fields from consideration.
From this dropdown, you can select multiple general fields to include or exclude by clicking the name of the general field (for assigned and predicted), or the individual buttons (including minus for where this general field is neither assigned nor predicted).
The result looks like the example below, which returns messages predicted to have the Invoice ID general field, but without the Prod ID general field assigned or predicted:
You can select + Add General field filter multiple times to add additional layers to your query. Two separate general field filters create an AND type query, whilst multiple general fields selected in the same general field filter create an ANY OF type query.
In the example below, multiple general field filters have been applied individually. This creates a filter that will return messages predicted to have any of the three general fields in the first filter, but that also have the Policy Number general field predicted, and do not have the UK Postcode general field predicted or assigned.
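The way these filters combine can be modelled in a few lines of Python. This is a simplified sketch rather than the platform's implementation: `FieldFilter` and `message_matches` are hypothetical names, the field names "Field A" to "Field C" are placeholders, and the distinction between assigned and predicted general fields is collapsed into a single set of fields present on a message.

```python
from dataclasses import dataclass, field


@dataclass
class FieldFilter:
    """One individually added filter (a hypothetical model, not the platform's API)."""

    any_of: set[str] = field(default_factory=set)   # ANY OF: at least one must be present
    none_of: set[str] = field(default_factory=set)  # NOT: none of these may be present

    def matches(self, present: set[str]) -> bool:
        if self.any_of and not (self.any_of & present):
            return False
        return not (self.none_of & present)


def message_matches(present: set[str], filters: list[FieldFilter]) -> bool:
    """Separate filters are combined as an AND query."""
    return all(f.matches(present) for f in filters)


# The three-layer query described above, with placeholder field names.
filters = [
    FieldFilter(any_of={"Field A", "Field B", "Field C"}),
    FieldFilter(any_of={"Policy Number"}),
    FieldFilter(none_of={"UK Postcode"}),
]
```

With these filters, a message containing Field B and Policy Number matches, while the same message with a UK Postcode does not.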
A helpful tip is that by selecting the & sign in an individual filter containing multiple general fields, you can automatically split them out into individual filters. This would change the query from ANY OF (i.e. any of these general fields predicted) to AND (i.e. all of these general fields predicted).
It's possible to combine filters from both the general field bar, and individually added general field filters. Filters applied in the general field bar are treated as an AND query with any individually applied general field filters.
For example, in the image below, this combined query would return any messages that had either ORDER ID or PROD ID predicted.
Combine general field filter using general field bar and individually added general field filters.
These new filters also mean that you can now apply general field filters and sort by a specific general field for a training mode.
User permissions required: 'View Sources' AND 'Review and label'.
Predicted general fields appear as colour highlighted text, such as in the first line of the message below, with a different colour appearing for each different general field type. Once a general field has been confirmed by a user, by either manually applying it or accepting a prediction, the general field will appear as highlighted text with a bold, darker outline as shown below.
If a paragraph has had general fields assigned, dismissed, or applied, it will appear highlighted in grey, as shown in the body of the message below.
When reviewing trainable general fields, it's important to remember that the platform will learn from both the general field values that you assign, as well as the context of where they appear within the communications, i.e. the other language that's used around the values themselves.
The platform will consider the context of the language in the same paragraph as the general field value, as well as the single paragraphs (denoted by a new separated line) directly before and after the paragraph that the general field sits in.
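The context window described above can be pictured with a short Python sketch. It assumes that each non-empty line of the message body is one paragraph, which is a simplification of how the platform may split text; `split_paragraphs` and `context_window` are hypothetical helpers written for this illustration.

```python
def split_paragraphs(body: str) -> list[str]:
    """Split a message body into paragraphs, assuming one paragraph per non-empty line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def context_window(paragraphs: list[str], index: int) -> list[str]:
    """Return the paragraph at index plus the single paragraphs directly before and after it."""
    start = max(index - 1, 0)
    return paragraphs[start:index + 2]
```

For a general field value found in the second paragraph of a message, `context_window(paragraphs, 1)` returns the first three paragraphs.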
Please Note: For general fields that are not set to 'trainable', the platform's predictions are based entirely on the rules defined within the platform for that general field. This can be beneficial for when a general field absolutely has to follow a set format for a downstream automation, with any incorrect values causing a failure or exception.
When the platform predicts which general fields apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the general field applies to the highlighted span of text. You can view a general field’s confidence score by hovering over the general field.
This confidence score is also made available via the API so that it can inform automated actions taken downstream.
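As an illustration of how a downstream process might use the confidence score, the Python sketch below keeps only predictions at or above a chosen threshold. The dictionary keys (`name`, `value`, `confidence`), the 0 to 1 scale and the default threshold of 0.9 are all assumptions made for this example; they do not describe the actual API response format.

```python
def select_for_automation(predictions: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep predictions whose confidence meets the threshold.

    Each prediction is assumed to be a dict with hypothetical keys
    "name", "value" and "confidence" (a fraction between 0 and 1).
    """
    return [p for p in predictions if p["confidence"] >= threshold]


predictions = [
    {"name": "policy-number", "value": "PN-12345", "confidence": 0.97},
    {"name": "monetary-quantity", "value": "20 GBP", "confidence": 0.62},
]
confident = select_for_automation(predictions)
```

Predictions below the threshold could then be routed to a person for review rather than being actioned automatically.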
Once general fields are enabled (see here), the platform will automatically start predicting them within the messages throughout your dataset. Users can then accept the predictions that are correct or reject them where they are incorrect. Each of these actions sends training signals that will be used to improve the platform’s understanding of that general field.
For the pre-trained general fields that are trained offline (e.g. Monetary quantity, URL, etc.), it is more important from an improvement perspective for users to reject or correct wrong predictions than it is for them to accept correct predictions.
For the general fields that train live in the platform, it is equally important to accept correct predictions as well as reject incorrect predictions. You do not, however, need to keep accepting many correct examples of each unique general field for these kinds (e.g. Example Bank Ltd. is a unique organisation general field) if you aren't finding incorrectly predicted ones.
The key caveat to this is that if you review any general field in a paragraph, you need to review all of the other general fields in that paragraph.
To review a general field prediction, hover the mouse over the prediction and the general field review modal will appear, as shown in the example below. To accept it, click 'Confirm', to reject it, click 'Dismiss'.
General fields and labels can be trained independently of each other. Reviewing labels for a message does not mean you have to review the general fields in that same message. It is, however, good practice to do both at the same time, as this is the most efficient use of your time when training a model.
Please Note: It's very important when training general fields to follow the best practices explained below - particularly regarding not partially annotating paragraphs.
To understand how well the platform is able to predict each general field enabled for a dataset (particularly the trainable ones), see here.
It’s important that you reject incorrect general field predictions. If the highlighted text was in fact a different general field (this is more common for date-related general fields), apply the correct one afterwards (see below for how to apply general fields).
To apply a general field to text where the platform has not predicted it, simply highlight the section of text, as you would if you were going to copy it.
A dropdown menu will appear, as shown below, containing all of the general fields that you have enabled for your dataset. Simply click the correct one to apply it, or press the corresponding keyboard shortcut.
The default keyboard shortcut for each general field is the letter it starts with. If more than one general field starts with the same letter, a shortcut will be assigned at random to the others.
Once a general field has been applied, it will be highlighted in colour with a bold outline (see below). Each general field type will have its own specific colour.
A value for a given general field type cannot be split across multiple paragraphs. The full value must be contained within a paragraph for it to be extracted as one general field value.
There are two very important best practices to remember when accepting, rejecting or applying general fields within messages:
1. Don't split words
It’s important not to split words – the highlighted general field should cover the entire word (or several) in question, not just part of it (see the incorrect example on the left below, and the correct application on the right)
2. Don't partially annotate paragraphs
When annotating, if a user assigns one label to a message, they should apply ALL the labels that could apply to that message, otherwise you teach the model that those other labels should not apply. For general fields, the same is true, except general fields are reviewed or applied at the paragraph level, rather than the whole message.
Paragraphs in a message are separated by new lines. The subject line of an email message is considered its own single paragraph.
Make sure to review or apply all of the general fields within a paragraph across all general field kinds if you review or apply one of them. Applying, accepting or rejecting general fields in a paragraph means that the paragraph is treated as ‘reviewed’ by the platform from a general field perspective. Therefore, it’s important to accept or reject ALL of the predictions in that paragraph.
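The rule above can be expressed as a small check. In this Python sketch each paragraph is represented by the review status of every general field prediction it contains; the status strings and the `find_partially_annotated` helper are hypothetical and exist only to illustrate the rule.

```python
def find_partially_annotated(paragraphs: list[list[str]]) -> list[int]:
    """Return the indexes of paragraphs where some, but not all, predictions were reviewed.

    Each paragraph is a list of statuses, one per general field prediction:
    "accepted", "rejected", "applied" or "unreviewed" (hypothetical labels).
    """
    flagged = []
    for index, statuses in enumerate(paragraphs):
        reviewed = [s for s in statuses if s != "unreviewed"]
        if 0 < len(reviewed) < len(statuses):
            flagged.append(index)
    return flagged
```

A paragraph flagged by this check is one where the remaining unreviewed predictions would be treated as incorrect, so it needs to be revisited.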
The example below shows the different paragraphs that have been reviewed within the email message.
The message shown below shows the same example where the user has not accepted or rejected all of the general field predictions in a single paragraph. This is incorrect, as the model will falsely treat the monetary quantity general field as an incorrect prediction.
The platform displays validation statistics, warnings and recommended actions for enabled general fields in the Validation page, much like it does for every label in your taxonomy.
To see these, navigate to the Validation page and select the General fields tab at the top, as shown in the image below.
The process in which the platform validates its ability to correctly predict general fields is very similar to how it does it for labels.
Messages are split (80:20) into a training set and a test set (determined randomly by the message ID of each message) when they are first added to the dataset. Any general fields that have been assigned (predictions that were accepted or corrected) will fall into the training set or the test set, based on whichever set the message that they're in was assigned to originally.
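One common way to obtain a stable, pseudo-random split keyed on a message ID is to hash the ID and compare the result with the test fraction. The Python sketch below shows that general technique only. The use of SHA-256 and the `assign_split` helper are assumptions; the platform's actual mechanism is not described here.

```python
import hashlib


def assign_split(message_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a message to "train" or "test" from its ID.

    Hashing is an assumed technique used for illustration only.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"
```

Because the assignment depends only on the message ID, a message stays in the same set however many general fields are later assigned within it, which is why the number of general fields in each set can differ markedly from 80:20.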
As there can sometimes be a very large number of general fields in one message and no guarantee whether a message is in the training set or the test set, you may see a large disparity between the number of general fields in each set.
There may also be instances where all of the assigned general fields fall into the train set. As at least one example is required in the test set to calculate the validation scores, this general field would require more assigned examples until some were present in the test set.
The individual precision and recall statistics for each general field with sufficient training data are calculated in a very similar way to that of labels:
Precision = No. of matching general fields / No. of predicted general fields
Recall = No. of matching general fields / No. of actual general fields
A 'matching general field' is where the platform has predicted the general field exactly (i.e. no partial matches)
The F1 Score is simply the harmonic mean of both precision and recall.
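The formulas above can be worked through with a small Python sketch. It assumes each general field occurrence is identified by a `(message_id, start, end)` tuple, which is a representation chosen for this example; only exact matches count, as described above, and `field_scores` is a hypothetical helper.

```python
def field_scores(predicted: set[tuple], actual: set[tuple]) -> dict[str, float]:
    """Compute precision, recall and F1 for one general field.

    Spans are (message_id, start, end) tuples and only exact matches count.
    """
    matching = len(predicted & actual)
    precision = matching / len(predicted) if predicted else 0.0
    recall = matching / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = {("m1", 0, 5), ("m1", 10, 15), ("m2", 3, 9), ("m3", 0, 4)}
actual = {("m1", 0, 5), ("m2", 3, 9), ("m2", 20, 25), ("m3", 0, 3), ("m4", 1, 2)}
scores = field_scores(predicted, actual)
```

Here two of the four predictions match exactly, so precision is 2/4 = 0.5, and two of the five actual general fields were found, so recall is 2/5 = 0.4. The prediction in message m3 overlaps an actual general field but is not an exact match, so it counts against both.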
It's worth noting that the precision and recall stats shown in this page are most useful for the general fields that are trainable live in the platform (shown in the second column above), as all of the general fields reviewed for these general field kinds will directly impact the platform's ability to predict those general fields.
Hence accepting correct general fields and correcting or rejecting wrong general fields should be done wherever possible.
For general fields that are pre-trained via template field types, in order for the validation statistics to provide an accurate reflection of performance, users would need to ensure they accept a considerable amount of correct predictions, as well as correcting wrong ones.
If they were only to correct wrong predictions, the train and test sets would be artificially full of only the instances where the platform has struggled to predict a general field, and not those where it is better able to predict them. As correcting wrong predictions for these general fields does not lead to a real-time update of these general fields (they are updated periodically offline), the validation statistics may not change for some time and could be artificially low.
Accepting lots of the correct predictions may not always be convenient, as these general fields are predicted correctly far more often than not. But if the majority of the predictions are correct for these general fields, it's likely that you do not need to worry about their precision and recall stats in the Validation page.
The summary stats (average precision, average recall and average F1 score) are simply averages of each of the individual general field scores.
Like with labels, only general fields that have sufficient training data are included in the average scores. Those that do not have sufficient training data to be included have a warning icon next to their name.
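A minimal Python sketch of this averaging is shown below. The `sufficient_data` flag stands in for whatever criterion the platform applies, and the function and key names are hypothetical.

```python
from statistics import mean


def average_scores(per_field: dict[str, dict]) -> dict[str, float]:
    """Average precision, recall and F1 over fields flagged as having sufficient data.

    Each value is a dict with "precision", "recall", "f1" and a boolean
    "sufficient_data" flag (a stand-in for the platform's own criterion).
    """
    eligible = [s for s in per_field.values() if s["sufficient_data"]]
    if not eligible:
        return {}
    return {m: mean(s[m] for s in eligible) for m in ("precision", "recall", "f1")}
```

Note that the average F1 score computed this way is the mean of the individual F1 scores, not the F1 of the average precision and average recall.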
The General fields Validation page shows the average general field performance statistics, as well as a chart showing the average F1 score of each general field versus their training set size. The chart also flags general fields that have amber or red performance warnings.
The general field performance statistics shown are:
- Average F1 Score: Average of F1 scores across all general fields with sufficient data to accurately estimate performance. This score weighs recall and precision equally. A model with a high F1 score produces fewer false positives and negatives.
- Average Precision: Average of precision scores across all general fields with sufficient data to accurately estimate performance. A model with high precision produces fewer false positives.
- Average Recall: Average of recall scores across all general fields with sufficient data to accurately estimate performance. A model with high recall produces fewer false negatives.
The general field performance chart shown in the Metrics tab of the Validation page (see above) gives an immediate visual indication of how each individual general field is performing.
For a general field to appear on this chart, it must have at least 20 pinned examples present in the training set used by the platform during validation. To ensure that this happens, users should make sure they provide a minimum of 25 (often more) pinned examples per general field from 25 different messages.
Each general field will be plotted as one of three colours, based on the model's understanding of how the general field is performing. Below, we explain what these mean:
General field performance indicators:
- Those general fields plotted as blue on the chart have a satisfactory performance level. This is based on numerous contributing factors, including number and variety of examples and average precision for that general field
- General fields plotted as amber have slightly less than satisfactory performance. They may have relatively low average precision or not quite enough training examples. These general fields require a bit of training / correction to improve their performance
- General fields plotted as red are poorly performing general fields. They may have very low average precision or not enough training examples. These general fields may require considerably more training / correction to bring their performance up to a satisfactory level
Users can select individual general fields from the general field filter bar (or by clicking the general field's plot on the All general fields chart) in order to see the general field's performance statistics.
The specific general field view will also show any performance warnings and recommended next best action suggestions to help improve its performance.
User permissions required: Review and annotate.
Like training labels, training general fields is the process by which a user teaches the platform which general fields apply on a given message using various training modes.
Like with labels, the ‘Teach’, ’Check’, and ’Missed’ modes are available to help train and improve the performance of general fields and can be accessed either 1) on the Explore page using the training dropdown, or 2) by following the recommended actions on the General fields tab of the Validation page.
If a specific general field has a performance warning, the platform recommends the next best action that it thinks will help address that warning, listed in order of priority. This will be shown when you select a specific general field from the taxonomy or the All general field chart.
The next best action suggestions act as links that you can click to take you directly to the training view that the platform suggests in order to improve the general field's performance. The suggestions are intelligently ordered with the highest priority action to improve the general field listed first.
This is the most important tool to help you understand the performance of your general fields, and should regularly be used as a guide when trying to improve general field performance.
The following table summarises when the platform recommends each general field training mode:
Training mode | What it shows | When to use it |
Teach General field | Predictions for a general field where the model is most unsure whether it applies or not | For training general fields on unreviewed messages |
Check General field | Messages where the platform thinks the general field may have been misapplied | For training general fields on reviewed messages to try to find and correct any inconsistencies |
Missed General field | Messages that the platform thinks may be missing the selected general field | For training general fields on reviewed messages to try to find and correct any inconsistencies |
Using Teach General field boosts general field performance, because the model is being given new information on messages it is unsure about, as opposed to ones that it already has highly confident predictions for.
The platform recommends Teach General Fields when:
- There is a performance warning next to a general field (for example, when the minimum of 25 examples has not been provided)
- The F1 score on a given general field is low
- There may not always be obvious context within the text for a general field, or there is lots of variation within the general field values for a given type
Using Check General field helps identify inconsistencies in the reviewed set, while improving the model's understanding of the general field, by ensuring that the model has correct and consistent examples to make predictions from. This will improve the recall of a general field.
The platform recommends Check General Fields when:
- There is low recall, but high precision
- The predictions the platform makes are very accurate, but it often fails to catch messages where the general field should have been applied
For more details on calculations for general field validation, check the Validation for general fields page.
Using Missed General field helps find examples in the reviewed set that should have the selected general field but do not. It will also help identify partially annotated messages, which can be detrimental to the model's ability to predict a general field. This will improve the precision of a general field and ensure the model has correct and consistent examples to make predictions from.
The platform recommends Missed General Field when:
- There is high recall, but low precision
- The platform often predicts the general field incorrectly, but when it does predict it correctly, it catches many of the examples that should be there
For more details on calculations for general field validation, check the Validation for general fields page.
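The recommendations for the three training modes can be summarised as a small decision rule. The Python sketch below is only an illustration: the `FieldStats` class, the `suggest_training_mode` function and the 0.6 / 0.8 thresholds are hypothetical and are not part of the platform, which applies its own internal logic. Only the minimum of 25 examples and the precision / recall conditions are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class FieldStats:
    """Validation counts for one general field (hypothetical structure)."""
    true_positives: int
    false_positives: int
    false_negatives: int
    reviewed_examples: int


def precision(s: FieldStats) -> float:
    predicted = s.true_positives + s.false_positives
    return s.true_positives / predicted if predicted else 0.0


def recall(s: FieldStats) -> float:
    actual = s.true_positives + s.false_negatives
    return s.true_positives / actual if actual else 0.0


def f1(s: FieldStats) -> float:
    p, r = precision(s), recall(s)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def suggest_training_mode(s: FieldStats, min_examples: int = 25,
                          low: float = 0.6, high: float = 0.8) -> str:
    """Map the rules described above to a training mode.

    The `low` and `high` thresholds are assumptions made for this sketch.
    """
    if s.reviewed_examples < min_examples:
        return "Teach"   # performance warning: not enough examples
    p, r = precision(s), recall(s)
    if r < low and p >= high:
        return "Check"   # low recall, but high precision
    if p < low and r >= high:
        return "Missed"  # high recall, but low precision
    if f1(s) < low:
        return "Teach"   # low F1 score
    return "None"


print(suggest_training_mode(FieldStats(40, 2, 60, 100)))   # Check
print(suggest_training_mode(FieldStats(90, 110, 5, 200)))  # Missed
```

In practice the recommended actions shown on the Validation page should be followed; the sketch only makes the precision and recall conditions concrete.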
Permissions required: Modify Datasets.
Use custom Regex general fields to extract and format spans of text that have a known repetitive structure, such as IDs or reference numbers.
This is a useful option for simple, structured general fields with little variation. In case of general fields with significant variation and where the context has a big influence on predictions, a machine-learning based general field is the right choice. You can use combinations of the two in any dataset within Communications Mining.
A broader Regex (i.e. a set of rules that defines the general field) can also be used as the base of a custom general field. This combines the rules with contextual, machine-learning based refinement through training within Communications Mining to create sophisticated custom general fields. This provides the best performance as well as the necessary restrictions on values extracted for automation.
A Custom Regex general field is made up of a field type with the Regex data type, which in turn has one or more custom Regex Templates. Each template expresses one way to extract (and format) the general field.
Combined together, these templates offer a flexible and powerful way to cover multiple representations of the same general field type.
A template is made of two parts:
- The regex (regular expression), which describes the constraints that need to be met by a span of text to be extracted as a general field.
- The formatting, which expresses how to normalise the extracted string into a more standard format.
For example, the regex ID\d{} will show:
The Custom Regex Template can be tested on text to ensure that it behaves as expected. Any general field that would be extracted with the Template will be shown in a list, with its value, as well as the position of the start and end characters.
For example, with the regex \d{4} and the formatting ID-{$}, a test string that contains a single four-digit number will show one extraction.
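The preview can be approximated outside the platform with a short Python sketch. It assumes Python's `re` dialect, which may differ from the platform's regex syntax, and it only handles the bare `{$}` variable in the formatting. The function name `preview_extractions` and the sample sentence are hypothetical, and the end offset follows Python's exclusive convention, which may not match what the platform displays.

```python
import re


def preview_extractions(regex: str, formatting: str, text: str) -> list[dict]:
    """List each extraction with its formatted value and character offsets."""
    results = []
    for match in re.finditer(regex, text):
        # Only the bare "{$}" variable (the full regex match) is handled here.
        if formatting:
            value = formatting.replace("{$}", match.group(0))
        else:
            value = match.group(0)  # no formatting: return the raw match
        results.append({"value": value,
                        "start": match.start(),
                        "end": match.end()})
    return results


sample = "Please check invoice 1234 today"  # hypothetical test string
print(preview_extractions(r"\d{4}", "ID-{$}", sample))
# [{'value': 'ID-1234', 'start': 21, 'end': 25}]
```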
The regex is the pattern used to extract general fields in the text. Check the syntax documentation.
Named capture groups can be used to identify a specific section of the extracted string for subsequent formatting. The names of the capture groups should be unique across all templates, and should only contain lowercase letters or digits.
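The naming rules for capture groups can be checked before the templates are entered. The sketch below is a minimal illustration, assuming that the `(?P<name>...)` syntax used in this guide is also accepted by Python's `re` module; the helper `check_group_names` is hypothetical and is not a platform feature.

```python
import re

# Rule from the text above: lowercase letters or digits only.
NAME_RULE = re.compile(r"[a-z0-9]+")


def check_group_names(template_regexes: list[str]) -> list[str]:
    """Return a list of problems found in the capture group names."""
    problems = []
    seen = {}  # group name -> index of the template that first used it
    for index, regex in enumerate(template_regexes):
        for name in re.compile(regex).groupindex:
            if not NAME_RULE.fullmatch(name):
                problems.append(f"template {index}: invalid group name '{name}'")
            if name in seen:
                problems.append(
                    f"template {index}: group '{name}' already used "
                    f"in template {seen[name]}"
                )
            else:
                seen[name] = index
    return problems


print(check_group_names([r"(?P<id1>\d{3})", r"(?P<id1>\d{4})"]))
# ["template 1: group 'id1' already used in template 0"]
```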
Formatting can be provided to post-process the extracted general field.
By default, no formatting is applied and the string returned by the platform will be the string extracted by the regex. However, if needed, more complex transformations can be defined, using the following rules.
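Named capture groups can be referenced in the formatting using the $ symbol. Note that the $ symbol by itself represents the full regex match.
Variables are enclosed in { and } braces.
For example, to prefix an extracted ID with ID-, the formatting ID-{$} returns ID-1234567 when the regex matches 1234567.
Strings can be concatenated using the & symbol.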
Regex | (?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b) |
Formatting | {$id1 & "-" & $id2} |
Text | The first id is 123 and the second one is 4567 |
General Field returned by the platform | 123-4567 |
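The concatenation example can be reproduced in Python as a rough check. Because the regex is an alternation, each individual match fills only one of the two named groups, so this sketch assumes that the first value found for each group across the whole text is used. How the platform actually combines the groups is not described here, and the helper `collect_named_groups` is hypothetical.

```python
import re


def collect_named_groups(regex: str, text: str) -> dict[str, str]:
    """Collect the first value captured for each named group in the text."""
    groups: dict[str, str] = {}
    for match in re.finditer(regex, text):
        for name, value in match.groupdict().items():
            if value is not None and name not in groups:
                groups[name] = value
    return groups


regex = r"(?P<id1>\b\d{3}\b)|(?P<id2>\b\d{4}\b)"
text = "The first id is 123 and the second one is 4567"

groups = collect_named_groups(regex, text)
# Equivalent of the formatting {$id1 & "-" & $id2}
formatted = groups["id1"] + "-" + groups["id2"]
print(formatted)  # 123-4567
```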
Some functions can also be used in the formatting to transform the extracted string. The names of the functions and their signatures are inspired by Excel.
Converts all characters in the extracted span to uppercase:
Regex | \w{3} |
Formatting | {upper($)} |
Text | abc |
General Field returned by the platform | ABC |
Converts all characters in the extracted span to lowercase:
Regex | \w{3} |
Formatting | {lower($)} |
Text | AbC |
General Field returned by the platform | abc |
Capitalises the first letter of each word in the extracted span and converts the remaining letters to lowercase:
Regex | \w+\s\w+ |
Formatting | {proper($)} |
Text | albert EINSTEIN |
General Field returned by the platform | Albert Einstein |
Pads the extracted span up to a given size with a given character.
Function arguments:
- The text containing the characters to be padded
- Size of the padded string
- Character to be used for padding
Regex | \d{2,5} |
Formatting | {pad($, 5, "0")} |
Text | 123 |
General Field returned by the platform | 00123 |
Replaces characters with other characters.
Function arguments:
- The text containing the characters to be substituted
- What characters to replace
- What the old characters should be replaced with
Regex | ab |
Formatting | {substitute($, "a", "12")} |
Text | ab |
General Field returned by the platform | 12b |
Returns the first n characters from the span.
Function arguments:
- The text containing the characters to be extracted
- The number of characters to return
Regex | \w{4} |
Formatting | {left($, 2)} |
Text | ABCD |
General Field returned by the platform | AB |
Returns the last n characters from the span.
Function arguments:
- The text containing the characters to be extracted
- The number of characters to return
Regex | \w{4} |
Formatting | {right($, 2)} |
Text | ABCD |
General Field returned by the platform | CD |
Returns n characters from the span, starting at the specified position (the first character is at position 1).
Function arguments:
- The text containing the characters to be extracted
- The position of the first character to return
- The number of characters to return
Regex | \w{5} |
Formatting | {mid($, 2, 3)} |
Text | ABCDE |
General Field returned by the platform | BCD |
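The formatting functions above behave like familiar string operations. The following Python sketch mirrors each one so that the expected output of a template can be checked offline. It is an approximation based only on the examples in this section, assuming left padding for pad and a 1-based position for mid, and it is not the platform's implementation; edge cases may differ.

```python
def upper(text: str) -> str:
    return text.upper()


def lower(text: str) -> str:
    return text.lower()


def proper(text: str) -> str:
    # str.title() capitalises each word and lowercases the rest.
    return text.title()


def pad(text: str, size: int, char: str) -> str:
    # Pads on the left, as in the 123 -> 00123 example.
    return text.rjust(size, char)


def substitute(text: str, old: str, new: str) -> str:
    return text.replace(old, new)


def left(text: str, n: int) -> str:
    return text[:n]


def right(text: str, n: int) -> str:
    return text[max(len(text) - n, 0):]


def mid(text: str, start: int, n: int) -> str:
    # `start` is 1-based, as in the ABCDE -> BCD example.
    return text[start - 1:start - 1 + n]


print(proper("albert EINSTEIN"))   # Albert Einstein
print(pad("123", 5, "0"))          # 00123
print(substitute("ab", "a", "12")) # 12b
print(mid("ABCDE", 2, 3))          # BCD
```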