Integration Service Activities
Last updated Oct 8, 2024

Best practices

This section includes information on how to use Context Grounding effectively.

Getting started

To start using Context Grounding, follow these steps first:

  1. Make sure you use Studio Web or Studio Desktop version 2024.4 or newer.
  2. In Automation Cloud, in Integration Service, establish a connection to the UiPath GenAI Activities connector.

How Context Grounding interacts with your data in the GenAI Activities

To use Context Grounding you need a dataset (e.g., a group of documents) readily available. Then, Context Grounding can interact with your data in three phases:

  1. Establish your data source for Context Grounding.
    • Use an automation or upload files directly into an Orchestrator bucket, in a shared Orchestrator folder. The Orchestrator folder must be shared because Context Grounding is a tenant-scoped service.
    • You must have Edit permissions for that folder to ensure you can upload or remove documents from the bucket. No additional permissions are required.
  2. Ingest data from your data source into Context Grounding.
  3. Query and ground prompts with your data.
    • Use Content Generation to query over documents and use information to augment or ground prompts.

Managing the ingestion pipeline

Index and Ingest

The Index and Ingest (Public Preview) activity provides an asynchronous mechanism to ingest documents that are uploaded and stored in Orchestrator buckets.

In the Index and Ingest (Public Preview) activity, under Additional properties, the Data type dropdown field allows you to specify the file type you would like to ingest:

  • PDF - to ingest PDF files (default selection).
  • CSV - to ingest CSV files.
  • JSON - to ingest JSON files.

If you add more files to your Orchestrator bucket, you need to re-run this activity to ensure the index remains up to date.

The File glob pattern field should match the file type you set in Data type:

  • If you have one data type in your Orchestrator bucket (e.g. all PDFs), select PDF from the Data type dropdown list and keep the File glob pattern default value as *..
  • If you have multiple file types in your Orchestrator bucket, specify the type you want ingested in both the Data type dropdown and the File glob pattern field. The pattern corresponds to the data type; specifying both makes the ingestion request explicit:
    • *.pdf for PDFs.
    • *.csv for CSVs.
    • *.json for JSONs.

For example: if you upload PDFs to your Orchestrator bucket, run Index and Ingest (Public Preview) with your index, Data type set to PDF, and File glob pattern set to *.. If you later upload CSV files to the same Orchestrator bucket, run Index and Ingest (Public Preview) again with the same index, but set Data type to CSV and File glob pattern to *.csv.
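As a rough illustration of the pairing described above, the following Python sketch groups the files in a bucket by the Data type whose glob pattern they match, which suggests how many separate Index and Ingest runs a mixed bucket would need. The mapping, the function name, and the case-insensitive matching are assumptions made for this example; they are not part of any UiPath API, and the activity's own matching rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from the Data type dropdown values to the
# glob patterns listed above. Illustrative only, not a UiPath API.
DATA_TYPE_PATTERNS = {"PDF": "*.pdf", "CSV": "*.csv", "JSON": "*.json"}


def plan_ingestion_runs(file_names):
    """Group file names by the Data type whose glob pattern they match.

    Assumption: names are lower-cased before matching, so "A.PDF"
    counts as a PDF. The real activity may treat case differently.
    """
    runs = {}
    for data_type, pattern in DATA_TYPE_PATTERNS.items():
        matched = [n for n in file_names if fnmatchcase(n.lower(), pattern)]
        if matched:
            runs[data_type] = {"glob": pattern, "files": matched}
    return runs


bucket = ["policy.pdf", "rates.csv", "faq.pdf", "notes.txt"]
for data_type, run in plan_ingestion_runs(bucket).items():
    print(data_type, run["glob"], run["files"])
```

Each key in the result corresponds to one run of the activity with that Data type and glob pattern. Files that match none of the patterns (such as notes.txt here) are simply left out.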

Follow a similar pattern when you remove documents from your data source: when you re-ingest with Index and Ingest (Public Preview), specify the Data type and File glob pattern so that the ingestion removes the deleted documents of that type from the index.

Note: The ingestion and re-ingestion of CSV files may take slightly longer than other data types.

After you create an index, activities and other UiPath products can use it to query documents that are important for your use case.

Ingestion time

Ingestion is an asynchronous process. Executing the activity does not mean the data is immediately queryable. Smaller ingestion payloads are processed faster, so we recommend you upload documents in batches and run this activity periodically. The duration of the process depends on the amount of data and on how many other users submit ingestion requests.
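The batching recommendation can be sketched in a few lines of Python. The batch size of 10 and the function name are assumptions chosen for illustration; the documentation does not prescribe a specific batch size.

```python
def make_batches(file_names, batch_size=10):
    """Split a list of file names into consecutive batches.

    batch_size=10 is an illustrative default, not a documented limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        file_names[i:i + batch_size]
        for i in range(0, len(file_names), batch_size)
    ]


pending = [f"doc_{n}.pdf" for n in range(1, 26)]
for number, batch in enumerate(make_batches(pending), start=1):
    print(f"batch {number}: {len(batch)} documents")
```

Each batch would be uploaded to the bucket and followed by one ingestion run before the next batch is uploaded.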

After sending the ingestion request, assuming each document is 1 MB or smaller, we typically recommend waiting:

  • <10 minutes: for 10 documents or fewer;
  • <30 minutes: for 11 to 50 documents;
  • up to 2 hours: for more than 50 documents.

Note: These are not SLAs or SLOs. The performance depends on the nature of the documents, their size, and the number of concurrent requests.
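The wait-time guidance above can be expressed as a small lookup, for example to decide how long a workflow should wait before querying. This is a minimal sketch: the function name is hypothetical, and the returned values are only the upper bounds of the typical waits quoted above, not guarantees.

```python
def recommended_wait_minutes(document_count):
    """Return the upper bound of the typical wait, in minutes.

    Assumes each document is 1 MB or smaller, as in the guidance
    above. These figures are not SLAs or SLOs.
    """
    if document_count < 1:
        raise ValueError("document_count must be at least 1")
    if document_count <= 10:
        return 10
    if document_count <= 50:
        return 30
    return 120  # up to 2 hours


for count in (5, 40, 200):
    print(count, "documents:", recommended_wait_minutes(count), "minutes")
```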

Run the Index and Ingest (Public Preview) activity each time you add or delete data from your data source. This ensures parity between your data source and the embeddings used for search and RAG.

The index name may not appear in the Content Generation activity before the ingestion is completed. If you can’t find the index name after running the Index and Ingest (Public Preview) activity and you are using an existing index, select Force Refresh from the button menu next to the Index field.

Common errors and resolution patterns

  • You may receive an error ("No Results Found") if the ingestion job has not completed.
  • You may receive an error ("No Schema Found") if the ingestion job has failed. In this case, wait a few minutes, retry querying, and retry ingestion.
  • You may receive an error ("Datasource Synchronization Already in Progress"), which means an ingestion job for that index is currently in progress. Wait a few minutes and retry ingestion.
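The resolution patterns above can be captured as a lookup that a workflow consults when a query or ingestion fails. The sketch below is illustrative only: the function name is hypothetical, and matching on a case-insensitive substring assumes the quoted text appears somewhere in the error message, which may not hold for every error format.

```python
# Error text quoted in the list above, mapped to the suggested action.
RESOLUTIONS = {
    "No Results Found":
        "Ingestion may not have completed; wait and retry the query.",
    "No Schema Found":
        "Ingestion failed; wait a few minutes, retry querying, "
        "and retry ingestion.",
    "Datasource Synchronization Already in Progress":
        "An ingestion job is running for this index; wait a few "
        "minutes and retry ingestion.",
}


def suggest_resolution(error_message):
    """Return the suggested action for a known error, or None.

    Assumption: the known error text appears as a substring of the
    message, compared case-insensitively.
    """
    lowered = error_message.lower()
    for known_error, action in RESOLUTIONS.items():
        if known_error.lower() in lowered:
            return action
    return None


print(suggest_resolution("Error: No Schema Found"))
```

An unrecognized message returns None, so the caller can fall back to its own logging or escalation.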

You can run the Index and Ingest (Public Preview) activity:

  • Manually (strongly recommended).
  • Event-based: Set up an automation to run the ingestion pipeline every time a new document is added to or removed from a storage bucket. Do this if you regularly add or remove documents from buckets.
  • Time-based: Set up an automation to run the ingestion pipeline on a schedule. Do this if you regularly expect a high volume of documents flowing in or out. When you create any trigger for these activities, keep in mind that ingestion is an asynchronous process, as described above.

Delete Index

Use the Delete Index (Public Preview) activity if you want to delete the index and its embeddings. This removes the index as a data source that other UiPath GenAI Activities experiences can query. It does not delete the documents or data in the data source (e.g. Orchestrator buckets).

Tip: For both the Index and Ingest (Public Preview) and Delete Index (Public Preview) activities, we recommend using a separate Studio project so that they run separately from the Content Generation (querying/RAG) activity. This way, the asynchronous ingestion or deletion process can complete independently of your querying workflow.

Querying and RAG with Context Grounding

The Content Generation activity features two options for working with documents: File resource and Existing index.

File resource

The File resource option allows you to use file variables from previous activities (e.g. a document downloaded from Google Drive) and have Context Grounding perform a just-in-time (JIT) style of RAG. This means it ingests the document into an index, searches across it, augments the prompt, and then deletes the index, so the data does not persist.

Note:
  • The File resource option currently supports only PDF format.
  • Keep these documents small (under 50 pages). Integration Service activities have a timeout window in which all of the processing described above must take place.
  • If you have a PDF with scanned images, we recommend using the Document Understanding OCR option in the Extract Data activity after you download the file, to extract the text from these images (Context Grounding does not yet support images). Pass the extracted_data output into the prompt, with File resource pointing at your downloaded file.
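The two File resource constraints in the note above (PDF only, under 50 pages) lend themselves to a pre-check before the document reaches Content Generation. This is a minimal sketch under stated assumptions: the function is hypothetical, the page count is supplied by the caller rather than read from the file, and the format check looks only at the file extension.

```python
from pathlib import PurePath

MAX_PAGES = 50  # from the "under 50 pages" guidance above


def file_resource_issues(file_name, page_count):
    """List reasons a file may be unsuitable for the File resource option.

    Assumptions: page_count is known to the caller, and the extension
    reflects the actual file format.
    """
    issues = []
    if PurePath(file_name).suffix.lower() != ".pdf":
        issues.append("File resource currently supports only PDF format.")
    if page_count >= MAX_PAGES:
        issues.append("Document should be under 50 pages.")
    return issues


print(file_resource_issues("contract.pdf", 12))
print(file_resource_issues("report.docx", 80))
```

An empty list means the file passes both checks. Larger or non-PDF documents are better suited to a persistent index created with Index and Ingest.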

Existing index

The Existing index option allows you to use an index you created with the Index and Ingest (Public Preview) activity. You query across a persistent index into which you've ingested documents from your data source. You can re-use this index as many times as you like until you delete it.

We recommend using the Log Message activity after Content Generation in your workflow sequence, to input the Top Generated Text variable and see the LLM generation response after the workflow executes.

The Content Generation activity also has an output variable called Citations String (Public Preview). Use it as input in a Log Message activity to see the semantic search results used to influence the generation output. This works only for PDF and JSON data types.
