Integration Service Activities
Last updated Oct 8, 2024

Image Analysis

Description

Generates a completion from a custom prompt that can contain static text, variables, and arguments, together with an image file or a publicly accessible image URL.

Project compatibility

Windows | Cross-platform

Configuration

  • Connection ID - The connection established in Integration Service. Access the dropdown menu to choose, add, or manage connections.

  • Model name - The name or ID of the vision model or deployment to use for the completion. The activity defaults to the model with the highest observed output quality, but you can select a different model based on your own testing and the outputs you need. Switching models can change the output. All available models are vision-equipped, meaning they can also review an image. If you are submitting only a text prompt, you can use Content Generation instead.
  • Prompt - The user prompt for the chat completion request. This field supports String type input.
  • Image type - The type of image to send along with a message if image analysis is needed. Select one of the following options: File, Public URL.
    • File - The image file to be used for the completion request. This field supports IResource type input. This field is displayed if Image type is set to File.
    • Public URL - The publicly accessible URL of the image to send along with the user prompt. This field supports String type input. This field is displayed if Image type is set to Public URL.
  • System prompt - The system prompt or context instruction for the completion request. This field supports String type input.
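
The relationship between Image type and the two image fields can be sketched as follows. This is a minimal illustration in Python, not the activity's implementation: the class and field names are hypothetical, and a plain string stands in for the IResource input that the File field accepts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImageType(Enum):
    FILE = "File"
    PUBLIC_URL = "Public URL"


@dataclass
class ImageAnalysisInput:
    # Hypothetical names mirroring the configuration fields listed above.
    prompt: str
    image_type: ImageType
    file_path: Optional[str] = None   # stands in for the IResource input
    public_url: Optional[str] = None
    system_prompt: Optional[str] = None

    def image_source(self) -> str:
        """Return the value of the image field that matches image_type."""
        if self.image_type is ImageType.FILE:
            value, label = self.file_path, "File"
        else:
            value, label = self.public_url, "Public URL"
        if not value:
            raise ValueError(
                f"Image type is {self.image_type.value}, so {label} must be set"
            )
        return value
```

Only the field that matches the selected Image type is read, which mirrors how the activity displays one field or the other.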
Note: The supported image file format depends on the selected model. As new models are released, more image formats will be available.
  • GPT-4o: PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).
  • Gemini Flash: PNG (.png), JPEG (.jpeg and .jpg).
  • Gemini Pro: PNG (.png), JPEG (.jpeg and .jpg).
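
A workflow could check the file extension before calling the activity. The sketch below is one way to do that in Python, assuming the format list in the note above is current. The dictionary keys are illustrative labels rather than the model identifiers the activity uses, and `is_supported_image` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions per model, copied from the note above. The keys are
# illustrative labels, not identifiers used by the activity.
SUPPORTED_EXTENSIONS = {
    "GPT-4o": {".png", ".jpeg", ".jpg", ".webp", ".gif"},
    "Gemini Flash": {".png", ".jpeg", ".jpg"},
    "Gemini Pro": {".png", ".jpeg", ".jpg"},
}


def is_supported_image(model: str, path_or_url: str) -> bool:
    """Return True if the file extension is listed for the given model.

    Only the extension is checked; this cannot tell whether a GIF
    is animated.
    """
    allowed = SUPPORTED_EXTENSIONS.get(model)
    if allowed is None:
        raise ValueError(f"Unknown model label: {model!r}")
    # Taking the parsed path drops any query string, so both URLs
    # and plain file names work.
    path = urlparse(path_or_url).path
    return PurePosixPath(path).suffix.lower() in allowed
```

For example, `is_supported_image("Gemini Pro", "scan.webp")` returns `False`, because WEBP is listed only for GPT-4o.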
Manage Properties

Use the Manage Properties wizard to configure or use any of the object's standard or custom fields. You can select fields to add them to the activity canvas. The added standard or custom fields are available in the Properties panel (in Studio Desktop) or under Show additional properties (in Studio Web).

Additional properties
  • Maximum tokens count - The maximum number of tokens to generate in the completion. The token count of your prompt plus that of the completion cannot exceed the value provided for this field, so set it below the model's maximum token count to leave room for the prompt. Default value is 1024. If not set, the activity defaults to the necessary tokens to accommodate the request or the maximum tokens allowed by the model. This field supports Int64 type input.
  • Temperature - The creativity factor, or sampling temperature, to use. Higher values mean the model takes more risks. Try 0.9 for more creative completions, or 0 (also called argmax sampling) for completions with a well-defined or more exact answer. The general recommendation is to change either this value or the Nucleus Sample value from its default, but not both. Default value is 1.
  • Frequency penalty - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text, decreasing the model's likelihood to repeat the same line verbatim. Default value is 0.
  • Presence penalty - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Default value is 0.
  • Completion choices count - The number of completion choices to generate for the request. Higher values use more tokens and therefore cost more, so keep this in mind when setting the field. Default value is 1.
  • Stop sequence - Up to four sequences where the API will stop generating further tokens. The returned text does not contain the stop sequence. Default value is null.
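
The limits stated for the additional properties can be checked before a request is sent. The following Python sketch is illustrative only: the class, field, and function names are hypothetical, the defaults are the ones listed above, and the checks that tokens and choices must be positive are assumptions rather than documented rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdditionalProperties:
    # Defaults mirror the values listed above; names are hypothetical.
    max_tokens: Optional[int] = 1024
    temperature: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0
    completion_choices: int = 1
    stop_sequences: Optional[List[str]] = None


def validate(props: AdditionalProperties) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    # Assumption: a token limit, when set, must be a positive number.
    if props.max_tokens is not None and props.max_tokens < 1:
        problems.append("Maximum tokens count must be positive")
    penalties = (
        ("Frequency penalty", props.frequency_penalty),
        ("Presence penalty", props.presence_penalty),
    )
    for name, value in penalties:
        if not -2.0 <= value <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")
    # Assumption: at least one completion choice is requested.
    if props.completion_choices < 1:
        problems.append("Completion choices count must be at least 1")
    if props.stop_sequences is not None and len(props.stop_sequences) > 4:
        problems.append("Stop sequence accepts up to four sequences")
    return problems


def completion_budget(max_tokens: int, prompt_tokens: int) -> int:
    """Tokens left for the completion once the prompt is counted."""
    return max(max_tokens - prompt_tokens, 0)
```

With the default of 1024 tokens and a prompt of 300 tokens, `completion_budget(1024, 300)` leaves 724 tokens for the completion, which illustrates why the limit should leave room for the prompt.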
Output
  • Top generated text - The image analysis completion text.
  • Image analysis object - The response object, which contains the image analysis completion text, the model used, and other details.
