
UiPath CLI user guide

Last updated May 7, 2026

uip agent eval

uip agent eval is the evaluation command group. It manages evaluators, evaluation sets, and test cases on disk, and executes evaluation runs against the Agent Runtime service. Four subcommand trees live underneath:

  • uip agent eval evaluator … — manage evaluators (evals/evaluators/*.json).
  • uip agent eval set … — manage evaluation sets (evals/eval-sets/*.json).
  • uip agent eval add | list | remove — manage test cases (evaluations) inside an evaluation set.
  • uip agent eval run … — start, monitor, list, and compare evaluation runs against the Agent Runtime service.

Everything except uip agent eval run * is local-only. The run subcommands require an active CLI session (uip login) and that the agent has already been pushed to Studio Web (via uip agent push) or that you pass --solution-id explicitly.

All uip agent eval subcommands honor the global options (--output, --output-filter, --log-level, --log-file). Exit codes follow the standard contract.

Synopsis

uip agent eval evaluator add    <name> --type <type> [--description <d>] [--prompt <p>] [--target-key <k>] [--path <dir>]
uip agent eval evaluator list                                                                                 [--path <dir>]
uip agent eval evaluator remove <id>                                                                           [--path <dir>]

uip agent eval set add    <name>  [--evaluators <ids>]                                                         [--path <dir>]
uip agent eval set list                                                                                         [--path <dir>]
uip agent eval set remove <id>                                                                                  [--path <dir>]

uip agent eval add    <name> --set <name> --inputs <json>
                              [--expected <json>] [--expected-agent-behavior <text>]
                              [--simulation-instructions <text>]
                              [--simulate-input] [--simulate-tools]
                              [--input-generation-instructions <text>]
                              [--path <dir>]
uip agent eval list   --set <name>                                                                              [--path <dir>]
uip agent eval remove <id> --set <name>                                                                         [--path <dir>]

uip agent eval run start   --set <name> [--solution-id <id>] [--wait] [--timeout <s>]                           [--path <dir>]
uip agent eval run status  <evalSetRunId> --set <name>                                                          [--path <dir>]
uip agent eval run results <evalSetRunId> --set <name> [--only-failed] [--verbose] [--export-format <json|csv>] [--path <dir>]
uip agent eval run list    --set <name>                                                                         [--path <dir>]
uip agent eval run compare <evalSetRunId> --compare-to <id> --set <name>                                        [--path <dir>]

uip agent eval evaluator

Manage evaluators — the graders that score an agent's output.

eval evaluator add

Arguments:

  • <name> (required) — Evaluator name. Used as the default file name.

Options:

  • --type <type> (required) — Evaluator type. Known values include SemanticSimilarity, Trajectory, and LLM-as-judge; run uip agent eval evaluator add --help for the full list supported by your installation.
  • --description <desc> — Free-text description.
  • --prompt <prompt> — Custom evaluation prompt (for prompt-based evaluators).
  • --target-key <key> — Target output key to evaluate against.
  • --path <path> (default .) — Path to the agent project directory.

Example:

uip agent eval evaluator add content-check \
  --type SemanticSimilarity \
  --path ./my-agent

Data shape (--output json):

{
  "Code": "AgentEvaluatorAdd",
  "Data": {
    "Status": "Evaluator added",
    "Name": "content-check",
    "Type": "SemanticSimilarity",
    "Id": "a1b2c3d4-0000-0000-0000-000000000130",
    "File": "content-check.json"
  }
}

eval evaluator list

Options: --path <path> (default .).

Data shape:

{
  "Code": "AgentEvaluatorList",
  "Data": [
    { "Name": "content-check", "Type": "SemanticSimilarity", "Id": "…", "File": "content-check.json" }
  ]
}

Empty projects return Data: { "Message": "No evaluators configured" }.
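When scripting against --output json, note that Data is either a list of evaluators or the empty-project message object, so a consumer has to branch on the shape. A minimal Python sketch (the payload is the sample from this page, not live output; evaluator_ids is a hypothetical helper):

```python
import json

# Sample envelope as documented for `uip agent eval evaluator list --output json`.
raw = """
{
  "Code": "AgentEvaluatorList",
  "Data": [
    { "Name": "content-check", "Type": "SemanticSimilarity",
      "Id": "a1b2c3d4-0000-0000-0000-000000000130", "File": "content-check.json" }
  ]
}
"""

def evaluator_ids(envelope: dict) -> list[str]:
    data = envelope.get("Data")
    # Empty projects return {"Message": "No evaluators configured"} instead of a list.
    if not isinstance(data, list):
        return []
    return [e["Id"] for e in data]

print(evaluator_ids(json.loads(raw)))
# → ['a1b2c3d4-0000-0000-0000-000000000130']
```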

eval evaluator remove

Arguments: <id> — evaluator ID or name.

Options: --path <path> (default .).

Data shape:

{ "Code": "AgentEvaluatorRemove", "Data": { "Status": "Evaluator removed", "Id": "content-check" } }

uip agent eval set

Manage evaluation sets — named collections of test cases plus the evaluators that should score them.

eval set add

Arguments: <name> — evaluation-set name.

Options:

  • --evaluators <ids> (default: all evaluators in the project) — Comma-separated evaluator IDs to include.
  • --path <path> (default .) — Path to the agent project directory.

Example:

uip agent eval set add smoke-tests \
  --evaluators a1b2c3d4-0000-0000-0000-000000000130,a1b2c3d4-0000-0000-0000-000000000131 \
  --path ./my-agent

Data shape:

{
  "Code": "AgentEvalSetAdd",
  "Data": {
    "Status": "Evaluation set created",
    "Name": "smoke-tests",
    "Id": "a1b2c3d4-0000-0000-0000-000000000110",
    "Evaluators": 2
  }
}

eval set list

Options: --path <path> (default .).

Data shape:

{
  "Code": "AgentEvalSetList",
  "Data": [
    { "Name": "smoke-tests", "Id": "…", "Evaluations": 5, "Evaluators": 2 }
  ]
}

eval set remove

Arguments: <id> — eval-set ID or name.

Options: --path <path> (default .).

uip agent eval add | list | remove (test cases)

Manage the test cases (evaluations) inside a set. These subcommands sit directly under eval, not under eval set.

eval add

Arguments: <name> — test-case name.

Options:

  • --set <name> (required) — Evaluation set name or ID.
  • --inputs <json> (required) — Input values as a JSON string. Parsed eagerly; invalid JSON fails fast.
  • --expected <json> — Expected output as JSON.
  • --expected-agent-behavior <text> — Expected behaviour description for trajectory evaluators (for example, "Must call Web Search tool").
  • --simulation-instructions <text> — Instructions for simulating agent behaviour during evaluation.
  • --simulate-input (default off) — Enable input simulation for this test case.
  • --simulate-tools (default off) — Enable tool simulation for this test case.
  • --input-generation-instructions <text> — Instructions for synthesizing inputs.
  • --path <path> (default .) — Path to the agent project directory.

Example:

uip agent eval add simple-greeting \
  --set default \
  --inputs '{"input":"hello"}' \
  --expected '{"content":"world"}' \
  --path ./my-agent

Data shape:

{
  "Code": "AgentEvalAdd",
  "Data": {
    "Status": "Evaluation added",
    "Name": "simple-greeting",
    "Id": "a1b2c3d4-0000-0000-0000-000000000120",
    "Set": "default"
  }
}

eval list

Options: --set <name> (required), --path <path> (default .).

Data shape:

{
  "Code": "AgentEvalList",
  "Data": [
    {
      "Name": "simple-greeting",
      "Id": "…",
      "Inputs": "{\"input\":\"hello\"}",
      "Expected": "{\"content\":\"world\"}",
      "ExpectedBehavior": "-"
    }
  ]
}
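Note that Inputs and Expected come back as JSON-encoded strings inside the JSON envelope, so consumers need a second json.loads per field. A minimal sketch reusing the sample row above (decode_case is a hypothetical helper; "-" is assumed to mark unset fields, matching ExpectedBehavior above):

```python
import json

# One row from the documented `uip agent eval list --output json` Data array.
row = {
    "Name": "simple-greeting",
    "Id": "a1b2c3d4-0000-0000-0000-000000000120",
    "Inputs": "{\"input\":\"hello\"}",
    "Expected": "{\"content\":\"world\"}",
    "ExpectedBehavior": "-",
}

def decode_case(row: dict) -> dict:
    # Inputs/Expected are serialized JSON strings, not nested objects.
    out = dict(row)
    for key in ("Inputs", "Expected"):
        value = row.get(key)
        out[key] = json.loads(value) if value and value != "-" else None
    return out

case = decode_case(row)
print(case["Inputs"])  # → {'input': 'hello'}
```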

eval remove

Arguments: <id> — evaluation ID or name.

Options: --set <name> (required), --path <path> (default .).

uip agent eval run

Execute, monitor, and compare evaluation runs via the Agent Runtime service (EvalsTenantExecutionApi). Requires uip login.

eval run start

Start an evaluation run. The agent must already be in Studio Web (uip agent push) — either pass --solution-id explicitly or rely on SolutionStorage.json, which push writes automatically.

Options:

  • --set <name> (required) — Evaluation set name or ID.
  • --solution-id <id> (default: from SolutionStorage.json) — Cloud solution ID. If omitted, the command reads SolutionStorage.json from the project; if neither is available, it errors out.
  • --path <path> (default .) — Path to the agent project directory.
  • --wait (default off) — Poll until the run completes, then emit the summary and per-test-case rows.
  • --timeout <seconds> (default 600) — Maximum seconds to poll when --wait is set.

Example:

uip agent eval run start --set default --path ./my-agent --wait

Data shape — kickoff (Code: "AgentEvalRunStarted"):

{
  "Code": "AgentEvalRunStarted",
  "Data": {
    "EvalSetRunId": "a1b2c3d4-0000-0000-0000-000000000101",
    "EvalSetName": "default",
    "TestCases": 5,
    "Evaluators": 2
  }
}

With --wait, two additional payloads follow after polling:

  • Code: "AgentEvalRunCompleted" — summary (Status, Score, Duration, EvaluatorScores, TestCases).
  • Code: "AgentEvalRunResults" — per-test-case rows (same shape as eval run results).

eval run status

Poll the status of an in-flight or finished run.

Arguments: <evalSetRunId> — run ID from eval run start.

Options: --set <name> (required), --path <path> (default .).

Data shape:

{
  "Code": "AgentEvalRunStatus",
  "Data": {
    "EvalSetRunId": "…",
    "Status": "completed",
    "Score": 0.86,
    "Duration": "42.5s",
    "EvaluatorScores": "semantic: 0.9, trajectory: 0.82"
  }
}
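EvaluatorScores is a flattened display string ("semantic: 0.9, trajectory: 0.82") rather than a nested object, so enforcing a per-evaluator threshold means parsing it back apart. A sketch, assuming the "name: score, name: score" format shown above (parse_evaluator_scores is a hypothetical helper):

```python
def parse_evaluator_scores(s: str) -> dict[str, float]:
    """Split 'semantic: 0.9, trajectory: 0.82' into {'semantic': 0.9, ...}.
    Returns {} for placeholder values like '-'."""
    scores = {}
    for part in s.split(","):
        name, _, value = part.partition(":")
        if value.strip():
            scores[name.strip()] = float(value)
    return scores

print(parse_evaluator_scores("semantic: 0.9, trajectory: 0.82"))
# → {'semantic': 0.9, 'trajectory': 0.82}
```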

eval run results

Fetch per-test-case results.

Arguments: <evalSetRunId>.

Options:

  • --set <name> (required) — Evaluation set name or ID.
  • --path <path> (default .) — Path to the agent project directory.
  • --only-failed (default off) — Show only failed or errored test cases.
  • --verbose (default off) — Include evaluator justifications in the output.
  • --export-format <json|csv> — Write the formatted rows to eval-results-<timestamp>.(json|csv) instead of printing them.

Example:

uip agent eval run results <evalSetRunId> --set default --verbose --only-failed

Data shape (inline — no export):

{
  "Code": "AgentEvalRunResults",
  "Data": [
    {
      "TestCase": "simple-greeting",
      "Status": "completed",
      "Score": 1,
      "EvaluatorScores": "semantic: 0.95",
      "Tokens": 320,
      "Duration": "1.8s",
      "Error": "-"
    }
  ]
}

When --export-format is set, the payload becomes Code: "AgentEvalRunExported" with Format, File, and Records.
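The per-row shape makes CI gating straightforward: treat anything whose Status is not completed, or whose Error is not the "-" placeholder, as a failure. A minimal sketch over the documented rows (the second row is an invented failure for illustration):

```python
# Rows as returned in the `AgentEvalRunResults` Data array.
rows = [
    {"TestCase": "simple-greeting", "Status": "completed", "Score": 1,
     "EvaluatorScores": "semantic: 0.95", "Tokens": 320,
     "Duration": "1.8s", "Error": "-"},
    # Hypothetical errored case, for illustration only.
    {"TestCase": "edge-case", "Status": "error", "Score": 0,
     "EvaluatorScores": "-", "Tokens": 0,
     "Duration": "0.1s", "Error": "timeout"},
]

def failed_cases(rows: list[dict]) -> list[str]:
    # Mirrors --only-failed: failed or errored test cases; "-" means no error.
    return [r["TestCase"] for r in rows
            if r["Status"] != "completed" or r["Error"] != "-"]

failures = failed_cases(rows)
print(failures)  # → ['edge-case']
exit_code = 1 if failures else 0  # non-zero fails the CI step
```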

eval run list

List all runs for a given eval set.

Options: --set <name> (required), --path <path> (default .).

Data shape:

{
  "Code": "AgentEvalRunList",
  "Data": [
    {
      "EvalSetRunId": "…",
      "Status": "completed",
      "Score": 0.86,
      "TestCases": 5,
      "Duration": "42.5s",
      "EvaluatorScores": "semantic: 0.9, trajectory: 0.82",
      "CreatedAt": "2025-04-15T10:30:00Z"
    }
  ]
}

eval run compare

Compare two runs side by side. Useful for A/B testing prompt or model changes.

Arguments: <evalSetRunId> — first (baseline) run ID.

Options:

  • --compare-to <id> (required) — Second run ID to compare against.
  • --set <name> (required) — Evaluation set name or ID.
  • --path <path> (default .) — Path to the agent project directory.

Data shape (Code: "AgentEvalRunComparison"):

{
  "Code": "AgentEvalRunComparison",
  "Data": {
    "RunA": { "Id": "…", "Score": 0.86, "Status": "completed" },
    "RunB": { "Id": "…", "Score": 0.80, "Status": "completed" },
    "ScoreDelta": 0.06,
    "TestCases": [
      { "TestCase": "simple-greeting", "ScoreA": 1, "ScoreB": 0.9, "Delta": "+0.1", "StatusA": "completed", "StatusB": "completed" }
    ]
  }
}
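For automated gating on an A/B comparison, the per-test-case deltas are what matter. Which direction counts as a regression depends on which run is your candidate; the sketch below assumes run A (the positional argument) is the newer run, so a regression is any test case where ScoreA drops below ScoreB (the second test case is invented for illustration):

```python
# Payload as documented for `AgentEvalRunComparison` (second test case hypothetical).
payload = {
    "RunA": {"Id": "run-a", "Score": 0.86, "Status": "completed"},
    "RunB": {"Id": "run-b", "Score": 0.80, "Status": "completed"},
    "ScoreDelta": 0.06,
    "TestCases": [
        {"TestCase": "simple-greeting", "ScoreA": 1, "ScoreB": 0.9,
         "Delta": "+0.1", "StatusA": "completed", "StatusB": "completed"},
        {"TestCase": "edge-case", "ScoreA": 0.5, "ScoreB": 0.8,
         "Delta": "-0.3", "StatusA": "completed", "StatusB": "completed"},
    ],
}

def regressions(payload: dict) -> list[str]:
    # Compare the numeric scores rather than parsing the signed Delta string.
    return [tc["TestCase"] for tc in payload["TestCases"]
            if tc["ScoreA"] < tc["ScoreB"]]

print(regressions(payload))  # → ['edge-case']
```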
See also

  • uip agent push — must be run before eval run start (unless --solution-id is supplied).
  • uip agent validate — the default eval set and evaluators are created by init; validate keeps them consistent.
  • uip agent run — run the agent as an Orchestrator job; distinct from an Agent Runtime eval run.
