UiPath CLI user guide
uip agent eval is the evaluation command group. It manages evaluators, evaluation sets, and test cases on disk, and executes evaluation runs against the Agent Runtime service. Four subcommand trees live underneath:
- uip agent eval evaluator … — manage evaluators (evals/evaluators/*.json).
- uip agent eval set … — manage evaluation sets (evals/eval-sets/*.json).
- uip agent eval add | list | remove — manage test cases (evaluations) inside an evaluation set.
- uip agent eval run … — start, monitor, list, and compare evaluation runs against the Agent Runtime service.
Everything except uip agent eval run * is local-only. The run subcommands require an active CLI session (uip login) and that the agent has already been pushed to Studio Web (via uip agent push) or that you pass --solution-id explicitly.
All uip agent eval subcommands honor the global options (--output, --output-filter, --log-level, --log-file). Exit codes follow the standard contract.
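Because every subcommand honors --output json, scripts can consume the top-level envelope (a Code string plus a Data object or array) directly. A minimal sketch of parsing that envelope, using the AgentEvaluatorList data shape documented later on this page as the sample payload (capture the command's stdout however you prefer):

```python
import json

# Sample --output json envelope, copied from the AgentEvaluatorList
# data shape documented on this page.
stdout = '''
{
  "Code": "AgentEvaluatorList",
  "Data": [
    { "Name": "content-check", "Type": "SemanticSimilarity",
      "Id": "a1b2c3d4-0000-0000-0000-000000000130", "File": "content-check.json" }
  ]
}
'''

payload = json.loads(stdout)
assert payload["Code"] == "AgentEvaluatorList"  # dispatch on the Code field
names = [ev["Name"] for ev in payload["Data"]]
print(names)
```

The Code field identifies the payload type, so one parser can handle output from any of the subcommands below.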
Synopsis
uip agent eval evaluator add <name> --type <type> [--description <d>] [--prompt <p>] [--target-key <k>] [--path <dir>]
uip agent eval evaluator list [--path <dir>]
uip agent eval evaluator remove <id> [--path <dir>]
uip agent eval set add <name> [--evaluators <ids>] [--path <dir>]
uip agent eval set list [--path <dir>]
uip agent eval set remove <id> [--path <dir>]
uip agent eval add <name> --set <name> --inputs <json>
[--expected <json>] [--expected-agent-behavior <text>]
[--simulation-instructions <text>]
[--simulate-input] [--simulate-tools]
[--input-generation-instructions <text>]
[--path <dir>]
uip agent eval list --set <name> [--path <dir>]
uip agent eval remove <id> --set <name> [--path <dir>]
uip agent eval run start --set <name> [--solution-id <id>] [--wait] [--timeout <s>] [--path <dir>]
uip agent eval run status <evalSetRunId> --set <name> [--path <dir>]
uip agent eval run results <evalSetRunId> --set <name> [--only-failed] [--verbose] [--export-format <json|csv>] [--path <dir>]
uip agent eval run list --set <name> [--path <dir>]
uip agent eval run compare <evalSetRunId> --compare-to <id> --set <name> [--path <dir>]
uip agent eval evaluator
Manage evaluators — the graders that score an agent's output.
eval evaluator add
Arguments:
<name>(required) — Evaluator name. Used as the default file name.
Options:
| Flag | Default | Required | Purpose |
|---|---|---|---|
| --type <type> | — | Yes | Evaluator type. Known values include SemanticSimilarity, Trajectory, and LLM-as-judge; run uip agent eval evaluator add --help for the full list supported by your installation. |
| --description <desc> | — | No | Free-text description. |
| --prompt <prompt> | — | No | Custom evaluation prompt (for prompt-based evaluators). |
| --target-key <key> | — | No | Target output key to evaluate against. |
| --path <path> | . | No | Path to the agent project directory. |
Example:
uip agent eval evaluator add content-check \
--type SemanticSimilarity \
--path ./my-agent
Data shape (--output json):
{
"Code": "AgentEvaluatorAdd",
"Data": {
"Status": "Evaluator added",
"Name": "content-check",
"Type": "SemanticSimilarity",
"Id": "a1b2c3d4-0000-0000-0000-000000000130",
"File": "content-check.json"
}
}
eval evaluator list
Options: --path <path> (default .).
Data shape:
{
"Code": "AgentEvaluatorList",
"Data": [
{ "Name": "content-check", "Type": "SemanticSimilarity", "Id": "…", "File": "content-check.json" }
]
}
Empty projects return Data: { "Message": "No evaluators configured" }.
eval evaluator remove
Arguments: <id> — evaluator ID or name.
Options: --path <path> (default .).
Data shape:
{ "Code": "AgentEvaluatorRemove", "Data": { "Status": "Evaluator removed", "Id": "content-check" } }
uip agent eval set
Manage evaluation sets — named collections of test cases plus the evaluators that should score them.
eval set add
Arguments: <name> — evaluation-set name.
Options:
| Flag | Default | Purpose |
|---|---|---|
| --evaluators <ids> | all evaluators in the project | Comma-separated evaluator IDs to include. |
| --path <path> | . | Path to the agent project directory. |
Example:
uip agent eval set add smoke-tests \
--evaluators a1b2c3d4-0000-0000-0000-000000000130,a1b2c3d4-0000-0000-0000-000000000131 \
--path ./my-agent
Data shape:
{
"Code": "AgentEvalSetAdd",
"Data": {
"Status": "Evaluation set created",
"Name": "smoke-tests",
"Id": "a1b2c3d4-0000-0000-0000-000000000110",
"Evaluators": 2
}
}
eval set list
Options: --path <path> (default .).
Data shape:
{
"Code": "AgentEvalSetList",
"Data": [
{ "Name": "smoke-tests", "Id": "…", "Evaluations": 5, "Evaluators": 2 }
]
}
eval set remove
Arguments: <id> — eval-set ID or name.
Options: --path <path> (default .).
uip agent eval add | list | remove (test cases)
Manage the test cases (evaluations) inside a set. These subcommands sit directly under eval, not under eval set.
eval add
Arguments: <name> — test-case name.
Options:
| Flag | Default | Required | Purpose |
|---|---|---|---|
| --set <name> | — | Yes | Evaluation set name or ID. |
| --inputs <json> | — | Yes | Input values as a JSON string. Parsed; invalid JSON fails fast. |
| --expected <json> | — | No | Expected output as JSON. |
| --expected-agent-behavior <text> | — | No | Expected behaviour description for trajectory evaluators (for example, "Must call Web Search tool"). |
| --simulation-instructions <text> | — | No | Instructions for simulating agent behaviour during evaluation. |
| --simulate-input | off | No | Enable input simulation for this test case. |
| --simulate-tools | off | No | Enable tool simulation for this test case. |
| --input-generation-instructions <text> | — | No | Instructions for synthesizing inputs. |
| --path <path> | . | No | Path to the agent project directory. |
Example:
uip agent eval add simple-greeting \
--set default \
--inputs '{"input":"hello"}' \
--expected '{"content":"world"}' \
--path ./my-agent
Data shape:
{
"Code": "AgentEvalAdd",
"Data": {
"Status": "Evaluation added",
"Name": "simple-greeting",
"Id": "a1b2c3d4-0000-0000-0000-000000000120",
"Set": "default"
}
}
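Since --inputs is parsed and invalid JSON fails fast, building the value with json.dumps sidesteps shell-quoting mistakes entirely. A sketch that assembles the equivalent of the example above programmatically (running the resulting command via subprocess is left to you, since it assumes the uip CLI is on PATH):

```python
import json
import shlex

# Build flag values with json.dumps so they are always valid JSON,
# regardless of quotes or special characters in the data.
inputs = json.dumps({"input": "hello"})
expected = json.dumps({"content": "world"})

cmd = [
    "uip", "agent", "eval", "add", "simple-greeting",
    "--set", "default",
    "--inputs", inputs,
    "--expected", expected,
    "--path", "./my-agent",
]
print(shlex.join(cmd))  # the equivalent, correctly quoted shell command line
# e.g. subprocess.run(cmd, check=True) once uip is available.
```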
eval list
Options: --set <name> (required), --path <path> (default .).
Data shape:
{
"Code": "AgentEvalList",
"Data": [
{
"Name": "simple-greeting",
"Id": "…",
"Inputs": "{\"input\":\"hello\"}",
"Expected": "{\"content\":\"world\"}",
"ExpectedBehavior": "-"
}
]
}
eval remove
Arguments: <id> — evaluation ID or name.
Options: --set <name> (required), --path <path> (default .).
uip agent eval run
Execute, monitor, and compare evaluation runs via the Agent Runtime service (EvalsTenantExecutionApi). Requires uip login.
eval run start
Start an evaluation run. The agent must already be in Studio Web (uip agent push) — either pass --solution-id explicitly or rely on SolutionStorage.json, which push writes automatically.
Options:
| Flag | Default | Required | Purpose |
|---|---|---|---|
| --set <name> | — | Yes | Evaluation set name or ID. |
| --solution-id <id> | from SolutionStorage.json | No | Cloud solution ID. If omitted, the command reads SolutionStorage.json from the project; if neither is available, it errors out. |
| --path <path> | . | No | Path to the agent project directory. |
| --wait | off | No | Poll until the run completes, then emit a summary plus per-test-case rows. |
| --timeout <seconds> | 600 | No | Maximum seconds to poll when --wait is set. |
Example:
uip agent eval run start --set default --path ./my-agent --wait
Data shape — kickoff (Code: "AgentEvalRunStarted"):
{
"Code": "AgentEvalRunStarted",
"Data": {
"EvalSetRunId": "a1b2c3d4-0000-0000-0000-000000000101",
"EvalSetName": "default",
"TestCases": 5,
"Evaluators": 2
}
}
With --wait, two additional payloads follow after polling:
- Code: "AgentEvalRunCompleted" — summary (Status, Score, Duration, EvaluatorScores, TestCases).
- Code: "AgentEvalRunResults" — per-test-case rows (same shape as eval run results).
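In CI, the completed-run summary is a natural gating point. A sketch that fails the pipeline when the overall score drops below a threshold; the field names follow the AgentEvalRunCompleted summary, while the sample values and the 0.8 threshold are illustrative:

```python
import json
import sys

# Illustrative AgentEvalRunCompleted payload; field names match the
# summary fields documented for --wait (Status, Score, Duration, ...).
stdout = '''
{
  "Code": "AgentEvalRunCompleted",
  "Data": {
    "Status": "completed",
    "Score": 0.86,
    "Duration": "42.5s",
    "EvaluatorScores": "semantic: 0.9, trajectory: 0.82",
    "TestCases": 5
  }
}
'''

THRESHOLD = 0.8  # arbitrary quality bar chosen for this sketch
data = json.loads(stdout)["Data"]
ok = data["Status"] == "completed" and data["Score"] >= THRESHOLD
print("gate:", "pass" if ok else "fail")
if not ok:
    sys.exit(1)  # non-zero exit fails the CI step
```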
eval run status
Poll the status of an in-flight or finished run.
Arguments: <evalSetRunId> — run ID from eval run start.
Options: --set <name> (required), --path <path> (default .).
Data shape:
{
"Code": "AgentEvalRunStatus",
"Data": {
"EvalSetRunId": "…",
"Status": "completed",
"Score": 0.86,
"Duration": "42.5s",
"EvaluatorScores": "semantic: 0.9, trajectory: 0.82"
}
}
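--wait handles polling for you, but if you drive eval run status yourself, a small loop suffices. A sketch with an injectable fetcher (in practice a subprocess call to the CLI); "completed" matches the data shape above, while treating anything other than "running" or "pending" as terminal is an assumption:

```python
import json
import time

def wait_for_run(fetch_status, timeout=600, interval=5):
    """Poll an eval run until it leaves the in-progress states.

    fetch_status: callable returning the raw --output json text of
    `uip agent eval run status <evalSetRunId> --set <name>`.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = json.loads(fetch_status())["Data"]
        # "running"/"pending" as the in-progress states is an assumption.
        if data["Status"] not in ("running", "pending"):
            return data
        time.sleep(interval)
    raise TimeoutError("eval run did not finish within the timeout")

# Stub fetcher standing in for the real CLI call.
fake = lambda: '{"Code": "AgentEvalRunStatus", "Data": {"Status": "completed", "Score": 0.86}}'
result = wait_for_run(fake, timeout=10, interval=0)
print(result["Status"], result["Score"])
```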
eval run results
Fetch per-test-case results.
Arguments: <evalSetRunId>.
Options:
| Flag | Default | Required | Purpose |
|---|---|---|---|
| --set <name> | — | Yes | Evaluation set name or ID. |
| --path <path> | . | No | Path to the agent project directory. |
| --only-failed | off | No | Show only failed or errored test cases. |
| --verbose | off | No | Include evaluator justifications in the output. |
| --export-format <json\|csv> | — | No | Write the formatted rows to eval-results-<timestamp>.(json\|csv) instead of printing them. |
Example:
uip agent eval run results <evalSetRunId> --set default --verbose --only-failed
Data shape (inline — no export):
{
"Code": "AgentEvalRunResults",
"Data": [
{
"TestCase": "simple-greeting",
"Status": "completed",
"Score": 1,
"EvaluatorScores": "semantic: 0.95",
"Tokens": 320,
"Duration": "1.8s",
"Error": "-"
}
]
}
When --export-format is set, the payload becomes Code: "AgentEvalRunExported" with Format, File, and Records.
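If you fetch the full inline payload anyway, the filtering that --only-failed performs can be reproduced locally over the row shape above. A sketch, where the second sample row and the "failed"/"error" status strings are assumptions for illustration:

```python
import json

# Illustrative AgentEvalRunResults payload: row fields follow the
# documented shape; the "edge-case" row is invented for this sketch.
stdout = '''
{
  "Code": "AgentEvalRunResults",
  "Data": [
    {"TestCase": "simple-greeting", "Status": "completed", "Score": 1,
     "EvaluatorScores": "semantic: 0.95", "Tokens": 320, "Duration": "1.8s", "Error": "-"},
    {"TestCase": "edge-case", "Status": "failed", "Score": 0.2,
     "EvaluatorScores": "semantic: 0.2", "Tokens": 410, "Duration": "2.1s", "Error": "-"}
  ]
}
'''

rows = json.loads(stdout)["Data"]
# "failed"/"error" as the non-passing statuses is an assumption.
failed = [r for r in rows if r["Status"] in ("failed", "error")]
for r in failed:
    print(r["TestCase"], r["Score"])
```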
eval run list
List all runs for a given eval set.
Options: --set <name> (required), --path <path> (default .).
Data shape:
{
"Code": "AgentEvalRunList",
"Data": [
{
"EvalSetRunId": "…",
"Status": "completed",
"Score": 0.86,
"TestCases": 5,
"Duration": "42.5s",
"EvaluatorScores": "semantic: 0.9, trajectory: 0.82",
"CreatedAt": "2025-04-15T10:30:00Z"
}
]
}
eval run compare
Compare two runs side by side. Useful for A/B testing prompt or model changes.
Arguments: <evalSetRunId> — first (baseline) run ID.
Options:
| Flag | Default | Required | Purpose |
|---|---|---|---|
| --compare-to <id> | — | Yes | Second run ID to compare against. |
| --set <name> | — | Yes | Evaluation set name or ID. |
| --path <path> | . | No | Path to the agent project directory. |
Data shape (Code: "AgentEvalRunComparison"):
{
"Code": "AgentEvalRunComparison",
"Data": {
"RunA": { "Id": "…", "Score": 0.86, "Status": "completed" },
"RunB": { "Id": "…", "Score": 0.80, "Status": "completed" },
"ScoreDelta": 0.06,
"TestCases": [
{ "TestCase": "simple-greeting", "ScoreA": 1, "ScoreB": 0.9, "Delta": "+0.1", "StatusA": "completed", "StatusB": "completed" }
]
}
}
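For scripted A/B checks, the comparison payload reduces easily. A sketch that pulls the per-test-case deltas out of the documented shape; the run IDs here are placeholders, since the real ones come from eval run start:

```python
import json

# Illustrative AgentEvalRunComparison payload; "run-a"/"run-b" are
# placeholder IDs, the field names match the documented shape.
stdout = '''
{
  "Code": "AgentEvalRunComparison",
  "Data": {
    "RunA": { "Id": "run-a", "Score": 0.86, "Status": "completed" },
    "RunB": { "Id": "run-b", "Score": 0.80, "Status": "completed" },
    "ScoreDelta": 0.06,
    "TestCases": [
      { "TestCase": "simple-greeting", "ScoreA": 1, "ScoreB": 0.9,
        "Delta": "+0.1", "StatusA": "completed", "StatusB": "completed" }
    ]
  }
}
'''

data = json.loads(stdout)["Data"]
# Test cases where the baseline run (RunA) outscored the compared run (RunB).
regressions = [t["TestCase"] for t in data["TestCases"] if t["ScoreA"] > t["ScoreB"]]
print("overall delta:", data["ScoreDelta"])
print("regressed:", regressions)
```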
Related
- uip agent push — must be run before eval run start (unless --solution-id is supplied).
- uip agent validate — the default eval set and evaluators are created by init; validate keeps them consistent.
- uip agent run — run the agent as an Orchestrator job; distinct from an Agent Runtime eval run.
See also
- Authentication — sessions and token validity for the eval run subcommands.
- Global options, Exit codes.