
Agents user guide

Last updated February 19, 2026

Evaluating conversational agents

Evaluations help ensure your conversational agent behaves reliably across varied dialogue paths. This page covers how to test your agent using Debug chat, create evaluation sets, and run automated tests.

Debug chat

Debug chat provides a real-time testing environment where you can interact with your agent and inspect its behavior.

Starting a debug session

  1. In Studio Web, open your conversational agent.
  2. Select Debug to open the chat interface.
  3. Send messages to test your agent's responses.

Debug chat interface

Viewing execution traces

The history panel shows real-time details of the agent's execution:

  • LLM calls: The prompts sent to the model and responses received.
  • Tool calls: Which tools were invoked, with arguments and outputs.

Expand any step to see full details, including token counts and latency.
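
To make the trace contents concrete, here is a minimal sketch of the kind of record an expanded step might correspond to; the field names and values are illustrative assumptions, not the actual trace schema.

# Illustrative only: field names and values are assumptions, not the real trace schema.
llm_call_step = {
    "type": "llm_call",
    "prompt": "system prompt + conversation history + latest user message",
    "response": "text returned by the model",
    "input_tokens": 812,     # token counts visible when the step is expanded
    "output_tokens": 96,
    "latency_ms": 1430,      # latency reported for this call
}

tool_call_step = {
    "type": "tool_call",
    "tool": "calendar_create_event",          # hypothetical tool name
    "arguments": {"attendee": "John", "start": "tomorrow 2pm"},
    "output": {"status": "created"},
}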

Execution trace panel

Viewing citations

When your agent uses Context Grounding, citations appear in the response showing which documents informed the answer.

  1. Look for citation markers in the agent's response (typically numbered references).
  2. Select a citation to see the source document and relevant excerpt.
  3. Verify that citations accurately support the agent's response.

Citation view

Adding conversations to evaluation sets

After a successful test interaction, save it for automated testing:

  1. In the Chat tab, select Add to evaluation set.
  2. Choose an existing evaluation set or create a new one.

The conversation is saved with:

  • Conversation history: All preceding turns in the dialogue.
  • Current user message: The user's latest input.
  • Expected agent response: The agent's actual response (which you can edit).

Evaluation sets

Evaluation sets are collections of test cases that validate your agent's behavior. They support both single-turn and multi-turn testing scenarios.

For detailed evaluation guidance, refer to Agent evaluations.

Single-turn evaluations

Single-turn evaluations test isolated question-and-answer pairs without conversation history; only the first prompt of a conversation is evaluated.

Use single-turn evaluations for:

  • Testing specific knowledge retrieval.
  • Validating tool selection for different intents.
  • Checking response format and tone.

Example:

User message | Expected behavior
"How many holidays do we have in the US?" | Returns correct count, cites policy document
"Schedule a meeting with John tomorrow at 2pm" | Calls calendar tool with correct parameters

Multi-turn evaluations

Multi-turn evaluations test how the agent handles conversation context and follow-up questions; the tested prompt follows one or more earlier conversation turns.

Use multi-turn evaluations for:

  • Testing context retention across turns.
  • Validating pronoun resolution ("it", "that", "the same").
  • Checking conversation flow and coherence.

Example:

Turn | Message | Expected behavior
1 | "What's the PTO policy?" | Returns PTO policy summary
2 | "How do I request time off?" | References PTO context, explains request process
3 | "Can I do that through email?" | Understands "that" refers to requesting time off

Creating evaluation tests

From Debug chat

  1. Run a conversation in Debug chat.
  2. Select Add to evaluation set from the Chat panel.
  3. The conversation exchange will be added as an evaluation test in your designated evaluation set.

Using the Conversation builder

The Conversation builder lets you create or edit multi-turn test cases:

  1. Select Evaluation Sets for your agent in Studio Web.
  2. Select an evaluation set or create a new one. If these options are disabled, make sure you aren't in debug mode.
  3. Select Add to set or edit an existing test.
  4. Use the Conversation builder to:
    • Add conversation history turns.
    • Define the current user message.
  5. Use Output setup to define the assertion:
    • Specify the expected agent response for deterministic and LLM-as-a-judge evaluators.
    • Specify the "behavior and output notes" for trajectory-based evaluators.

Conversation Builder

Tool simulations

Simulations let you test agent behavior without executing real tool endpoints. For each evaluation test, you can specify whether tools should actually execute or simulate their execution.

Simulations enhance agent evaluations by enabling:

  • Safe testing: Avoid unintended side effects from calling real APIs or services.
  • Faster execution: Skip network latency and external service delays.
  • Cost-effective runs: Reduce API costs during iterative testing.
  • Reproducibility: Get consistent results by controlling tool outputs.

You can configure simulation behavior for each evaluation test:

  1. Open an evaluation set.
  2. Select a test case to edit.
  3. In the test configuration, specify which tools should simulate execution.
  4. Define the expected simulated output for each tool.
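
Conceptually, simulating a tool means returning a canned output instead of calling the real endpoint. The sketch below illustrates that idea in a standalone harness; the tool name and configuration shape are hypothetical and not the product's simulation settings.

# Illustrative only: canned outputs used in place of real tool calls.
simulated_outputs = {
    "calendar_create_event": {        # hypothetical tool name
        "status": "created",
        "event_id": "evt_sim_001",    # placeholder value, not real data
    },
}

def call_tool(tool_name: str, arguments: dict, simulate: bool = True) -> dict:
    if simulate and tool_name in simulated_outputs:
        # Deterministic, fast, and free of side effects: no network call is made.
        return simulated_outputs[tool_name]
    raise NotImplementedError("Real execution is outside the scope of this sketch.")
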
Generating tests with natural language

Use Autopilot to generate evaluation tests from descriptions:

  1. In the Evaluation Sets screen, select Create then Generate new evaluation set.
  2. Describe the scenarios you want to test in natural language.
  3. Review and refine the generated test cases.

Example prompt:

Generate test cases for an HR assistant that:
- Answers questions about vacation policy
- Handles requests to schedule meetings
- Escalates when asked about salary information
- Responds appropriately when the user is frustrated
Note:

Autopilot-generated evaluation tests automatically use trajectory-based evaluations.

Generate tests dialog

Running evaluations

Running a single test

  1. Select a test case from your evaluation set.
  2. Select Evaluate selected.
  3. Review the results, comparing actual output to expected output.

Running batch evaluations

  1. Go to Evaluation sets.
  2. Select Run on the desired evaluation set to execute all tests.
  3. Review the results showing pass/fail rates.

Evaluation results

Testing with different models

Run the same evaluation set against different models to compare performance:

  1. In the evaluation set, select Evaluation Settings to add an additional target model.
  2. Run the evaluation.
  3. Compare results across models to identify the best fit for your use case.

This helps you understand:

  • Which models perform best for your specific scenarios.
  • Trade-offs between response quality and latency.
  • Cost implications of different model choices.
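
If you note down the pass/fail counts per model, the comparison itself is simple arithmetic; the sketch below uses made-up placeholder numbers purely to show the calculation.

# Placeholder numbers for illustration only; substitute your own evaluation results.
results_by_model = {
    "model_a": {"passed": 18, "total": 20},
    "model_b": {"passed": 16, "total": 20},
}

for model, result in results_by_model.items():
    pass_rate = result["passed"] / result["total"]
    print(f"{model}: {pass_rate:.0%} pass rate")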

Evaluation metrics

Evaluations assess multiple dimensions of agent behavior:

Metric | Description
Response accuracy | Does the response contain correct information?
Tool selection | Did the agent choose the appropriate tool?
Citation quality | Are citations relevant and accurate?
Tone and format | Does the response match expected style?
Context retention | Does the agent maintain context across turns?
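
To make a couple of these dimensions concrete, the sketch below shows how they could be asserted in a custom harness; the function names and inputs are assumptions, not the built-in evaluators.

# Illustrative checks for two of the metrics above; names and inputs are assumptions.
def check_tool_selection(expected_tool: str, actual_tool_calls: list[str]) -> bool:
    # Tool selection: the expected tool should be among the tools the agent invoked.
    return expected_tool in actual_tool_calls

def check_context_retention(response: str, carried_over_terms: list[str]) -> bool:
    # Context retention: a rough proxy that looks for terms from earlier turns.
    return all(term.lower() in response.lower() for term in carried_over_terms)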

Evaluation best practices

Test both happy and unhappy paths

Don't just test ideal scenarios. Include:

  • Ambiguous questions
  • Out-of-scope requests
  • Edge cases and error conditions
  • Multi-language inputs (if supported)

Create representative test suites

Build evaluation sets that reflect real usage patterns:

  • Analyze common user queries from production
  • Include variations of the same question
  • Test different user personas and communication styles

Iterate based on results

Use evaluation failures to improve your agent:

  1. Identify patterns in failed tests.
  2. Update system prompts or tool configurations.
  3. Re-run evaluations to verify improvements.
  4. Add new tests for discovered edge cases.
