Explore advanced features for power users and complex evaluation scenarios.
- **Model Selection**: Choose and configure AI models for evaluations and judges.
- **Workspace Management**: Understand and manage evaluation workspaces and artifacts.
- **Results & Reporting**: Deep dive into result analysis and reporting options.
Use `${VAR}` syntax throughout your configuration:

```yaml
repo: ${REPO_URL}
branch: ${BRANCH_NAME}

agent:
  type: copilot-cli
  model: ${MODEL}
  config:
    prompt: "Task for ${PROJECT_NAME}"
```
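As an illustration of how `${VAR}` references of this kind are typically resolved (a sketch, not youbencha's actual implementation; the variable names and values are hypothetical), substitution against an environment mapping can be written in a few lines of Python:

```python
import re

def substitute_vars(text: str, env: dict) -> str:
    """Replace ${VAR} references with values from an environment mapping.

    Unknown variables are left untouched so missing values are easy to spot.
    """
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return env.get(name, match.group(0))

    return re.sub(r"\$\{(\w+)\}", repl, text)

# Hypothetical example values
config = 'prompt: "Task for ${PROJECT_NAME}" on branch ${BRANCH_NAME}'
env = {"PROJECT_NAME": "widgets"}
print(substitute_vars(config, env))
# ${BRANCH_NAME} is not in env, so it is left as-is in the output
```

Leaving unknown variables untouched, rather than substituting an empty string, makes a missing `${VAR}` visible in the rendered configuration instead of silently producing a blank value.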
```yaml
post_evaluation:
  - name: webhook
    config:
      url: ${SLACK_WEBHOOK_URL}

workspace_dir: /tmp/youbencha-${BUILD_ID}
```

For long-running agent tasks:
```yaml
timeout: 600000  # 10 minutes
agent:
  type: copilot-cli
  agent_name: specialized-coder
```

All evaluators run concurrently by default. Leverage this by using multiple focused evaluators:
```yaml
evaluators:
  # These all run in parallel
  - name: git-diff
  - name: agentic-judge-security
    config: { ... }
  - name: agentic-judge-testing
    config: { ... }
  - name: agentic-judge-quality
    config: { ... }
```

Delete workspaces after successful runs:
```shell
yb run -c suite.yaml --delete-workspace
```

Or clean up manually:
```shell
rm -rf .youbencha-workspace/
```