
Advanced Topics

Explore advanced features for power users and complex evaluation scenarios.

Model Selection

Choose and configure AI models for evaluations and judges. Learn more →

Workspace Management

Understand and manage evaluation workspaces and artifacts. Learn more →

Results & Reporting

Deep dive into result analysis and reporting options. Learn more →

Environment variables can be substituted anywhere in your configuration using ${VAR} syntax:

```yaml
repo: ${REPO_URL}
branch: ${BRANCH_NAME}
agent:
  type: copilot-cli
  model: ${MODEL}
  config:
    prompt: "Task for ${PROJECT_NAME}"
post_evaluation:
  - name: webhook
    config:
      url: ${SLACK_WEBHOOK_URL}
workspace_dir: /tmp/youbencha-${BUILD_ID}
```
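The placeholder style above matches Python's `string.Template` syntax. As an illustrative sketch of how `${VAR}` expansion works (not youbencha's actual implementation, and with made-up example values):

```python
from string import Template

# Hypothetical raw config text using ${VAR} placeholders (illustrative only)
raw = "repo: ${REPO_URL}\nworkspace_dir: /tmp/youbencha-${BUILD_ID}\n"

# In practice these values would come from the environment (os.environ)
env = {"REPO_URL": "https://example.com/repo.git", "BUILD_ID": "42"}

# Template.substitute replaces each ${VAR}, raising KeyError if one is unset
expanded = Template(raw).substitute(env)
print(expanded)
```

Because substitution fails loudly on a missing variable, typos in placeholder names surface immediately rather than producing a half-expanded config.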

For long-running agent tasks, raise the timeout (in milliseconds):

```yaml
timeout: 600000 # 10 minutes
agent:
  type: copilot-cli
  agent_name: specialized-coder
```
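As a hedged sketch of how a millisecond timeout can be enforced around an agent subprocess (a generic pattern, not youbencha's internals), assuming the agent runs as a child process:

```python
import subprocess

TIMEOUT_MS = 600000  # mirrors the config value: 10 minutes

def run_agent(cmd: list[str]) -> int:
    """Run an agent command, killing it if it exceeds the timeout."""
    try:
        # subprocess.run takes seconds, so convert from milliseconds
        result = subprocess.run(cmd, timeout=TIMEOUT_MS / 1000)
        return result.returncode
    except subprocess.TimeoutExpired:
        # The child process is killed when the timeout fires
        return -1

# A trivially fast command completes well within the limit
print(run_agent(["true"]))
```

The key detail is the unit conversion: a `timeout` of 600000 means milliseconds here, so passing it to an API that expects seconds without dividing by 1000 would allow runs nearly a week long.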

All evaluators run concurrently by default. Leverage this by using multiple focused evaluators:

```yaml
evaluators:
  # These all run in parallel
  - name: git-diff
  - name: agentic-judge-security
    config: { ... }
  - name: agentic-judge-testing
    config: { ... }
  - name: agentic-judge-quality
    config: { ... }
```
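To illustrate why several focused evaluators finish in roughly the time of the slowest one rather than the sum of all of them, here is a generic concurrency sketch using a thread pool (not youbencha internals; evaluator names mirror the config above and the sleep stands in for real work):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_evaluator(name: str) -> str:
    # Stand-in for real evaluator work (e.g. a judge call)
    time.sleep(0.1)
    return f"{name}: ok"

names = ["git-diff", "agentic-judge-security",
         "agentic-judge-testing", "agentic-judge-quality"]

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_evaluator, names))
elapsed = time.monotonic() - start

# All four sleeps overlap, so total time is close to one sleep, not four
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

This is also why splitting one broad judge into focused ones (security, testing, quality) costs little extra wall-clock time while keeping each prompt narrow.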

Delete workspaces after successful runs:

```sh
yb run -c suite.yaml --delete-workspace
```

Or clean up manually:

```sh
rm -rf .youbencha-workspace/
```
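For scripted cleanup, a hedged Python sketch equivalent to the `rm -rf` above, assuming workspaces live in directories matching `.youbencha-workspace*` (the glob pattern is an assumption, not documented youbencha behavior):

```python
import shutil
from pathlib import Path

def clean_workspaces(root: Path, pattern: str = ".youbencha-workspace*") -> int:
    """Delete workspace directories under root; return how many were removed."""
    removed = 0
    for path in root.glob(pattern):
        if path.is_dir():
            # shutil.rmtree deletes the directory and everything inside it
            shutil.rmtree(path)
            removed += 1
    return removed
```

Returning a count makes it easy to log how much was reclaimed, e.g. `clean_workspaces(Path.cwd())` after a batch of runs.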