Security

Security is a critical consideration when running AI agent evaluations. This guide covers youBencha’s security model and best practices.

youBencha implements multiple security layers:

  1. Repository validation - Blocks dangerous URLs
  2. Workspace isolation - Isolated execution environments
  3. Controlled execution - Managed command execution
  4. Credentials protection - Never hardcode secrets

youBencha blocks URLs that could enable SSRF attacks:

# ❌ Blocked
repo: http://localhost/repo.git
repo: http://127.0.0.1/repo.git
repo: http://192.168.1.100/repo.git
repo: http://10.0.0.1/repo.git
# ✅ Allowed
repo: https://github.com/org/repo.git
repo: https://gitlab.com/org/repo.git

Repository URLs are also restricted by scheme:

| Scheme   | Allowed          |
|----------|------------------|
| https:// | ✅ Yes           |
| http://  | ⚠️ Public only   |
| file://  | ❌ No            |
| ssh://   | ❌ No (for now)  |
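The URL rules above can be sketched as a small validator. This is a minimal illustration in Python, not youBencha's actual implementation: the function name `is_repo_url_allowed` is hypothetical, and applying the internal-host check to `https://` as well as `http://` is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit

# Scheme policy mirroring the table above (how it is enforced is an assumption).
ALLOWED_SCHEMES = {"https", "http"}
LOCAL_HOSTNAMES = {"localhost"}


def is_repo_url_allowed(url: str) -> bool:
    """Return True if the URL uses an allowed scheme and a non-internal host."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False  # file://, ssh://, and anything else is rejected

    host = parts.hostname
    if not host or host.lower() in LOCAL_HOSTNAMES:
        return False

    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real validator would also resolve the name
        # and check the resulting addresses.
        return True

    # Reject loopback, private (10/8, 172.16/12, 192.168/16), and link-local ranges.
    return not (address.is_loopback or address.is_private or address.is_link_local)
```

Checking only the literal host does not cover a public hostname that resolves to an internal address, so a production check would likely validate the resolved addresses too.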

Each evaluation runs in an isolated workspace:

  • Fresh clone for each evaluation
  • Separate directory per run
  • No cross-contamination between runs
  • Original repo unchanged

Default workspace path: .youbencha-workspace/run-{timestamp}-{hash}/

All operations are scoped to this directory:

.youbencha-workspace/
└── run-20241115-103000-abc123/
    ├── src-modified/   # Agent works here
    ├── src-expected/   # Reference only
    └── artifacts/      # Output only
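As a rough sketch of the layout above, the following Python creates a per-run directory and checks that a path stays inside it. The helper names `create_run_workspace` and `is_inside` are hypothetical, and the exact timestamp and hash format is assumed from the example directory name rather than taken from youBencha's source.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path


def create_run_workspace(root: Path) -> Path:
    """Create a fresh run-{timestamp}-{hash} directory with the three subfolders."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_dir = root / f"run-{timestamp}-{secrets.token_hex(3)}"
    for name in ("src-modified", "src-expected", "artifacts"):
        # exist_ok=False: a run never reuses an existing directory
        (run_dir / name).mkdir(parents=True, exist_ok=False)
    return run_dir


def is_inside(workspace: Path, candidate: Path) -> bool:
    """Return True if candidate resolves to a path inside workspace."""
    workspace = workspace.resolve()
    target = (workspace / candidate).resolve()  # collapses ".." segments
    return target == workspace or workspace in target.parents
```

Resolving the path before comparing is what catches `../` traversal and absolute paths that point outside the run directory.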

Scripts run in a controlled environment:

pre_execution:
  - name: script
    config:
      command: bash
      args: ["-c", "echo $WORKSPACE_DIR"]
      env:
        CUSTOM_VAR: "value" # Explicit env vars only

Security measures:

  • Commands run with shell: true for flexibility, under the controls below
  • Only the environment variables you specify are passed to the script
  • The full process.env is never inherited
  • Timeouts are enforced
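The measures above map onto a familiar pattern: build the child environment explicitly and always pass a timeout. A minimal Python sketch follows. youBencha itself appears to be Node-based (it refers to process.env), so this illustrates the idea rather than its code. The function name `run_script` and the 30-second default are assumptions.

```python
import os
import subprocess


def run_script(
    command: str, env: dict[str, str], timeout_s: float = 30.0
) -> subprocess.CompletedProcess:
    """Run a shell command with only the given environment and a hard timeout."""
    # Start from a minimal PATH instead of os.environ so nothing else leaks in.
    child_env = {"PATH": os.defpath, **env}
    return subprocess.run(
        command,
        shell=True,
        env=child_env,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
        capture_output=True,
        text=True,
    )
```

For example, `run_script('echo "$CUSTOM_VAR"', {"CUSTOM_VAR": "value"})` sees `CUSTOM_VAR` but none of the secrets exported in the parent shell.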

Post-evaluation hooks:

  • Run in parallel
  • Have read-only access to results
  • Cannot modify workspace
  • Never fail the main evaluation
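One way to picture the hook behavior listed above is a runner that hands each hook a read-only view of the results, runs the hooks concurrently, and records failures instead of raising them. This Python sketch is illustrative only: `run_post_hooks` and the shape of the results dictionary are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType
from typing import Callable, Mapping

Hook = Callable[[Mapping[str, object]], None]


def run_post_hooks(results: dict, hooks: dict[str, Hook]) -> dict[str, str]:
    """Run hooks concurrently on a read-only view; report status instead of raising."""
    read_only = MappingProxyType(results)  # top-level writes raise TypeError

    def run_one(hook: Hook) -> str:
        try:
            hook(read_only)
            return "ok"
        except Exception as exc:  # a failing hook must not fail the evaluation
            return f"failed: {exc}"

    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(run_one, hook) for name, hook in hooks.items()}
        return {name: future.result() for name, future in futures.items()}
```

The returned mapping corresponds to a per-hook execution status that can be logged without affecting the evaluation outcome.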

  1. Never hardcode secrets:

     # ❌ Bad
     post_evaluation:
       - name: webhook
         config:
           url: https://hooks.slack.com/services/T00/B00/XXXX

     # ✅ Good
     post_evaluation:
       - name: webhook
         config:
           url: ${SLACK_WEBHOOK_URL}

  2. Use environment variables:

     export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
     export API_TOKEN="secret-token"
     yb run -c suite.yaml

  3. Use CI/CD secrets:

     # .github/workflows/youbencha.yml
     - name: Run Evaluation
       run: yb run -c suite.yaml
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Use ${VAR} syntax anywhere in the configuration:

repo: ${REPO_URL}
agent:
  config:
    prompt: "Token: ${API_TOKEN}"
post_evaluation:
  - name: webhook
    config:
      url: ${WEBHOOK_URL}
      headers:
        Authorization: "Bearer ${AUTH_TOKEN}"
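To make the substitution concrete, here is a minimal Python sketch of ${VAR} expansion. It is not youBencha's implementation: the function name `expand_vars` is hypothetical, and raising an error on an unset variable is an assumption about desirable behavior, not documented behavior.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} but not GitHub Actions' ${{ ... }} expressions.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_vars(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${VAR} with its value; fail loudly if a variable is unset."""
    source = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _VAR_PATTERN.sub(substitute, text)
```

Failing on a missing variable avoids silently sending an empty token or posting to an empty URL.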

Minimum required GitHub token scopes:

  • repo - Access repositories
  • read:user - Read user info (for Copilot)

For Copilot CLI:

  • Active Copilot subscription required
  • Token with copilot scope

youBencha makes these external connections:

| Purpose    | Destination    |
|------------|----------------|
| Clone repo | GitHub/GitLab  |
| AI agent   | Copilot API    |
| Webhooks   | Your endpoints |

Ensure your network or firewall allows access to:

  • github.com (repos, API)
  • api.github.com
  • copilot-proxy.githubusercontent.com
  • Your webhook endpoints

Every evaluation logs:

  • Repository URL and branch
  • Agent configuration (prompts sanitized)
  • Evaluator results
  • Execution timing
  • Hook execution status

Configure result export for audit:

post_evaluation:
  - name: database
    config:
      type: json-file
      output_path: ./audit/evaluations.jsonl
      include_full_bundle: true
      append: true
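An append-only JSON Lines file like the one configured above is easy to review with a few lines of code. The sketch below shows the general append-and-read pattern in Python. The helper names and the record fields used in the usage note are hypothetical and do not reflect youBencha's actual bundle schema.

```python
import json
from pathlib import Path


def append_audit_record(path: Path, record: dict) -> None:
    """Append one evaluation record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_audit_records(path: Path) -> list[dict]:
    """Load every record back for periodic review."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For instance, `append_audit_record(Path("audit/evaluations.jsonl"), {"repo": "https://github.com/org/repo.git", "status": "passed"})` adds one line per run, and earlier lines are never rewritten.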

Best practices:

  • Use HTTPS for all repositories
  • Store secrets in environment variables
  • Review pre-execution scripts for safety
  • Limit webhook URLs to trusted endpoints
  • Set appropriate timeouts
  • Clean up workspaces regularly
  • Audit evaluation logs periodically

Found a security vulnerability?

Email: [email protected]

Please include:

  • Description of the issue
  • Steps to reproduce
  • Potential impact