Reusable Evaluator Definitions

Define evaluators in separate files to reuse them across multiple suites and keep configurations DRY.

Instead of defining evaluators inline in each suite, you can:

  1. Create evaluator definition files
  2. Reference them in your suites
  3. Share across multiple test cases
For example, a test-coverage evaluator defined in its own file:

evaluators/test-coverage.yaml

```yaml
name: agentic-judge:test-coverage
description: "Ensures code changes include appropriate test coverage"
config:
  type: copilot-cli
  agent_name: agentic-judge
  assertions:
    unit_tests: "New code has unit tests. Score 1 if comprehensive, 0.5 if basic, 0 if none."
    edge_cases: "Edge cases are tested. Score 1 if covered, 0 if missing."
    test_quality: "Tests are well-structured and maintainable. Score 0-1."
```
Then reference it from a suite:

suite.yaml

```yaml
evaluators:
  # External evaluator definition
  - file: ./evaluators/test-coverage.yaml
  # Can mix with inline definitions
  - name: git-diff
    config:
      assertions:
        max_files_changed: 10
```
Keep shared evaluators in a dedicated directory:

```
evaluators/
├── security.yaml
├── testing.yaml
├── documentation.yaml
├── code-quality.yaml
└── performance.yaml
```
For larger projects, group evaluators by domain:

```
evaluators/
├── frontend/
│   ├── react-best-practices.yaml
│   └── accessibility.yaml
├── backend/
│   ├── api-security.yaml
│   └── database-safety.yaml
└── shared/
    ├── error-handling.yaml
    └── logging.yaml
```
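With a layout like this, a suite can pull in both domain-specific and shared definitions. The paths below are illustrative, following the directory tree above:

```yaml
evaluators:
  - file: ./evaluators/frontend/accessibility.yaml
  - file: ./evaluators/shared/error-handling.yaml
```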
Reusable evaluators can carry richer configuration, such as a specific model and prompt file:

evaluators/security.yaml

```yaml
name: agentic-judge:security
description: "Evaluates security best practices in code changes"
config:
  type: copilot-cli
  model: claude-sonnet-4.5
  prompt_file: ./prompts/security-expert.txt
  assertions:
    input_validation: |
      All user inputs are validated before use.
      Score 1 if properly validated with sanitization.
      Score 0.5 if validated but not sanitized.
      Score 0 if no validation.
    no_injection: |
      Code is protected against injection attacks (SQL, XSS, command).
      Score 1 if parameterized queries/escaped output used.
      Score 0 if raw input used in queries or output.
    auth_check: |
      Protected routes verify authentication.
      Score 1 if all routes check auth.
      Score 0 if any route is unprotected.
    secrets_safe: |
      No hardcoded secrets or credentials.
      Score 1 if secrets from env vars.
      Score 0 if hardcoded values found.
```
evaluators/documentation.yaml

```yaml
name: agentic-judge:documentation
description: "Ensures code is properly documented"
config:
  type: copilot-cli
  assertions:
    jsdoc_comments: |
      All public functions have JSDoc comments.
      Score 1 if all documented with @param and @returns.
      Score 0.5 if documented but incomplete.
      Score 0 if missing documentation.
    readme_updated: |
      README is updated for new features.
      Score 1 if README documents new functionality.
      Score 0.5 if README exists but not updated.
      Score 0 if no README or severely outdated.
    inline_comments: |
      Complex logic has explanatory comments.
      Score 1 if complex sections are explained.
      Score 0 if confusing code lacks comments.
```
A complete suite that combines reusable evaluators with a task-specific inline one:

suite.yaml

```yaml
repo: https://github.com/example/app.git
branch: main
agent:
  type: copilot-cli
  config:
    prompt: "Add user authentication feature"
evaluators:
  # Reusable evaluators
  - file: ./evaluators/security.yaml
  - file: ./evaluators/testing.yaml
  - file: ./evaluators/documentation.yaml
  # Task-specific inline evaluator
  - name: git-diff
    config:
      assertions:
        max_files_changed: 15
```

You can override settings from the file:

```yaml
evaluators:
  - file: ./evaluators/security.yaml
    config:
      model: gpt-5 # Override model from file
```
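Assuming the inline `config` is shallow-merged over the file's `config` (inline values winning), the override above would behave like this inline definition of `evaluators/security.yaml` — a sketch of the merge, not tool output:

```yaml
evaluators:
  - name: agentic-judge:security
    config:
      type: copilot-cli
      model: gpt-5 # overridden inline; all other settings come from the file
      prompt_file: ./prompts/security-expert.txt
```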
To share evaluators across repositories, add them as a git submodule:

```sh
git submodule add https://github.com/org/shared-evaluators.git evaluators/shared
```

Then reference files from the submodule:

```yaml
evaluators:
  - file: ./evaluators/shared/security.yaml
```

Alternatively, create an npm package with evaluator definitions:

package.json

```json
{
  "name": "@org/evaluators",
  "version": "1.0.0",
  "files": ["evaluators/"]
}
```

Then reference the installed files from node_modules:

```yaml
evaluators:
  - file: ./node_modules/@org/evaluators/security.yaml
```
  1. **Descriptive names** - Use `agentic-judge:focus-area` naming
  2. **Clear descriptions** - Document what the evaluator checks
  3. **Multi-line assertions** - Use YAML multi-line blocks for readable assertions
  4. **Version control** - Track evaluator changes in git
  5. **Centralize shared evaluators** - Keep team standards in one place
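Since shared evaluator files live in version control, a small pre-commit check can catch malformed definitions before a run. A minimal sketch in Python, assuming the field shape shown in the examples above (`name`, `config.type`, `config.assertions`) — these names are inferred from this page, not from an official schema:

```python
# Sanity-check parsed evaluator definitions (e.g. loaded with yaml.safe_load).
# Required fields are inferred from the examples on this page, not a formal schema.

def lint_evaluator(doc: dict) -> list[str]:
    """Return a list of problems found in one parsed evaluator definition."""
    problems = []
    # Top-level fields every evaluator file on this page includes.
    for key in ("name", "config"):
        if key not in doc:
            problems.append(f"missing top-level field: {key}")
    config = doc.get("config") or {}
    # Nested config fields: evaluator type and at least one assertion.
    for key in ("type", "assertions"):
        if not config.get(key):
            problems.append(f"missing or empty config field: {key}")
    return problems

if __name__ == "__main__":
    ok = {
        "name": "agentic-judge:security",
        "config": {"type": "copilot-cli", "assertions": {"secrets_safe": "Score 0-1."}},
    }
    bad = {"name": "agentic-judge:broken", "config": {"type": "copilot-cli"}}
    print(lint_evaluator(ok))   # []
    print(lint_evaluator(bad))  # ['missing or empty config field: assertions']
```

Running this over every file under `evaluators/` keeps a typo in a shared definition from silently skipping checks across all suites that reference it.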