Best practices - Maestro

Production-ready patterns and quality standards for expert Maestro users.

Code Quality Standards

Pre-PR Quality Checklist

Before creating pull requests, complete these universal requirements:

Always Required (WIP or Production)

Code cleanup:

Remove throwaway debugging code (temporary print/console.log)
Delete commented-out code blocks
Remove experimental/test code not meant for commit
Clean up unused imports, variables, functions

Note: Structured logging (debug, info, warn, error levels) is professional practice, not cruft. Only remove temporary debugging statements.

Documentation:

Update README if structure or usage changed
Document WIP status clearly if submitting work-in-progress
Add comments for non-obvious logic
Update relevant docs for completed portions

For Production-Ready PRs (Additionally)

Full test suite passes (no failures tolerated)
No critical TODO/FIXME unresolved
No placeholder implementations
Dependencies documented
No hardcoded values needing configuration
Code coverage meets project standards (typically >80%)

Prohibited in ALL PRs

Throwaway debugging code
Dead code accumulation
Undocumented WIP state
Commented-out code blocks
Skipped or disabled tests

Testing Standards

Minimum requirements:

Unit tests:
- All public functions/methods tested
- Edge cases covered
- Error scenarios handled
- Mock external dependencies

Integration tests:
- Component interactions verified
- End-to-end flows tested
- External service integration validated

Performance tests:
- Benchmarks for critical paths
- Load testing where relevant
- Regression prevention

Coverage targets:

Critical code: 100% coverage
Business logic: >90% coverage
Overall project: >80% coverage

Test quality:

Tests are deterministic (no flaky tests)
Tests are independent (order doesn’t matter)
Tests are fast (quick feedback loops)
Tests are clear (obvious what they validate)
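
As a minimal sketch of the determinism and independence points above, assume a hypothetical `pick_winner` function that relies on randomness. Injecting the random source lets each test seed its own generator, so the result does not depend on run order or shared global state. The function and test names are illustrative only.

```python
import random


def pick_winner(entrants, rng=None):
    """Pick one entrant; `rng` is injectable so tests can control randomness."""
    rng = rng or random.Random()
    return rng.choice(entrants)


def test_pick_winner_is_deterministic_with_seed():
    entrants = ["ana", "bo", "cy"]
    # Each call gets its own seeded generator: no shared global state.
    first = pick_winner(entrants, random.Random(42))
    second = pick_winner(entrants, random.Random(42))
    assert first == second
    assert first in entrants
```

The same injection idea applies to clocks, network clients, and other sources of nondeterminism.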

Code Review Standards

Before requesting review:

Self-review all changes
Run full test suite
Check code coverage
Verify documentation updated
Ensure clean commit history
Test manually if UI/UX involved

Review criteria:

Code correctness
Test coverage
Performance implications
Security considerations
Maintainability
Documentation quality

Validation Methodology

Empirical Validation Required

Never assume code works without running it.

Wrong:

Maestro: "I've implemented the feature. It should work."
You: "Great, let's move on"

Right:

Maestro: "I've implemented the feature"
You: "Run the full test suite and show me the results"
Maestro: [Runs tests, shows output]
You: [Reviews output] "All tests pass. Show me code coverage"
Maestro: [Shows coverage report]
You: "Coverage is 92%. Run performance benchmarks"
Maestro: [Runs benchmarks, shows results]
You: "Performance meets targets. Now it's complete."

Running Full Test Suites

Always run the complete test suite, not targeted tests.

Insufficient:

"Run the authentication tests only"

Required:

"Run the FULL test suite. Show complete output including:
- Number of tests run
- Pass/fail breakdown
- Code coverage percentage
- Execution time
- Any warnings or deprecation notices"

Exception: Skip only if no test infrastructure exists AND you haven’t requested tests.

Reason: Partial testing misses regressions in seemingly unrelated code.

Benchmark-Driven Development

For performance-critical features:

Establish baseline:

"Profile current implementation with realistic workload.
Document baseline metrics:
- Latency (p50, p90, p99)
- Throughput (requests/second)
- Resource usage (CPU, memory)

Create a reproducible test harness so later runs can be compared against this baseline."

Implement changes

Measure improvement:

"Run benchmarks using SAME test harness.
Compare to baseline:
- Show both old and new metrics
- Calculate improvement percentage
- Verify no regressions in other areas
- Explain any unexpected results"

Standard: Optimization without measurement is speculation.
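
The baseline-and-compare steps above can be sketched with the standard library alone. This is one possible harness, not a prescribed one: `measure_latencies`, `summarize`, and `improvement_pct` are hypothetical helper names, and a real harness would also control warm-up, workload realism, and machine noise.

```python
import statistics
import time


def measure_latencies(func, runs=200):
    """Call `func` repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Return p50/p90/p99, computed the same way for every run."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


def improvement_pct(baseline, candidate):
    """Positive result means the candidate is faster than the baseline."""
    return (baseline - candidate) / baseline * 100.0
```

Running `summarize(measure_latencies(old_impl))` and `summarize(measure_latencies(new_impl))` with the same `runs` value gives directly comparable numbers; `improvement_pct(500.0, 100.0)` reports an 80% latency reduction.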

Session Management Best Practices

Capacity Management

Proactive approach:

Every 20-30 turns:
- Check capacity indicator
- Consider /refresh if many file iterations
- Use /compact if capacity >70%

Before major validation:
- /refresh to clean file context
- Ensure Maestro sees current state only

After major milestone:
- /synopsis to document learnings
- Consider /compact to free capacity
- /download-changed for backup

Reactive approach (only when warned):

Capacity warning appears:
- /refresh immediately
- /compact if still high
- /forget if necessary

Last resort:
- /download-all
- Create new session
- Upload critical files
- Resume with clean capacity

File Management

Iteration discipline:

Use /refresh regularly to avoid:
- Hundreds of iterations consuming capacity
- Maestro viewing obsolete code
- Confusion about "current" state

Exception: Keep iterations when:
- Comparing approaches
- Documenting evolution
- May need to restore earlier version

Selective viewing:

Don't activate all files at once

Start with:
- View high-level structure (README, config)
- View specific files needed for current task

Expand as needed:
- View related files when referenced
- Activate dependencies when modifying code

Memory Management

What to keep:

Architectural decisions and rationale
Specifications and requirements
Validation results and benchmarks
Key lessons and insights

What to forget/compact:

Failed approaches (after learning from them)
Debugging iterations (after fix implemented)
Exploratory analysis (after conclusion)
Redundant explanations

Timing:

After major milestone:
/synopsis → Document state
/compact → Compress implementation details
Keep: Synopsis + current validated state

Collaboration Patterns

Code Review Workflow

Reviewing Maestro’s PRs:

1. Read PR description thoroughly
2. Review all changed files
3. Check test additions/changes
4. Run tests yourself (verify claims)
5. Check for:
   - Security issues
   - Performance concerns
   - Maintainability problems
   - Missing edge cases

6. Request changes if needed
7. Approve only when satisfied

Reviewing teammate’s code with Maestro:

"Clone PR #123 and review for:
- Code quality
- Test coverage
- Performance implications
- Security vulnerabilities
- Best practice adherence

Provide detailed feedback on issues found."

Team Workflows

Feature ownership:

Clear assignment:
- Developer A: Authentication (Session 1)
- Developer B: Payment processing (Session 2)
- Developer C: Notifications (Session 3)

Integration session:
- Clone all three PRs
- Verify compatibility
- Integration testing

Handoff protocol:

When handing off work:
/synopsis to document state
Clear commit message on last change
Brief handoff document
/download-changed for backup

Recipient:
Clone repository at handoff point
Read synopsis
Verify understanding
Continue or ask clarifying questions

Security Best Practices

Input Validation

Ensure Maestro implements:

For all user inputs:
- Type validation
- Range/length limits
- Sanitization
- SQL injection prevention
- XSS prevention

Explicitly request:
"Implement strict input validation for all endpoints. Include tests that attempt injection attacks and verify they're blocked."
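
A minimal sketch of what such a request might produce, assuming a hypothetical `users` table and a username-only lookup. It covers type checks, length limits, and SQL injection prevention through a parameterized query; XSS prevention depends on the output layer and is not shown. `MAX_USERNAME_LENGTH`, `validate_username`, and `find_user` are illustrative names.

```python
import sqlite3

MAX_USERNAME_LENGTH = 32  # hypothetical limit for this sketch


def validate_username(value):
    """Type, length, and character checks; raises ValueError on bad input."""
    if not isinstance(value, str):
        raise ValueError("username must be a string")
    if not 1 <= len(value) <= MAX_USERNAME_LENGTH:
        raise ValueError("username length out of range")
    if not value.isalnum():
        raise ValueError("username must be alphanumeric")
    return value


def find_user(conn, username):
    """Look up a user with a parameterized query (never string formatting)."""
    validate_username(username)
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()


def test_injection_attempt_is_rejected():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    assert find_user(conn, "alice") == (1, "alice")
    try:
        find_user(conn, "alice' OR '1'='1")
    except ValueError:
        pass  # blocked by validation before reaching the database
    else:
        raise AssertionError("injection attempt was not blocked")
```

Validation and parameterization are layered on purpose: even if a validation rule is later relaxed, the `?` placeholder still keeps user input out of the SQL text.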

Dependency Management

Audit dependencies:

"Review all dependencies for:
- Known vulnerabilities
- Maintenance status
- License compatibility
- Size and complexity"

Tools: Use npm audit, pip-audit, etc.

Update regularly:

"Update dependencies to latest stable versions.

Process:
1. Update dependency files
2. Run full test suite
3. Verify no breaking changes
4. Document any required code changes"

Secret Management

Never commit secrets:

Validation before every PR:
"Scan all files for potential secrets:
- API keys
- Passwords
- Connection strings
- Private keys

Report any found. Ensure .gitignore covers them."
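
To illustrate the kind of check this prompt asks for, here is a rough sketch of a pattern-based scan. The three patterns are illustrative assumptions and will both miss real secrets and flag harmless lines; a dedicated secret scanner is the better choice for real repositories.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api_key)\b\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
    "connection string": re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@"),
}


def scan_text(text):
    """Return (line_number, label) pairs for lines that look like secrets."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

Calling `scan_text` on each file's contents before a PR gives a quick first pass; any finding should be reviewed by a person rather than trusted blindly.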

Use environment variables:

Require:
- All secrets via environment variables
- No hardcoded credentials
- Configuration separate from code
- Document required env vars in README
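
One way to meet these requirements is a small loader that fails fast when a required variable is missing. This is a sketch under assumptions: `DATABASE_URL` echoes the example used elsewhere in this guide, while `API_KEY`, `load_config`, and `MissingConfigError` are hypothetical names.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


def load_config(environ=None, required=("DATABASE_URL", "API_KEY")):
    """Read required settings from the environment; fail fast if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise MissingConfigError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in required}
```

Accepting `environ` as a parameter keeps the loader testable without touching the real process environment, and the error message doubles as a reminder of which variables the README must document.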

Performance Best Practices

Optimization Workflow

Systematic approach:

Profile first (never guess)
Identify actual bottleneck
Estimate optimization impact
Implement change
Benchmark with same conditions
Verify improvement
Check for regressions

Document: Baseline → Change → Result

Optimization priorities:

Fix in order:
Algorithmic inefficiency (O(n²) → O(n log n))
I/O bottlenecks (database, network)
CPU-bound hotspots
Memory usage
Micro-optimizations (last resort)
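
As a small illustration of the first priority, here is a hypothetical duplicate check written two ways. Both return the same answer; the second replaces the pairwise comparison with a sort followed by a neighbour scan. The function names are illustrative.

```python
def has_duplicates_quadratic(items):
    """O(n^2): compares every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """O(n log n): sort once, then compare each element with its neighbour."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

A change like this usually outweighs any amount of micro-tuning inside the inner loop, which is why it sits at the top of the list; it should still be confirmed with a benchmark on realistic input sizes.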

Database Performance

Query optimization:

Process:
Enable query logging
Identify slow queries (>100ms)
Analyze execution plans
Add indexes where beneficial
Optimize query structure
Benchmark before/after

Connection pooling:

Always use connection pools:
- PostgreSQL: pgbouncer or app-level pooling
- MySQL: connection pooling enabled
- MongoDB: connection pool configured

Test pool behavior under load

Caching Strategies

Layered caching:

Implement:
1. Application-level (in-memory)
2. Distributed (Redis/Memcached)
3. CDN (for static assets)
4. Database query cache

Validate cache effectiveness:
- Measure hit rates
- Verify invalidation works
- Test stale data scenarios
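
The validation points above can be exercised against a small application-level cache. The following is a sketch, not a production cache: `TTLCache` is a hypothetical class, it is not thread-safe, and it never evicts entries on its own. The injectable `clock` exists so stale-data scenarios can be tested without sleeping.

```python
import time


class TTLCache:
    """Tiny in-memory cache that counts hits and misses."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can simulate time passing
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, or call `loader()` on a miss or expiry."""
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = loader()
        self._store[key] = (value, now)
        return value

    def invalidate(self, key):
        """Drop a key so the next `get` reloads it."""
        self._store.pop(key, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A test can advance the fake clock past the TTL to confirm that expired entries are reloaded, call `invalidate` to confirm that updates become visible, and read `hit_rate()` to check that the cache is earning its keep.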

Documentation Best Practices

Code Documentation

Inline comments:

Document:
- Why, not what (code shows what)
- Non-obvious logic
- Performance trade-offs
- Security considerations
- Future improvement notes

API documentation:

For all public APIs:
- Purpose and use cases
- Parameters with types and constraints
- Return values with types
- Exceptions that may be thrown
- Usage examples
- Performance characteristics

README requirements:

Must include:
- Project purpose
- Installation instructions
- Configuration guide
- Usage examples
- Development setup
- Testing instructions
- Deployment guide
- License information

Architectural Documentation

For complex systems:

Create:
- Architecture diagrams (use Mermaid tool)
- Component interaction diagrams
- Data flow diagrams
- Deployment architecture

Update:
- When architecture changes
- When components added/removed
- When interactions change

Reliability and Robustness

Error Handling

Comprehensive error handling:

For all operations that can fail:
- Catch specific exceptions
- Log errors appropriately
- Provide useful error messages
- Graceful degradation
- Retry with exponential backoff (where appropriate)

Test error scenarios:
- Network failures
- Database unavailability
- Invalid inputs
- Resource exhaustion
- Timeout conditions
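
A minimal sketch of retrying with exponential backoff, assuming the failing operation raises `ConnectionError` or `TimeoutError`. The function name, default delays, and attempt count are illustrative choices, not recommended values.

```python
import random
import time


def retry_with_backoff(operation, retry_on=(ConnectionError, TimeoutError),
                       max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Run `operation`, retrying only the listed exceptions with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads retries so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Only the exception types in `retry_on` are retried, which matches the "catch specific exceptions" point: a programming error such as `ValueError` propagates immediately instead of being retried. Passing a stub for `sleep` keeps the error-scenario tests fast.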

Graceful Degradation

Design for partial failure:

If caching layer fails:
- System continues without cache
- Performance degrades but functionality intact
- Errors logged for monitoring
- Recovery automatic when cache available

Circuit breaker pattern:

For external services:
- Detect sustained failures
- Stop calling failing service temporarily
- Return cached data or default responses
- Retry after cooldown period
- Resume when service healthy
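
The behaviour listed above can be sketched as a small class. This is a simplified, single-threaded illustration under assumptions: `CircuitBreaker`, its thresholds, and the `fallback` callback are hypothetical, and production implementations usually add locking and metrics.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        """Run `operation`, or return `fallback()` while the circuit is open."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                return fallback()  # still cooling down: skip the failing service
            # Cooldown elapsed: fall through and let one trial call proceed.
        try:
            result = operation()
        except Exception:  # narrow this to the service's error types in real code
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        self._failures = 0
        self._opened_at = None  # service healthy again: close the circuit
        return result
```

The `fallback` callable is where cached data or a default response is returned, so callers keep working while the external service recovers.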

Monitoring and Observability

Instrumentation:

Add to all production code:
- Structured logging (JSON format)
- Metrics emission (request rates, latencies, errors)
- Distributed tracing (for microservices)
- Health check endpoints

Make observable:
- Request/response cycles
- Database query performance
- External API calls
- Error rates and types
- Resource utilization
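
For the structured logging item, a minimal sketch using only the standard `logging` and `json` modules is shown below. The field names and the `context` key passed through `extra=` are assumptions for illustration; teams typically standardize their own schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical key supplied through `extra=`.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("orders").info("order placed", extra={"context": {"order_id": 7}})` emits a single JSON line that log aggregators can index by field.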

Production Deployment

Pre-Deployment Checklist

Before deploying to production:
☑ All tests pass (100%)
☑ Code coverage meets standards
☑ Performance benchmarks acceptable
☑ Security scan passed
☑ Documentation complete
☑ Rollback plan documented
☑ Monitoring configured
☑ Secrets in environment (not code)
☑ Configuration for each environment
☑ Database migrations tested
☑ Dependency versions locked
☑ Health check endpoint implemented

Deployment Validation

Staged deployment:

Never deploy directly to production

Workflow:
Deploy to staging environment
Run smoke tests
Perform manual validation
Monitor for errors (24 hours)
If stable, deploy to production
Monitor closely post-deployment

Rollback preparedness:

Before deployment:
- Document rollback procedure
- Test rollback in staging
- Have rollback trigger criteria
- Know rollback time estimate

If issues detected:
- Rollback immediately
- Investigate in non-production
- Fix and redeploy

Anti-Patterns to Avoid

Development Anti-Patterns

Testing after implementation

Bad: Implement feature → Create tests as afterthought
Good: Design tests → Implement to satisfy tests

Assuming tests are always correct

Bad: Test fails → Change code to make test pass (without understanding)
Good: Test fails → Understand why → Fix implementation OR fix test (whichever is wrong)

Skipping validation “to save time”

Bad: Feature done → Skip tests → Create PR → Hope for best
Good: Feature done → Full validation → Proven working → Create PR

Accepting claims without evidence

Bad: Maestro says "Performance improved" → Accept
Good: Maestro says "Performance improved" → Demand benchmarks

Overcomplicating simple tasks

Bad: Use Maestro for changing button color
Good: Use IDE for trivial changes, Maestro for substantial features

Session Management Anti-Patterns

Monster sessions

Bad: Single session doing everything for 200 turns
Good: Focused sessions with clear boundaries

Ignoring capacity warnings

Bad: Let capacity hit 100%, then struggle
Good: Proactive management at 70%

Not using checkpoints

Bad: Long session with no backups
Good: Regular checkpoints and /synopsis

Communication Anti-Patterns

Vague requirements

Bad: "Make it better"
Good: "Improve performance. Target: <100ms latency. Current: 500ms. Benchmark with same test harness."

Assuming Maestro remembers everything

Bad: "Use the database credentials" (mentioned 50 turns ago)
Good: "Use DATABASE_URL=postgresql://..." (explicit)

Not correcting misunderstandings

Bad: Let Maestro continue with wrong assumption
Good: Immediately correct: "Stop. That's wrong. Here's the correct understanding: ..."

Expert Workflows

Rapid Prototyping to Production

Day 1: Prototype

Session 1: Quick prototype
- Core functionality only
- Minimal error handling
- Basic tests
- Validate concept

Result: Proven approach

Day 2-3: Production Implementation

Session 2: Production version
- Comprehensive implementation
- Full error handling
- Complete test coverage
- Performance optimization
- Documentation

Result: Production-ready feature

Why separate: Different quality bars; faster initial validation.

Research → Specification → Implementation

Pattern for complex features:

Session 1: Research (1-2 hours)

"Research [domain/technology].

Deliverables:
- Current state-of-the-art
- Best practices
- Recommended approach with evidence
- Potential pitfalls

Save research to markdown file."

Session 2: Specification (2-4 hours)

Upload research from Session 1

"Create technical specification:
- Architecture
- Data models
- API contracts
- Security considerations
- Performance targets
- Testing strategy

Comprehensive but concise."

Session 3+: Implementation (4-16 hours)

Upload specification from Session 2

"Implement the specification.

Success criteria:
- All spec requirements met
- Comprehensive tests
- Performance targets achieved
- Documentation complete

Validate systematically at each phase."

Advantage: Clear milestones, better quality, easier to validate.

Continuous Validation Workflow

Integrate validation throughout:

Every 5-10 turns of implementation:
1. "Pause implementation"
2. "Run all tests so far"
3. "Show test output"
4. "Verify code coverage for new code"
5. If issues: Fix before continuing
6. If good: "Continue implementation"

Prevents late-stage surprises
Catches regressions early
Maintains quality throughout

Quality Gates

Gate 1: Compilation/Syntax

Must pass:

Code compiles without errors
No syntax errors
Import/dependency resolution works
Type checking passes (if applicable)

Zero tolerance: Compilation errors must be fixed immediately.

Gate 2: Unit Tests

Must pass:

All unit tests pass
No skipped tests (unless explicitly marked)
Coverage meets minimum threshold
No flaky tests

Failure response: Fix implementation or fix tests, but all must pass.

Gate 3: Integration Tests

Must pass:

Component interactions work correctly
External service integrations function
End-to-end flows complete successfully
Error scenarios handled

Gate 4: Performance

Must meet:

Latency targets
Throughput requirements
Resource usage within bounds
No performance regressions from baseline

Evidence required: Benchmark results comparing to targets and baselines.

Gate 5: Security

Must verify:

Input validation present
SQL injection prevented
XSS prevented (web apps)
Authentication/authorization correct
Secrets not in code
Dependencies without known vulnerabilities

Tools: Security scanners, static analysis, manual review.

Gate 6: Documentation

Must include:

Updated README
API documentation
Code comments where needed
Architecture diagrams (if structure changed)
Deployment notes

Quality check: Someone unfamiliar could understand and use the code.

Production Readiness Criteria

Definition of Done

Feature is done when:

All quality gates passed
Stakeholder acceptance criteria met
Documentation complete
Deployable to production
Rollback plan documented
Monitoring configured

Not done if:

Tests failing
Performance below targets
Security concerns unresolved
Documentation missing or inaccurate
Dependencies unlocked or vulnerable

Release Checklist

Pre-release:
☑ All tests passing (100%)
☑ Code review complete
☑ Performance validated
☑ Security scan clear
☑ Staging deployment successful
☑ Smoke tests passed
☑ Monitoring configured
☑ Runbook updated
☑ Rollback tested

Post-release:
☑ Deployment verified
☑ Smoke tests in production
☑ Monitoring alerts configured
☑ Error rates normal
☑ Performance metrics nominal
☑ Document release notes

Continuous Improvement

Learning from Sessions

Post-session retrospective:

After major sessions:
1. What went well?
2. What could improve?
3. What did we learn?
4. How could we work better together?

Document insights:
- Better requirement patterns
- Effective communication styles
- Successful validation approaches
- Pitfalls to avoid

Building Session Templates

Create reusable patterns:

API Implementation Template:

Research framework and best practices
Design API contract (OpenAPI spec)
Implement endpoints with validation
Comprehensive test coverage (>90%)
Performance benchmarking
Documentation with examples
Security review
Create PR

Performance Optimization Template:

Establish baseline metrics
Profile current implementation
Identify bottlenecks (top 3)
Estimate optimization impact
Implement optimizations incrementally
Benchmark after each change
Validate no regressions
Document optimizations

Custom templates for your domain:

Save proven workflows as custom instructions
Consistency across sessions
Faster setup and execution
Higher quality through standardization

Advanced Quality Patterns

Mutation Testing

Beyond standard coverage:

"Run mutation testing on critical authentication code.

Process:
1. Use mutation testing framework
2. Inject faults into code
3. Verify tests catch mutations
4. Improve tests for uncaught mutations
5. Achieve high mutation score

Goal: Prove tests actually validate logic, not just achieve coverage"
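
To make the idea concrete, here is a hand-rolled toy, not a real mutation testing framework (tools such as mutmut automate fault injection across a codebase). It assumes a hypothetical `is_adult` check and simulates mutations by swapping the comparison operator. Both suites execute every line of `is_adult`, yet only the one with boundary cases kills every mutant.

```python
import operator


def make_is_adult(compare=operator.ge):
    """Build `is_adult`; swapping `compare` simulates a mutated operator."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult


def weak_suite(is_adult):
    """Checks only values far from the boundary."""
    return is_adult(30) is True and is_adult(5) is False


def strong_suite(is_adult):
    """Adds the boundary cases that distinguish >= from > and ==."""
    return (weak_suite(is_adult)
            and is_adult(18) is True
            and is_adult(19) is True
            and is_adult(17) is False)


# Each entry replaces the original `>=` with a different operator.
MUTANTS = [operator.gt, operator.le, operator.eq, operator.lt]


def mutation_score(suite):
    """Fraction of mutants killed (a mutant is killed when the suite fails)."""
    killed = sum(1 for mutant in MUTANTS if not suite(make_is_adult(mutant)))
    return killed / len(MUTANTS)
```

Here `mutation_score(weak_suite)` is 0.75 because the `>` mutant survives, while `mutation_score(strong_suite)` is 1.0. The surviving mutant points directly at the missing test: the boundary value 18.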

Property-Based Testing

For algorithms and data structures:

"Implement property-based tests for sorting algorithm.

Properties to test:
- Output length equals input length
- Output is a permutation of the input (same elements, same counts)
- Output is sorted (∀i: output[i] ≤ output[i+1])
- Idempotent (sorting twice gives same result)

Use hypothesis library. Run 1000 random test cases."
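
The prompt above names the Hypothesis library. The sketch below deliberately swaps in a hand-rolled random generator from the standard library so it runs without extra dependencies; it lacks Hypothesis features such as shrinking of failing inputs. `check_sort_properties` is a hypothetical helper, and the multiset comparison is a stricter form of the element-membership property.

```python
import random
from collections import Counter


def check_sort_properties(sort, cases=1000, seed=0):
    """Run `sort` on random integer lists and assert the properties each time."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(cases):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        result = sort(list(data))
        assert len(result) == len(data)
        assert Counter(result) == Counter(data)  # same elements, same counts
        assert all(a <= b for a, b in zip(result, result[1:]))
        assert sort(list(result)) == result  # idempotent
    return cases
```

Calling `check_sort_properties(sorted)` passes all 1000 cases, while a function that returns its input unchanged fails the ordering property on the first unsorted list it receives.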

Chaos Engineering

For distributed systems:

"Implement chaos testing:
- Random service failures
- Network latency injection
- Resource exhaustion
- Cascading failures

Verify system:
- Degrades gracefully
- Recovers automatically
- Logs errors appropriately
- Maintains data consistency"

Measuring Success

Session-Level Metrics

Track for each session:

Time to completion
Quality of output (test coverage, performance)
Iterations needed
Issue detection rate
User satisfaction with outcome

Improving metrics:

Better requirements → fewer iterations
Proactive validation → earlier issue detection
Clear communication → faster completion

Project-Level Metrics

Track across projects:

Features delivered per month
Time savings vs traditional development
Bug rate in production
Performance vs requirements
Code quality metrics

Continuous improvement:

Identify patterns in successful sessions
Learn from problematic sessions
Refine communication and requirements
Build better templates and workflows

Next Steps

Apply these best practices:

Billing Guide: Understanding costs and optimization
Models: How Maestro’s AI models work

With best practices mastered, you’re ready for expert-level Maestro usage.
