Proposal: E2E Testing Workflow Changes
Goal
Surface e2e test failures before the changes that cause them reach develop.
Context
Current Challenges
Our current e2e testing workflow runs tests nightly against the develop branch. This means failures are discovered after code has already been merged, creating a reactive rather than proactive approach to quality assurance.
Why Change Now?
Our team structure and release cadence are evolving:
- Multiple teams - More than one team is now contributing to the codebase
- Expanded QA involvement - More than two QAs are interacting with the codebase
- Strict release schedule - Moving to a fixed 2-week release cycle
The Problem Scenario
With nightly e2e tests running only against develop:
- A developer merges code late in the day (or evening)
- The nightly e2e run executes overnight
- Tests fail due to the merged changes
- The failure is discovered the next morning
- If this happens close to release day, the team faces:
  - Rushed debugging under pressure
  - Potential release delays
  - Difficult decisions about reverting vs. fixing forward
  - Chaos and stress across teams
Worst case: Release day arrives, last night's e2e tests failed due to something merged the evening before, and the team scrambles to diagnose and fix issues while the release deadline looms.
Proposed Solutions
Option 1: Quarantine Branch
A staging branch sits between feature branches and develop. PRs merge into quarantine first, e2e tests run, then quarantine merges into develop.
Pros:
- Develop stays protected from breaking changes
- Clear separation between "pending validation" and "validated" code
Cons:
- Moving target - quarantine constantly changes as PRs merge in
- Overhead of managing merges from quarantine → develop
- Merge conflicts multiply (feature → quarantine, then quarantine → develop)
- Who owns the quarantine → develop merge? When does it happen?
Option 2: E2E Tests on PR (Label-Triggered)
Developers label their PR (e.g., e2e-required) to trigger e2e tests before merge is allowed.
Flow:
- Developer raises PR
- Developer adds label to trigger e2e suite
- E2e tests run against the PR
- PR can only merge once e2e passes
Pros:
- Shift-left: failures caught before code reaches develop
- Developer ownership - they see their own failures
- No quarantine branch overhead
Cons:
- Slow feedback loop (e2e suites take time)
- Resource cost - running full e2e on every labelled PR
- Tests run against PR branch, not the merged result (could still break on merge)
Variation - Quarantine overnight: After PR e2e passes, PR moves to a merge queue that batches overnight. If nightly run passes, batch merges to develop.
Option 3: Merge Queue with E2E Gate
Use GitHub's merge queue feature (or similar). PRs enter a queue, get rebased/merged together, e2e runs against the combined result, then merges to develop.
Flow:
- PR approved and ready to merge
- Developer adds to merge queue
- Queue batches PRs together
- E2e runs against the batch
- If pass → all PRs in batch merge to develop
- If fail → batch is broken down to isolate the culprit
Pros:
- Tests the actual merged state, not just the PR in isolation
- Batching reduces total e2e runs needed
- Develop is always in a passing state
- Built-in GitHub feature (if using merge queue)
Cons:
- Slower path to develop (waiting for queue + e2e)
- Batch failures affect multiple PRs - isolation debugging needed
- Requires workflow/tooling changes
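The "broken down to isolate the culprit" step in the flow above can be sketched as a simple bisection over the batch: re-run e2e on each half of a failing batch until the offending PR is found. This is a minimal illustration, not GitHub's actual implementation; `run_e2e` here stands in for a real CI trigger against a combined branch.

```python
def isolate_failures(batch, run_e2e):
    """Bisect a failing batch of PRs to find the culprit(s).

    `batch` is a list of PR identifiers; `run_e2e` is a callback that
    runs the e2e suite against the combined PRs and returns True on pass.
    """
    if run_e2e(batch):
        return []          # whole (sub-)batch is green
    if len(batch) == 1:
        return batch       # single failing PR -> it's the culprit
    mid = len(batch) // 2
    return (isolate_failures(batch[:mid], run_e2e)
            + isolate_failures(batch[mid:], run_e2e))

# Hypothetical example: PR 3 breaks e2e whenever it is in the batch.
bad = {3}
culprits = isolate_failures([1, 2, 3, 4], lambda b: not (set(b) & bad))
```

Note the cost in the failure case: each bisection level re-runs the suite, which is exactly the "isolation debugging" overhead listed above.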
Option 4: Tiered Approach
Not all PRs need full e2e. Use a risk-based approach:
| Change Type | Gate |
|---|---|
| Docs, config, tests only | No e2e required |
| Low-risk (styling, copy) | Smoke e2e only |
| Feature/bugfix touching core flows | Full e2e required |
| High-risk (auth, payments, job lifecycle) | Full e2e + manual QA sign-off |
Pros:
- Faster for low-risk changes
- Focused e2e resources on what matters
- QA can prioritise attention
Cons:
- Requires accurate labelling/classification
- Risk of miscategorisation
- More complex workflow rules
Recommended Approach: Partial E2E on PR + Nightly Full Suite
Given our constraints:
- Full e2e suite: ~2 hours (too slow for PR-level)
- PR volume: ~10/day
- Capability: Partial e2e suites available
Proposed Flow
┌─────────────────┐
│ PR Created │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Bot suggests │ "Consider adding: e2e:auth, e2e:jobs"
│ e2e labels │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Code review │
│ + Dev adds │
│ e2e labels │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Partial e2e │ Target: <30 mins
│ runs on PR │ Must pass to merge
└────────┬────────┘
│
▼ (pass)
┌─────────────────┐
│ Merge to │
│ develop │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Nightly full │ Full 2hr suite
│ e2e suite │ Safety net for edge cases
└─────────────────┘
What This Gives You
- Most issues caught before merge - Partial e2e runs on every PR
- Fast feedback - Devs know within 30 mins, not next morning
- Full coverage nightly - Edge cases caught overnight
- Simple flow - No quarantine branch, no manual gates
Layer Breakdown
| Layer | When | What Runs | Duration | Purpose |
|---|---|---|---|---|
| PR | On label added | Partial suite (affected areas) | <30 mins | Catch issues before merge |
| Nightly | Overnight | Full suite | ~2 hrs | Safety net for edge cases |
E2E Suite Segmentation
We need to tag/organise e2e tests by area:
| Label | Test Scope | Estimated Duration |
|---|---|---|
| e2e:smoke | Critical happy paths only | ~15 mins |
| e2e:auth | Login, logout, token refresh, onboarding | ~10 mins |
| e2e:jobs | Job lifecycle, state transitions | ~20 mins |
| e2e:installation | Installation flows by product type | ~25 mins |
| e2e:repair | Repair/callout/service flows | ~20 mins |
| e2e:payments | Invoices, payments | ~10 mins |
| e2e:full | Everything (rarely used on PR) | ~2 hrs |
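To sanity-check the <30-minute PR target, the CI job could sum the estimated durations of the suites selected by a PR's labels. A minimal sketch using the illustrative durations from the table above (not measured values), assuming suites run sequentially:

```python
# Estimated minutes per suite, taken from the segmentation table above.
SUITE_MINUTES = {
    "e2e:smoke": 15,
    "e2e:auth": 10,
    "e2e:jobs": 20,
    "e2e:installation": 25,
    "e2e:repair": 20,
    "e2e:payments": 10,
    "e2e:full": 120,
}

PR_TARGET_MINUTES = 30

def plan_pr_run(labels):
    """Return (suites_to_run, estimated_minutes, within_target)."""
    suites = [label for label in labels if label in SUITE_MINUTES]
    total = sum(SUITE_MINUTES[s] for s in suites)
    # Sequential assumption: if suites run in parallel, the wall-clock
    # estimate would be max() of the durations instead of the sum.
    return suites, total, total <= PR_TARGET_MINUTES

suites, minutes, ok = plan_pr_run(["e2e:auth", "e2e:jobs"])
# 10 + 20 = 30 minutes, just inside the target
```

A check like this could also warn when a label combination (e.g. installation plus jobs) blows the PR-level budget, prompting a conversation about splitting the PR or leaning on the nightly run.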
Aviator Configuration
Aviator needs to be configured to:
- Require partial e2e CI check to pass before queuing (when label present)
[TODO: Add specific Aviator config changes needed]
Label Suggestion Bot
A GitHub Action will comment on new PRs with suggested e2e labels based on changed files.
Example mapping:
| File Path Pattern | Suggested Label |
|---|---|
| src/services/api/auth* | e2e:auth |
| src/app/(app)/(tabs)/jobs/* | e2e:jobs |
| src/components/installation/* | e2e:installation |
| src/components/repair/* | e2e:repair |
| src/services/api/invoice* | e2e:payments |
| docs/* only | No e2e required |
| Any other code change | e2e:smoke |
Example bot comment:
E2E Label Suggestions
Based on the files changed in this PR, consider adding the following labels:
- e2e:auth - changes detected in authentication code
- e2e:jobs - changes detected in job-related components

Add labels manually via the sidebar, or ask a reviewer to add them.
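The mapping logic the bot needs is small. A minimal sketch, with the patterns taken from the table above and matched via Python's `fnmatch` (in practice the changed-file list would come from the GitHub API, and the action itself would likely be JavaScript or a reusable workflow):

```python
from fnmatch import fnmatch

# Path patterns -> suggested label, mirroring the mapping table above.
PATH_LABELS = [
    ("src/services/api/auth*", "e2e:auth"),
    ("src/app/(app)/(tabs)/jobs/*", "e2e:jobs"),
    ("src/components/installation/*", "e2e:installation"),
    ("src/components/repair/*", "e2e:repair"),
    ("src/services/api/invoice*", "e2e:payments"),
]

def suggest_labels(changed_files):
    """Suggest e2e labels for a PR from its changed file paths."""
    labels = set()
    code_change = False
    for path in changed_files:
        if fnmatch(path, "docs/*"):
            continue                    # docs-only changes need no e2e
        code_change = True
        for pattern, label in PATH_LABELS:
            if fnmatch(path, pattern):
                labels.add(label)
    if code_change and not labels:
        labels.add("e2e:smoke")         # fallback for any other code change
    return sorted(labels)
```

Note that `fnmatch`'s `*` matches across `/`, so `src/services/api/auth*` also catches files nested under an `auth/` directory; if that is too broad, a stricter glob library could be swapped in.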
Implementation Steps
Phase 1: Segment the E2E Suite
- [ ] Audit existing e2e tests and categorise by area
- [ ] Add tags/labels to test files or create separate test configs
- [ ] Create CI workflows for each partial suite
- [ ] Validate partial suites run correctly in isolation
Phase 2: Label Suggestion Bot
- [ ] Create GitHub Action that runs on PR open/update
- [ ] Define file path → label mapping
- [ ] Bot posts comment with suggested labels
- [ ] Test on a few PRs to validate suggestions
Phase 3: PR-Level E2E
- [ ] Create GitHub labels (e2e:auth, e2e:jobs, etc.)
- [ ] Update CI to trigger partial e2e based on label
- [ ] Configure Aviator to require e2e pass before merge
- [ ] Document labelling guidelines for developers
Phase 4: Rollout
- [ ] Communicate changes to all teams
- [ ] Run in advisory mode first (warn but don't block)
- [ ] Switch to enforced mode after 1-2 sprints
Expected Benefits
- Faster feedback - Devs know within 30 mins if their PR breaks e2e
- Most issues caught before merge - Not discovered the next morning
- Reduced release-day risk - Fewer surprises close to release
- Clear ownership - Bot suggests labels, dev adds them, dev owns the outcome
- Scalable - Works with multiple teams and higher PR volume
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Devs forget to add labels | Bot comments with suggestions on every PR |
| Partial suites miss issues | Nightly full suite catches them; fix next morning |
| Test flakiness blocks PRs | Quarantine flaky tests; track flake rate |
| Nightly fails on develop | Prioritise fix in the morning; doesn't block PRs |
Open Questions
- [ ] Who owns the e2e suite segmentation work?
- [ ] What's the current test flake rate? Do we need to address flakiness first?
- [ ] How do we handle PRs that touch multiple areas? (run multiple partial suites?)
- [ ] Who gets notified if nightly fails?