Proposal: E2E Testing Workflow Changes

Goal

Catch e2e test failures before the code that causes them reaches develop.


Context

Current Challenges

Our current e2e testing workflow runs tests nightly against the develop branch. This means failures are discovered after code has already been merged, creating a reactive rather than proactive approach to quality assurance.

Why Change Now?

Our team structure and release cadence are evolving:

  • Multiple teams - more than one team now contributes to the codebase
  • Expanded QA involvement - additional QA engineers are working directly with the codebase
  • Strict release schedule - we are moving to a fixed two-week release cycle

The Problem Scenario

With nightly e2e tests running only against develop:

  1. A developer merges code late in the day (or evening)
  2. The nightly e2e run executes overnight
  3. Tests fail due to the merged changes
  4. The failure is discovered the next morning
  5. If this happens close to release day, the team faces:
    • Rushed debugging under pressure
    • Potential release delays
    • Difficult decisions about reverting vs. fixing forward
    • Chaos and stress across teams

Worst case: Release day arrives, last night's e2e tests failed due to something merged the evening before, and the team scrambles to diagnose and fix issues while the release deadline looms.


Proposed Solutions

Option 1: Quarantine Branch

A staging branch sits between feature branches and develop. PRs merge into quarantine first, e2e tests run, then quarantine merges into develop.

Pros:

  • Develop stays protected from breaking changes
  • Clear separation between "pending validation" and "validated" code

Cons:

  • Moving target - quarantine constantly changes as PRs merge in
  • Overhead of managing merges from quarantine → develop
  • Merge conflicts multiply (feature → quarantine, then quarantine → develop)
  • Who owns the quarantine → develop merge? When does it happen?

Option 2: E2E Tests on PR (Label-Triggered)

Developers label their PR (e.g., e2e-required) to trigger e2e tests before merge is allowed.

Flow:

  1. Developer raises PR
  2. Developer adds label to trigger e2e suite
  3. E2E tests run against the PR
  4. PR can only merge once e2e passes
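The flow above can be sketched as a GitHub Actions workflow, assuming GitHub Actions and an npm-based test command (the workflow name, label name, and script are placeholders):

```yaml
# Sketch: run e2e only when the PR carries the e2e-required label.
name: e2e-on-pr
on:
  pull_request:
    types: [opened, synchronize, labeled]

jobs:
  e2e:
    # Skip entirely unless the trigger label is present
    if: contains(github.event.pull_request.labels.*.name, 'e2e-required')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e   # placeholder for the real suite command
```

Marking the `e2e` check as required in branch protection is what enforces "PR can only merge once e2e passes".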

Pros:

  • Shift-left: failures caught before code reaches develop
  • Developer ownership - they see their own failures
  • No quarantine branch overhead

Cons:

  • Slow feedback loop (e2e suites take time)
  • Resource cost - running full e2e on every labelled PR
  • Tests run against PR branch, not the merged result (could still break on merge)

Variation - Quarantine overnight: After PR e2e passes, the PR moves to a merge queue that batches overnight. If the nightly run passes, the batch merges to develop.


Option 3: Merge Queue with E2E Gate

Use GitHub's merge queue feature (or similar). PRs enter a queue, get rebased/merged together, e2e runs against the combined result, then merges to develop.

Flow:

  1. PR approved and ready to merge
  2. Developer adds to merge queue
  3. Queue batches PRs together
  4. E2E runs against the batch
  5. If pass → all PRs in batch merge to develop
  6. If fail → batch is broken down to isolate the culprit
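If we go this route, the e2e workflow needs to listen for GitHub's `merge_group` event so it runs against the batched merge commit rather than the PR branch. A sketch (the test command is a placeholder):

```yaml
# Sketch: run e2e against the merge-queue batch.
name: e2e-merge-queue
on:
  merge_group:

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # checks out the batch's candidate merge commit
      - run: npm ci
      - run: npm run test:e2e       # placeholder for the real suite command
```

The same check name must be listed as required in develop's branch protection so the queue waits for it.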

Pros:

  • Tests the actual merged state, not just the PR in isolation
  • Batching reduces total e2e runs needed
  • Develop is always in a passing state
  • Built-in GitHub feature (if using merge queue)

Cons:

  • Slower path to develop (waiting for queue + e2e)
  • Batch failures affect multiple PRs - isolation debugging needed
  • Requires workflow/tooling changes

Option 4: Tiered Approach

Not all PRs need full e2e. Use a risk-based approach:

Change Type | Gate
--- | ---
Docs, config, tests only | No e2e required
Low-risk (styling, copy) | Smoke e2e only
Feature/bugfix touching core flows | Full e2e required
High-risk (auth, payments, job lifecycle) | Full e2e + manual QA sign-off

Pros:

  • Faster for low-risk changes
  • Focused e2e resources on what matters
  • QA can prioritise attention

Cons:

  • Requires accurate labelling/classification
  • Risk of miscategorisation
  • More complex workflow rules

Given our constraints:

  • Full e2e suite: ~2 hours (too slow for PR-level)
  • PR volume: ~10/day
  • Capability: Partial e2e suites available

Proposed Flow

┌─────────────────┐
│   PR Created    │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Bot suggests   │  "Consider adding: e2e:auth, e2e:jobs"
│  e2e labels     │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Code review    │
│  + Dev adds     │
│  e2e labels     │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Partial e2e     │  Target: <30 mins
│ runs on PR      │  Must pass to merge
└────────┬────────┘
         ▼ (pass)
┌─────────────────┐
│ Merge to        │
│ develop         │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Nightly full    │  Full 2hr suite
│ e2e suite       │  Safety net for edge cases
└─────────────────┘

What This Gives You

  • Most issues caught before merge - Partial e2e runs on every PR
  • Fast feedback - Devs know within 30 mins, not next morning
  • Full coverage nightly - Edge cases caught overnight
  • Simple flow - No quarantine branch, no manual gates

Layer Breakdown

Layer | When | What Runs | Duration | Purpose
--- | --- | --- | --- | ---
PR | On label added | Partial suite (affected areas) | <30 mins | Catch issues before merge
Nightly | Overnight | Full suite | ~2 hrs | Safety net for edge cases
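The nightly layer is just a scheduled workflow. A sketch, assuming GitHub Actions (the cron time and script name are placeholders):

```yaml
# Sketch: full suite every night at 02:00 UTC.
name: nightly-e2e
on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  e2e-full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: develop              # run against develop explicitly
      - run: npm ci
      - run: npm run test:e2e:full  # placeholder for the full-suite command
```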

E2E Suite Segmentation

We need to tag/organise e2e tests by area:

Label | Test Scope | Estimated Duration
--- | --- | ---
e2e:smoke | Critical happy paths only | ~15 mins
e2e:auth | Login, logout, token refresh, onboarding | ~10 mins
e2e:jobs | Job lifecycle, state transitions | ~20 mins
e2e:installation | Installation flows by product type | ~25 mins
e2e:repair | Repair/callout/service flows | ~20 mins
e2e:payments | Invoices, payments | ~10 mins
e2e:full | Everything (rarely used on PR) | ~2 hrs
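One way to wire labels to partial suites is one CI job per area label, each running only its tagged subset. A sketch assuming Playwright with `@area` tags in test titles (only two areas spelled out; the rest follow the same shape):

```yaml
# Sketch: each job runs only when its label is on the PR.
jobs:
  e2e-auth:
    if: contains(github.event.pull_request.labels.*.name, 'e2e:auth')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep "@auth"   # assumes @auth tags in test titles
  e2e-jobs:
    if: contains(github.event.pull_request.labels.*.name, 'e2e:jobs')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep "@jobs"
```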

Aviator Configuration

Aviator needs to be configured to:

  • Require partial e2e CI check to pass before queuing (when label present)

[TODO: Add specific Aviator config changes needed]

Label Suggestion Bot

A GitHub Action will comment on new PRs with suggested e2e labels based on changed files.

Example mapping:

File Path Pattern | Suggested Label
--- | ---
src/services/api/auth* | e2e:auth
src/app/(app)/(tabs)/jobs/* | e2e:jobs
src/components/installation/* | e2e:installation
src/components/repair/* | e2e:repair
src/services/api/invoice* | e2e:payments
docs/* only | No e2e required
Any other code change | e2e:smoke
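The mapping logic itself is small. A sketch in Python mirroring the table above (the pattern list is illustrative, not exhaustive):

```python
from fnmatch import fnmatch

# Hypothetical path → label mapping, mirroring the table above.
PATH_LABELS = [
    ("src/services/api/auth*", "e2e:auth"),
    ("src/app/(app)/(tabs)/jobs/*", "e2e:jobs"),
    ("src/components/installation/*", "e2e:installation"),
    ("src/components/repair/*", "e2e:repair"),
    ("src/services/api/invoice*", "e2e:payments"),
]

def suggest_labels(changed_files):
    """Return the set of e2e labels suggested for a PR's changed files."""
    labels = set()
    code_change = False
    for path in changed_files:
        if fnmatch(path, "docs/*"):
            continue  # docs-only changes need no e2e
        code_change = True
        for pattern, label in PATH_LABELS:
            if fnmatch(path, pattern):
                labels.add(label)
    if code_change and not labels:
        labels.add("e2e:smoke")  # fallback: any other code change
    return labels
```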

Example bot comment:

E2E Label Suggestions

Based on the files changed in this PR, consider adding the following labels:

  • e2e:auth - changes detected in authentication code
  • e2e:jobs - changes detected in job-related components

Add labels manually via the sidebar, or ask a reviewer to add them.
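The bot can be a single `actions/github-script` step. A sketch (in practice the comment body would be built from the changed-file mapping; it is hard-coded here for brevity):

```yaml
name: suggest-e2e-labels
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  suggest:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write    # needed to comment on the PR
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            // The real bot would derive this body from the file mapping.
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '### E2E Label Suggestions\nConsider adding: `e2e:smoke`',
            });
```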

Implementation Steps

Phase 1: Segment the E2E Suite

  • [ ] Audit existing e2e tests and categorise by area
  • [ ] Add tags/labels to test files or create separate test configs
  • [ ] Create CI workflows for each partial suite
  • [ ] Validate partial suites run correctly in isolation

Phase 2: Label Suggestion Bot

  • [ ] Create GitHub Action that runs on PR open/update
  • [ ] Define file path → label mapping
  • [ ] Bot posts comment with suggested labels
  • [ ] Test on a few PRs to validate suggestions

Phase 3: PR-Level E2E

  • [ ] Create GitHub labels (e2e:auth, e2e:jobs, etc.)
  • [ ] Update CI to trigger partial e2e based on label
  • [ ] Configure Aviator to require e2e pass before merge
  • [ ] Document labelling guidelines for developers

Phase 4: Rollout

  • [ ] Communicate changes to all teams
  • [ ] Run in advisory mode first (warn but don't block)
  • [ ] Switch to enforced mode after 1-2 sprints

Expected Benefits

  • Faster feedback - Devs know within 30 mins if their PR breaks e2e
  • Most issues caught before merge - Not discovered the next morning
  • Reduced release-day risk - Fewer surprises close to release
  • Clear ownership - Bot suggests labels, dev adds them, dev owns the outcome
  • Scalable - Works with multiple teams and higher PR volume

Risks and Mitigations

Risk | Mitigation
--- | ---
Devs forget to add labels | Bot comments with suggestions on every PR
Partial suites miss issues | Nightly full suite catches them; fix next morning
Test flakiness blocks PRs | Quarantine flaky tests; track flake rate
Nightly fails on develop | Prioritise fix in the morning; doesn't block PRs

Open Questions

  • [ ] Who owns the e2e suite segmentation work?
  • [ ] What's the current test flake rate? Do we need to address flakiness first?
  • [ ] How do we handle PRs that touch multiple areas? (run multiple partial suites?)
  • [ ] Who gets notified if nightly fails?