Triage & Escalation
Playbook
A structured protocol for triaging, escalating, and resolving critical platform issues with ownership, clarity, and speed.
Situation Summary & Priorities
Scenario
Premium client facing a critical deadline - for example, a court filing or major deployment. The platform is their primary tool and any disruption poses a direct risk to their operation.
Core Priorities
- Stabilize trust - the client must feel supported and in control
- Restore continuity - remove or work around the blocker
- Triage and escalate - identify root cause and engage the right team
- Document and feed back - capture learnings for product improvement
Primary Objective
Resolve immediate blockers while maintaining long-term trust through transparent communication and swift action.
The Approach
Ownership Mindset
Treat blockers as mission-critical risks, not just system errors. Own the outcome, not just the ticket.
Immediate Response Protocol
The first minutes define the tone of the engagement. Focus on personal ownership and clear, human communication.
Triage & Investigation Steps
Immediate Action
Gather technical specifications from the client: file size, browser version, operating system, and timing of the issue. Record everything before troubleshooting begins.
Investigate
Attempt internal reproduction of the issue. Check platform status pages, recent deployments, and known incident timelines for correlation.
Isolate
Identify whether the issue is isolated to one account, a region, a specific workflow, or is systemic. Look for patterns across other accounts or recent support tickets.
Escalate
Notify internal teams via Slack or Jira with a concise summary. Proactively communicate findings and next steps to stakeholders.
Escalation Strategy
Internal Team Notification
Alert the internal support and engineering Slack channel. Include client severity, environment details, and a brief description of the blocker. Tag the on-call engineer if applicable.
Engineering Engagement
Hand off reproduction steps, severity assessment, and any logs or screenshots to the engineering team. Clearly state the business impact and deadline context.
Product Feedback Loop
For workflow-related patterns, file a product feedback ticket. Tag the product manager for the affected area and include the client use case.
Incident Management
For systemic issues - outages, region-wide degradation, or security-related concerns - activate the broader incident management process and assign a dedicated incident lead.
Continuity Planning
When a fix is not immediate, keep the client moving with time-boxed workarounds and clear expectations.
Time-Boxed Workarounds
- Retry cycle: Attempt the operation every 10 minutes. Log each attempt and any changes in error behavior.
- Batching: Split large operations into smaller batches to isolate the failing element and maintain partial progress.
- Format changes: Export or convert files to an alternate format that bypasses the failing pipeline.
- Manual overrides: Where automation is blocked, offer a manual processing path to unblock the critical path.
Internal Documentation & Product Feedback
Incident Summaries
Standardized one-page summaries of each escalation: what happened, timeline, root cause, resolution, and follow-up items.
Audit Trails
Maintain a chronological log of troubleshooting steps taken, hypotheses tested, and their outcomes for future reference.
Key Learnings
Document systemic patterns and recurring themes that emerge across incidents to inform long-term platform stability work.
Product Feedback
Categorize feedback into three tracks: Stability (crashes, timeouts), Messaging (confusing UI or error text), and Monitoring (missing alerts or observability gaps).
Note: Every escalation is an opportunity to make the platform more resilient. Treat documentation as a product improvement tool, not an administrative chore.