Escalation procedure | Incidents


Initial set-up

At the start of the project, the PM should create an Incident Log file for the project using the designated template. The file should be stored in the project’s space in Confluence.

Critical incident definition

A critical incident is defined as one of the following situations:

  • The website is down or unavailable.

  • Key functionality is not working or is working incorrectly (e.g., checkout process, login, user data access).

  • Any bug or system behavior that blocks a large number of users or severely affects business operations.

  • Security breaches or data leaks.

  • Any situation explicitly marked by the client as urgent or business-critical.

Incident-handling algorithm

  1. When a critical situation arises, the PM must tag the responsible developer, the CTO (Oleh), and the Account Manager (Alex / Sophia) in the project's internal Slack channel.

  2. The PM must assign the task to the responsible developer and request an estimated resolution time.

  3. If no progress is made within 30 minutes to 1 hour, the PM must organize a call with the developer, the CTO, and the Account Manager to jointly determine a solution.

If a client request regarding a critical incident arrives outside working hours or on a weekend, the PM must contact Oleh, Alex, or Sophia via Telegram to ensure immediate attention.

While the issue is being resolved, the PM should update the client and the leads every 30 minutes on the task's progress and status.
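The timing rules above can be sketched as a few small helpers, for example as the basis of a reminder bot. This is only an illustration, not part of the procedure: the function names are hypothetical, and the call threshold is assumed to be the lower bound (30 minutes) of the "30 minutes to 1 hour" window.

```python
from datetime import timedelta

# Interval between status updates to the client and leads (from the procedure).
STATUS_UPDATE_INTERVAL = timedelta(minutes=30)

# Assumption: escalate to a call at the lower bound of the
# "30 minutes to 1 hour" window; the procedure leaves the exact point to the PM.
CALL_AFTER = timedelta(minutes=30)


def initial_channel(within_working_hours: bool) -> str:
    """Pick the channel for the first notification.

    Slack (internal project channel) during working hours,
    Telegram outside working hours or on weekends.
    """
    return "slack" if within_working_hours else "telegram"


def call_required(elapsed: timedelta, progress_made: bool,
                  call_after: timedelta = CALL_AFTER) -> bool:
    """Return True when the PM should organize the escalation call."""
    return (not progress_made) and elapsed >= call_after


def status_updates_due(elapsed: timedelta) -> int:
    """Number of 30-minute status updates that should have been sent so far."""
    return elapsed // STATUS_UPDATE_INTERVAL
```

For instance, 45 minutes into an incident with no progress, `call_required(timedelta(minutes=45), progress_made=False)` is true, and one status update should already have gone out.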

Post-resolution steps

Once the issue is resolved, the PM must:

  1. Update the project's Incident Log file with the details of the incident.

  2. Update the Projects Incident Log file, where the team keeps track of every incident across all projects.

  3. Notify the PMO to add the incident to the agenda of the PM call for the current week.

  4. Prepare a brief summary with the incident's description, key takeaways, and recommendations to prevent and/or handle similar incidents in the future.

The summary should cover:

  • What happened;

  • How the issue was identified;

  • Root cause of the issue;

  • Steps taken to resolve the problem (communication & tech-wise);

  • Key takeaways and recommendations for future use.
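If the summary is ever captured in a structured form, the five points above map naturally onto a small record. The sketch below is hypothetical: the class and field names are not part of the designated template, and the check only flags sections left empty.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentSummary:
    """One field per section of the post-incident summary (names are illustrative)."""
    what_happened: str
    how_identified: str
    root_cause: str
    resolution_steps: str  # communication and technical steps
    takeaways: str         # key takeaways and recommendations

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example with placeholder text: the root cause has not been filled in yet.
draft = IncidentSummary(
    what_happened="Checkout returned errors for all users.",
    how_identified="Reported by the client.",
    root_cause="",
    resolution_steps="Rolled back the latest release; client updated every 30 minutes.",
    takeaways="Add a checkout smoke test before release.",
)
print(draft.missing_sections())  # lists the sections still to be written
```

A check like this could run before the summary is shared, so that no section is skipped.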
