2025-02-06 CRM Low Performance

Name(s): Yurich, Serhii, Roman, Anton L., Martin

Date: 2025-02-06

Last modified: 2025-02-07

Summary

CRM was reported as slow. We investigated and found that Manager 1 was at 100% CPU load while Manager 2 was at 0%. Further investigation showed that Manager 2's disk was full because of logs. Removing the logs resolved the issue.

Impact

CRM and Talent were slow, but reachable. No data loss.

Timeline

  • 13:35: Manager 2 ran out of disk space and all of its services shut down. Manager 1 took over all the load, which drove it to 100% CPU load.

  • 13:58: Beni contacted Martin via Teams to report that performance was low.

  • 14:03: Martin wrote in the Development channel that there were performance issues with CRM.

  • 14:10: Yurich and Anton started investigating but did not find a quick solution.

  • 14:44: Martin started a video call in the production channel and created a task force with Anton, Roman, Serhii and Yurich. Other developers joined as well.

  • 14:51: Announced in Rocken Chat that the issue exists.

  • 15:10: Serhii saw that the Manager 2 disk was full and removed old logs. After that, the servers stabilized.

  • 15:21: Announced in Rocken Chat that the issue was resolved.

  • 15:42: Roman noticed that the websocket container’s log file keeps growing.

  • 15:45: Roman started a video call with Yurii and Serhii.

  • 15:55: Serhii found the environment variable that enables debug mode for the websocket container.

  • 16:05: Roman set the variable SOKETI_DEBUG: 0 for the websocket container and restarted the stack in Docker Swarm to fix the issue immediately.

  • 16:06: Yurii set the same value in the GitLab CI variables to prevent the issue from recurring.

Root Cause(s)

The /var folder of Manager 2 was filled with websocket logs. With no disk space left, the services on Manager 2 crashed. The load could no longer be distributed across the managers and was directed mostly to Manager 1.
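
A full disk like this could be caught earlier with a simple usage check. The following is a minimal sketch, not something that was in place during the incident. The function names and the 85% threshold are assumptions chosen for illustration; only the /var path comes from the root cause above.

```python
import shutil


def used_fraction(path: str) -> float:
    """Return the used share (0.0 to 1.0) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a total of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_disk_nearly_full(path: str, threshold: float = 0.85) -> bool:
    """Return True when usage at `path` is at or above `threshold`.

    The default threshold of 0.85 is an illustrative assumption.
    """
    return used_fraction(path) >= threshold
```

Calling is_disk_nearly_full("/var") from a scheduled job on each manager and alerting on True would be one way to use it. How the alert is delivered is left open here.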

Action Items

  • Set up log rotation for all production services on Manager 1, 2 and 3


    RT-6607
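
Until log rotation is in place, a small script can help spot log files that keep growing. This is a hedged sketch only: the function name, the directory to scan and the size limit are assumptions, and it only reports files without deleting or rotating anything.

```python
import os


def find_large_files(root: str, min_bytes: int) -> list[tuple[int, str]]:
    """Return (size, path) pairs for files under `root` of at least `min_bytes`,
    largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable (e.g. a broken symlink); skip it.
                continue
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)
```

For example, find_large_files("/var", 1024 ** 3) would list files of 1 GiB or more under /var. Both the path and the limit are illustrative values.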

Appendix

Screenshot 2025-02-06 163115.png

Screenshot 2025-02-06 163147.png
