2025-02-06 CRM Low Performance

Name(s): Yurich, Serhii, Roman, Anton L., Martin

Date: 2025-02-06

Last modified: 2025-02-07

Summary

CRM was reported as slow. Our investigation found Manager 1 at 100% CPU load and Manager 2 at 0%. Further investigation showed that the disk on Manager 2 was full of log files. Removing the old logs resolved the issue.

Impact

CRM and Talent were slow, but reachable. No data loss.

Timeline

13:35: Manager 2 ran out of disk space and all of its services shut down. Manager 1 took over the entire load, which drove it to 100% CPU load.
13:58: Beni informed Martin via Teams that performance was low.
14:03: Martin wrote in the Development channel that CRM had performance issues.
14:10: Yurich and Anton started investigating but did not find a quick solution.
14:44: Martin started a video call in the production channel and formed a task force with Anton, Roman, Serhii and Yurich. Other developers joined as well.
14:51: The issue was announced in Rocken Chat.
15:10: Serhii saw that the disk on Manager 2 was full and removed old logs. After that, the servers stabilized.
15:21: The resolution was announced in Rocken Chat.
15:42: Roman noticed that the websocket container’s log file kept growing.
15:45: Roman started a video call with Yurii and Serhii.
15:55: Serhii found the environment variable that enables debug mode for the websocket container.
16:05: Roman set the variable SOKETI_DEBUG: 0 for the websocket container and restarted the stack in Docker Swarm to fix the issue immediately.
16:06: Yurii set the same value in the GitLab CI variables to prevent the issue from recurring.

Root Cause(s)

The /var directory on Manager 2 filled up with websocket logs because debug mode was enabled for the websocket container. With no free disk space left, the services on Manager 2 crashed. The load could no longer be distributed across the managers and fell mostly on Manager 1.
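
The full disk went unnoticed for roughly an hour and a half (13:35 to 15:10), so an early warning on disk usage might have shortened the incident. The following is a minimal sketch of such a check, not something that is part of the current setup. The function names and the 90% threshold are assumptions chosen for illustration; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def usage_fraction(total: int, used: int) -> float:
    """Return the used share of a filesystem as a value between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total


def is_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """Report whether the filesystem holding `path` is at or above `threshold`.

    The 0.9 default is an assumed value, not one taken from the incident.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_fraction(usage.total, usage.used) >= threshold
```

Calling `is_nearly_full("/var")` periodically on each manager, for example from a scheduled job, and raising an alert when it returns `True` would be one way to use it. How the alert is delivered is left open here.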

Action Items

Log rotation for all production services on Manager 1, 2 and 3

RT-6607
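
One possible way to implement this action item is the rotation built into Docker's `json-file` logging driver, which accepts the `max-size` and `max-file` options. The sketch below adds such a `logging` section to a service definition held as a Python dictionary. It is only an illustration: the helper name, the image name, and the limits of 10 MB and 3 files are assumptions, and the real stack files may be structured differently.

```python
import copy
import json


def with_log_rotation(service: dict, max_size: str = "10m", max_file: int = 3) -> dict:
    """Return a copy of a service definition with json-file log rotation set.

    max_size and max_file are assumed example limits, not agreed values.
    """
    updated = copy.deepcopy(service)  # leave the caller's dictionary untouched
    updated["logging"] = {
        "driver": "json-file",
        "options": {
            "max-size": max_size,        # rotate once a log file reaches this size
            "max-file": str(max_file),   # keep at most this many files per container
        },
    }
    return updated


# Hypothetical websocket service; the image name is a placeholder.
websocket = {
    "image": "example/websocket:latest",
    "environment": {"SOKETI_DEBUG": "0"},
}

print(json.dumps(with_log_rotation(websocket), indent=2))
```

With these example limits, each container would keep at most about 30 MB of logs, so a container left in debug mode could no longer fill /var on its own.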

Appendix

Screenshot 2025-02-06 163115.png Screenshot 2025-02-06 163147.png
