Elevated response times for instant learning models
Incident Report for Nanonets
Postmortem

Incident Summary:
Users experienced elevated response times for instant learning models due to a disruption in our processing system.

Root Cause:
One of our GPU nodes went down, which significantly increased file processing times and slowed responses for users of instant learning models.

Resolution:
We promptly identified the faulty machine and removed it from our pool, which restored normal processing times. As a long-term fix, we are implementing a mechanism that automatically detects faulty nodes, removes them from the pool, and redistributes their workload to healthy nodes, so that a node or machine going down does not impact file processing times.
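
For illustration only, the kind of mechanism described above could be sketched roughly as follows. This is not our production implementation: the `NodePool` class, its method names, the `max_failures` threshold, and the caller-supplied `is_healthy` probe are all hypothetical, and a real system would also need to handle timeouts, job retries, and nodes rejoining the pool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class NodePool:
    """Hypothetical worker pool that evicts unhealthy nodes (illustrative only)."""

    is_healthy: Callable[[str], bool]  # assumed health probe, supplied by the caller
    max_failures: int = 3              # assumed: consecutive failed probes before eviction
    queues: Dict[str, List[str]] = field(default_factory=dict)  # node -> pending jobs
    failures: Dict[str, int] = field(default_factory=dict)      # node -> failed probes in a row
    backlog: List[str] = field(default_factory=list)            # jobs waiting for any node

    def add_node(self, node: str) -> None:
        self.queues[node] = []
        self.failures[node] = 0

    def submit(self, job: str) -> Optional[str]:
        """Assign the job to the least-loaded node, or hold it if the pool is empty."""
        if not self.queues:
            self.backlog.append(job)
            return None
        node = min(self.queues, key=lambda n: len(self.queues[n]))
        self.queues[node].append(job)
        return node

    def check_nodes(self) -> List[str]:
        """Probe every node once; evict repeat failures and redistribute their jobs."""
        evicted = []
        for node in list(self.queues):  # copy, because eviction mutates the dict
            if node not in self.queues:
                continue
            if self.is_healthy(node):
                self.failures[node] = 0
                continue
            self.failures[node] += 1
            if self.failures[node] >= self.max_failures:
                orphaned = self.queues.pop(node)
                del self.failures[node]
                evicted.append(node)
                for job in orphaned:
                    self.submit(job)  # goes to a remaining node or the backlog
        return evicted
```

In this sketch, `check_nodes` would be called on a schedule. Requiring several consecutive failed probes before eviction is a design choice meant to avoid removing a node over a single transient error.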

We sincerely apologize for the inconvenience this incident may have caused. We understand the importance of reliable and fast service, and we are taking the necessary steps to prevent such issues from occurring in the future. We appreciate your patience and understanding.

Posted Jul 11, 2024 - 07:44 UTC

Resolved
This incident has been resolved.
Posted Jul 09, 2024 - 15:35 UTC
Investigating
We are currently investigating this issue.
Posted Jul 09, 2024 - 15:05 UTC
This incident affected: API and Web App.