File processing issue with Instant Learning and Zero Training models
Incident Report for Nanonets
Postmortem

At around 10:46 UTC on 7th Nov, one of our queueing systems came under heavy load, causing requests for Instant Learning and Zero Training models to queue up and frequently time out. We were alerted to the issue and quickly scaled the system up; by 11:15 UTC the backlog had cleared and the incident was resolved.

We are adding additional alerting to this queueing system so that we can catch these types of issues well before the queue backs up.

Posted Nov 08, 2024 - 06:15 UTC

Resolved
This incident has been resolved.
Posted Nov 07, 2024 - 11:15 UTC
Update
We are continuing to monitor for any further issues.
Posted Nov 07, 2024 - 11:09 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 07, 2024 - 11:09 UTC
Investigating
We are currently investigating this issue.
Posted Nov 07, 2024 - 10:46 UTC
This incident affected: API and Web App.