Intermittent file upload failures
Incident Report for Nanonets
Postmortem

Incident Summary:
On 07-OCT-2024 at 11:30 PM IST, a new feature was deployed that inadvertently caused a spike in our RDS usage. As a result, some APIs timed out and returned 5xx errors, causing intermittent file upload failures for affected users.

Root Cause:
The issue was caused by a newly deployed feature that had an unintended effect on the database. The feature introduced inefficient database queries, which significantly increased RDS usage. Under the increased load, API requests exceeded their time limits, resulting in timeouts and intermittent file upload failures.

Resolution:
As soon as the issue was identified, we reverted the change, and the system returned to a stable state shortly after the rollback. We are now implementing additional safeguards and testing procedures to prevent similar incidents in the future.
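
As an illustration only, one kind of pre-deployment check that could catch inefficient queries is to inspect each new query's execution plan and flag full table scans. The sketch below is not our actual safeguard: it uses Python's built-in sqlite3 module as a stand-in for the production database, and the table, index, and function names (files, idx_files_owner, find_full_scans, demo_connection) are hypothetical.

```python
import sqlite3


def find_full_scans(conn, queries):
    """Return the queries whose SQLite plan includes an unindexed table scan."""
    flagged = []
    for sql in queries:
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The human-readable plan step is the fourth column of each row.
        details = [row[3] for row in plan]
        # "SCAN ..." without "USING ... INDEX" means every row is read.
        if any(d.startswith("SCAN") and "USING" not in d for d in details):
            flagged.append(sql)
    return flagged


def demo_connection():
    """Build a hypothetical in-memory schema with one indexed column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT)"
    )
    conn.execute("CREATE INDEX idx_files_owner ON files (owner_id)")
    return conn


# Hypothetical queries: the first can use the index, the second cannot.
INDEXED_QUERY = "SELECT name FROM files WHERE owner_id = 42"
UNINDEXED_QUERY = "SELECT id FROM files WHERE name = 'report.pdf'"

if __name__ == "__main__":
    for sql in find_full_scans(demo_connection(), [INDEXED_QUERY, UNINDEXED_QUERY]):
        print("Full table scan:", sql)
```

A check like this could run in CI against a schema copy. Plan output differs between database engines, so the string matching here applies to SQLite only and would need to be adapted for the engine actually running on RDS.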

Timeline:
OCT 7th 11:50 PM IST - OCT 8th 12:30 AM IST

Posted Oct 08, 2024 - 07:12 UTC

Resolved
This incident has been resolved.
Posted Oct 07, 2024 - 19:24 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 07, 2024 - 19:24 UTC
Investigating
We are currently investigating this issue.
Posted Oct 07, 2024 - 18:55 UTC
This incident affected: API and Web App.