Delayed file processing for models using the async API in the EU region

Incident Report for Nanonets

Postmortem

Incident Summary
On Jan 27, 2026 at 17:00 UTC, a sudden increase in traffic volume put elevated load on one of our core LLM processing engines. This primarily affected file processing workflows in the EU region.

Impact

  • Synchronous APIs may have experienced intermittent failures or increased error rates.
  • Asynchronous file processing was degraded, with increased processing times.
  • Files submitted during this period were not lost and were eventually processed once the backlog was cleared, though with higher-than-usual latency.

Root Cause
A surge in traffic led to sustained high load on a core LLM engine, resulting in slower processing and queue buildup for async workflows in the EU region.
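
To illustrate why a queue keeps growing during a surge and why it can take hours to clear even after capacity is added, here is a minimal back-of-the-envelope sketch. All rates and durations are hypothetical and chosen only for illustration; they are not measurements from this incident, and the function names are ours.

```python
import math


def backlog_after_surge(arrival_rate, service_rate, surge_minutes):
    """Files left in the queue after a surge (rates in files per minute).

    The queue only grows while arrivals outpace processing.
    """
    return max(0, (arrival_rate - service_rate) * surge_minutes)


def minutes_to_drain(backlog, arrival_rate, scaled_service_rate):
    """Minutes needed to clear the backlog once capacity has been scaled.

    Only the capacity left over after serving new arrivals drains the queue.
    Returns infinity if the scaled capacity still does not exceed arrivals.
    """
    spare = scaled_service_rate - arrival_rate
    if spare <= 0:
        return math.inf
    return backlog / spare


# Hypothetical numbers, NOT taken from the incident:
# 150 files/min arriving against 100 files/min of capacity for 90 minutes.
backlog = backlog_after_surge(arrival_rate=150, service_rate=100, surge_minutes=90)

# Capacity is then scaled to 140 files/min while 120 files/min keep arriving.
drain = minutes_to_drain(backlog, arrival_rate=120, scaled_service_rate=140)

print(f"backlog: {backlog} files, drain time: {drain / 60:.2f} hours")
```

With these assumed figures, a 90-minute surge leaves 4,500 files queued, and clearing them takes well over three hours because only the spare capacity (20 files per minute here) works on the backlog while new files continue to arrive.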

Resolution
Our engineering team monitored the situation and added processing capacity to handle the increased load and clear the backlog. Because of the size of the queue, clearing the backlog took several hours, during which async response times remained elevated. Once capacity had been scaled and the backlog processed, services returned to normal operation.

We sincerely apologize for the inconvenience caused by this incident. We are working on the ability to add capacity on demand so that this issue does not recur. Thank you for your understanding.

Posted Jan 30, 2026 - 06:00 UTC

Resolved

This incident has been resolved.
Posted Jan 27, 2026 - 22:12 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jan 27, 2026 - 21:32 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Jan 27, 2026 - 18:27 UTC

Investigating

We are currently investigating this issue.
Posted Jan 27, 2026 - 17:24 UTC
This incident affected: API.