Incident Summary
On 27 January at 17:00 UTC, a sudden increase in traffic volume caused elevated load on one of our core LLM processing engines. This primarily affected file processing workflows in the EU region.
Impact
File processing workflows in the EU region were slower than usual, and async response times remained elevated for several hours while the backlog was cleared.
Root Cause
A surge in traffic led to sustained high load on a core LLM engine, resulting in slower processing and queue buildup for async workflows in the EU region.
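As a rough illustration of the queue dynamics described above, the sketch below models a backlog that grows while jobs arrive faster than they are processed and drains once capacity exceeds arrivals. It is a simplified constant-rate model; the function names and all numbers are hypothetical and are not measurements from this incident.

```python
def backlog_after(arrival_rate, service_rate, minutes, initial_backlog=0.0):
    """Jobs waiting after `minutes`, for constant rates in jobs per minute."""
    growth = (arrival_rate - service_rate) * minutes
    return max(0.0, initial_backlog + growth)


def minutes_to_clear(backlog, arrival_rate, service_rate):
    """Time to drain `backlog`; infinite if capacity does not exceed arrivals."""
    spare = service_rate - arrival_rate
    if spare <= 0:
        return float("inf")
    return backlog / spare


# Hypothetical numbers for illustration only, not data from this incident.
# One hour of 150 jobs/min arriving against 100 jobs/min of capacity:
backlog = backlog_after(arrival_rate=150, service_rate=100, minutes=60)
print(backlog)  # 3000.0

# After scaling capacity to 175 jobs/min while arrivals stay at 150 jobs/min:
print(minutes_to_clear(backlog, arrival_rate=150, service_rate=175))  # 120.0
```

Under these assumed figures, only the spare capacity (service rate minus arrival rate) works down the backlog, which is why clearing a large queue can take hours even after capacity has been added.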
Resolution
Our engineering team monitored the situation and added processing capacity to handle the increased load and clear the backlog. Because of the size of the queue, clearing the backlog took several hours, during which async response times were elevated. Once capacity had been scaled and the backlog processed, services returned to normal operation.
We sincerely apologize for the inconvenience caused by this incident. We are working on adding on-demand capacity so that this issue does not recur. Thank you for your understanding.