On Apr. 12 at approximately 9 a.m. PST, we received an internal alert indicating that the web conference recording subsystem might be malfunctioning.
We began an investigation and traced the issue to heavy load on our primary production database. The high load appeared to be caused by an extremely large number of processing machines pulling tasks from the queue to retroactively apply new Gong functionality to historical calls.
Once the issue had been identified, the task queue was emptied and the database server was restarted. Recording then proceeded normally.
On Apr. 15, we conducted a postmortem analysis, which yielded several action items to reduce the likelihood of similar issues in the future:
The task queuing mechanism is being re-engineered so that it is not adversely affected by a large number of processing machines.
The recording subsystem is being migrated to a separate database, to protect it from extraneous load in other subsystems.
A subset of the data stored in our primary database, identified as a potential mid-term bottleneck, will also be migrated to a separate data store shortly.
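
For readers curious about the first action item, the following is a minimal illustrative sketch in Python of one general approach: capping how many workers may hold a task at the same time, so that adding processing machines does not add load on a shared database. It is not a description of our actual implementation. The names `BoundedTaskSource`, `run_workers`, and `max_concurrent` are hypothetical, and the `handler` callback stands in for whatever database work a real task would perform.

```python
import queue
import threading


class BoundedTaskSource:
    """Hypothetical wrapper that caps how many workers process tasks at once."""

    def __init__(self, tasks, max_concurrent):
        self._tasks = tasks
        # At most max_concurrent workers may hold a slot at any moment.
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0
        self.peak_active = 0

    def process_next(self, handler):
        """Claim one task and run handler on it; return False if the queue is empty."""
        with self._slots:
            try:
                task = self._tasks.get_nowait()
            except queue.Empty:
                return False
            with self._lock:
                self.active += 1
                self.peak_active = max(self.peak_active, self.active)
            try:
                handler(task)  # stand-in for the database-bound work
            finally:
                with self._lock:
                    self.active -= 1
            return True


def run_workers(num_workers, num_tasks, max_concurrent):
    """Drain num_tasks tasks with num_workers threads; return (done, peak)."""
    tasks = queue.Queue()
    for i in range(num_tasks):
        tasks.put(i)
    source = BoundedTaskSource(tasks, max_concurrent)
    done = []
    done_lock = threading.Lock()

    def handler(task):
        with done_lock:
            done.append(task)

    def worker():
        while source.process_next(handler):
            pass

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done), source.peak_active
```

Calling `run_workers(50, 200, 4)` starts 50 worker threads, but no more than 4 of them process a task at any moment. In this sketch, the concurrency limit rather than the number of workers bounds the load placed on the shared resource.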
Again, we apologize for any inconvenience this incident may have caused.