At around 4:30 AM, we noticed that encryption and decryption of credentials were failing. This caused many Chain runs to fail, and in some cases you were also unable to create and save new commands in the Integration Studio Chain Builder.
By 8:45 AM, our on-call engineers had identified the problem. One of our servers was not responding to requests correctly because its hard disk was full. Instead of failing outright, which would have triggered a fallback to our backup server, the process remained stuck and responded to requests with errors. While we should have been able to detect this failure sooner, we rectified the issue shortly after identifying it.
To mitigate future outages of this nature, we plan to add health check monitoring for this service that detects this particular failure state and recovers from it automatically.
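
For readers curious what a check of this kind might look like, here is a minimal sketch in Python. It is illustrative only and not our actual implementation: the names `check_health`, `select_server`, and `HealthReport`, the `probe` callback, and the 10% free-space threshold are all assumptions made for the example. The idea is to combine a disk-space check with an active probe (for example, a round-trip encryption and decryption of a test value), so that a process that is still running but returning errors is reported as unhealthy.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HealthReport:
    healthy: bool
    reasons: List[str]  # human-readable explanation for each failed check


def check_health(
    probe: Callable[[], bool],
    disk_path: str = "/",
    min_free_fraction: float = 0.10,  # assumed threshold, not a real setting
    disk_usage: Callable = shutil.disk_usage,  # injectable for testing
) -> HealthReport:
    """Run a disk-space check and an active probe against the service."""
    reasons = []

    # Check 1: the disk must have some headroom left.
    usage = disk_usage(disk_path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    if free_fraction < min_free_fraction:
        reasons.append(f"low disk space: {free_fraction:.0%} free on {disk_path}")

    # Check 2: the service must actually do its job. A process that is
    # alive but answering with errors fails here instead of looking healthy.
    try:
        if not probe():
            reasons.append("probe returned a wrong result")
    except Exception as exc:
        reasons.append(f"probe raised {type(exc).__name__}")

    return HealthReport(healthy=not reasons, reasons=reasons)


def select_server(primary_report: HealthReport) -> str:
    """Route to the backup whenever the primary reports unhealthy."""
    return "primary" if primary_report.healthy else "backup"
```

In a setup like this, a monitor would call `check_health` on a schedule and pass the result to something like `select_server`, so a stuck primary is taken out of rotation even though its process never exited.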