The pool build-supervisors is now healthy and stable, the queue is running smoothly, and we don't expect a regression.
The root-cause is that we hit a bug in docker, which caused an inconsistency in the container state. The resulting zombie containers caused a reduction in our capacity which could not be auto-remediated, since our container orchestration depends on Docker operating properly.
Posted 11 months ago. Sep 25, 2017 - 18:17 UTC
The backlog has been clear for a while, and we'll continue to monitor it.
The root cause of this issue appears to involve a bug in interaction between Kubernetes and Docker, which affects our build-supervisors (not the build VMs themselves).
Posted 11 months ago. Sep 25, 2017 - 17:07 UTC
We've added more worker-capacity to cope with the issue, and now the queue is processing more quickly. We're still working to identify and fix the root cause.
Posted 11 months ago. Sep 25, 2017 - 16:31 UTC
Many iOS builds are currently being queued. We're investigating the issue and will fix ASAP