The build-supervisors pool is now healthy and stable, the queue is running smoothly, and we don't expect a regression.
The root cause is a bug we hit in Docker, which left container state inconsistent. The resulting zombie containers reduced our capacity, and this could not be auto-remediated because our container orchestration depends on Docker operating properly.
Posted 5 months ago. Sep 25, 2017 - 18:17 UTC
The backlog has been clear for a while, and we'll continue to monitor it.
The root cause of this issue appears to involve a bug in the interaction between Kubernetes and Docker, which affects our build-supervisors (not the build VMs themselves).
Posted 5 months ago. Sep 25, 2017 - 17:07 UTC
We've added more worker capacity to cope with the issue, and the queue is now processing more quickly. We're still working to identify and fix the root cause.
Posted 5 months ago. Sep 25, 2017 - 16:31 UTC
Many iOS builds are currently being queued. We're investigating the issue and will fix it ASAP.