Image pull intermittently fails when creating pods on the MKS cluster layout provided by Morpheus

Hello,

Users may encounter an issue where image pulls intermittently fail when creating pods on the MKS cluster layout provided by Morpheus, typically when many clusters are deployed within a short period. The images are pulled from Docker Hub, which limits the number of pulls anonymous users can make within a given time window.

When many clusters are created, this rate limit can be reached and some pods will fail to start, showing an ImagePullBackOff status. This status means Kubernetes keeps retrying the image pull, at least once every 5 minutes, and starts the pod as soon as a pull succeeds. The Docker Hub rate limit resets after 6 hours, so all pods should eventually start.
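To illustrate why the pods recover on their own, here is a rough Python sketch of the retry timing. It assumes the kubelet's default image pull back-off (10 s initial delay, doubled after each failure, capped at 300 s) and simplifies the Docker Hub limit to a single reset point 6 hours after the first failure. The function names are hypothetical; this is a model, not Morpheus or Kubernetes code.

```python
# Model of Kubernetes-style image pull back-off.
# Assumed kubelet defaults: 10 s initial delay, doubled after each failure,
# capped at 300 s (5 minutes).
INITIAL_DELAY_S = 10
MAX_DELAY_S = 300


def backoff_delays(n):
    """Return the first n retry delays, in seconds."""
    delays = []
    delay = INITIAL_DELAY_S
    for _ in range(n):
        delays.append(delay)
        delay = min(delay * 2, MAX_DELAY_S)
    return delays


def first_success_time(limit_reset_s):
    """Return the time (s) of the first pull attempt at or after the rate
    limit resets, assuming every earlier attempt fails and that one succeeds."""
    t = 0  # the first pull attempt happens at t=0
    delay = INITIAL_DELAY_S
    while t < limit_reset_s:
        t += delay
        delay = min(delay * 2, MAX_DELAY_S)
    return t


if __name__ == "__main__":
    print(backoff_delays(7))
    reset = 6 * 3600  # 6-hour rate limit window, in seconds
    print(first_success_time(reset) - reset, "s after the limit resets")
```

With a reset at 21,600 s, this model's first successful attempt lands at 21,610 s. In general, because retries are capped at 300 s, the pod starts less than 5 minutes after the limit resets.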

We are aware of this constraint, so we added a feature in 6.3.4-2 that lets users register their own Docker Hub credentials, which Morpheus and Kubernetes then use to pull images.

In my testing, I was able to reproduce the failing pods. The MKS cluster was up, with the master and workers all installed and reporting as healthy from a VM perspective, and all workers had joined the master node to form a cluster. I could see the same ImagePullBackOff errors on quite a few of the installed pods. However, after the rate limit window expired, all pods eventually started and ran as expected, thanks to how Kubernetes retries failed pulls. :slight_smile:

Regarding the Docker Hub credentials: as of 6.3.4-2, Morpheus allows the use of a registered account, as seen in the above screenshot. Users with paid accounts will likely be able to bypass the rate limits entirely. So if anyone else hits the same issue, keep in mind that it is caused by Docker Hub's rate limiting rather than by Morpheus itself.

One thing I'd add is that you may still see issues, since even free accounts are throttled, just at a higher limit than anonymous use. AFAIK, to avoid throttling altogether you have to be on a paid Docker plan.

In my testing (a while back now), my cluster sorted itself out once throttling stopped, but I think that depends on where throttling starts causing problems; rapidly recycling clusters may still cause issues. I think upgrading to 6.3.4-2 or higher and paying for a Docker account would remove throttling and should resolve this issue.

Hope this is helpful.

Thanks
Deepti

Another thing to keep in mind is that in later releases, we changed image references from Docker Hub to quay.io where possible, so these rate limits should not be reached as quickly as they were in the past.