Hello!
I just deployed AWX 24.6.0 on Kubernetes with a fresh unmanaged PostgreSQL DB, using Operator version 2.19.0.
The operator scales the web deployment and the task deployment; however, the migration job it spawns launches a migration pod that keeps getting OOM-killed.
So I tried bumping the operator to version 2.19.1 to get the template file with resources specified, but it seems the image was built incorrectly, or the wrong image was published to the registry on quay.io?
First, the template file for migration jobs only specifies resources for the init container, not the main container. So you're probably getting the correct image from quay.io, just not the resource settings you were expecting.
Secondly, I've never seen the migration job get OOM-killed. While this could be remedied by setting resource limits, I suspect the underlying k8s platform has insufficient resources for the new AWX cluster. So even if you manage to get the migration job to complete without it being OOM-killed, the rest of the AWX pods may get OOM-killed once they start up afterwards.
Got it, I was able to fix my problem by editing the migration container template file with resource limits/requests high enough that the pod doesn't get killed.
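For reference, here is a rough sketch of the kind of `resources` block I mean, added to the migration job's main container. The actual template in awx-operator is a Jinja2 file and its container name may differ; the values below are illustrative, not recommendations:

```yaml
# Illustrative only: the rendered shape of a Kubernetes Job container
# spec with explicit requests/limits, so the cluster's defaults no
# longer apply. Tune the values to your own migration workload.
containers:
  - name: migration-job   # name is an assumption; check your template
    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi
```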
> While this could be remedied with setting resource limits, I would suspect the underlying k8s platform has insufficient resources for the new AWX cluster
Our k8s admins enforce a default limit when none is specified; that was the reason it was getting OOM-killed.
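For anyone else hitting this: a cluster-enforced default like that typically comes from a LimitRange object in the namespace. A sketch of what such a policy might look like (name and values are illustrative):

```yaml
# Illustrative LimitRange: when a container in this namespace specifies
# no resources, the cluster injects these defaults, which can be low
# enough to get a DB migration pod OOM-killed.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:            # applied as limits when none are set
        memory: 256Mi
      defaultRequest:     # applied as requests when none are set
        memory: 128Mi
```

You can check whether one exists in your namespace with `kubectl get limitrange -n <namespace>`.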