How to increase resource requirements of the migration job for an AWX deployed via K8s

Hello!
I just deployed AWX 24.6.0 on Kubernetes with a fresh unmanaged PostgreSQL DB, using Operator version 2.19.0.

The operator scales up the web deployment and the task deployment; however, the migration job it spawns launches a migration pod that keeps getting OOM-killed.

Is there a way to increase the memory limit for the container in the pod spawned by the job? I haven't found a field for this in the documented list: Containers Resource Requirements - Ansible AWX Operator Documentation

spec:
  containers:
    - command:
        - awx-manage
        - migrate
        - '--noinput'
      image: '<path/to/registry>/ansible/awx:24.6.0'
      imagePullPolicy: IfNotPresent
      name: migration-job
      resources:
        limits:
          cpu: '1'
          memory: 100Mi
        requests:
          cpu: 100m
          memory: 50Mi

I saw in the project repo that it is set as a variable: awx-operator/roles/installer/templates/jobs/migration.yaml.j2 at 9718424483347c0ddd94b5eb6eff88d54226028a · ansible/awx-operator · GitHub

However, my task_resource_requirements is defined as:

      resources:
        limits:
          cpu: '4'
          memory: 4Gi
        requests:
          cpu: '1'
          memory: 1Gi
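
For context, that block sits under spec.task_resource_requirements on the AWX custom resource, roughly like this (the metadata name/namespace below are just placeholders):

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx        # placeholder
  namespace: awx   # placeholder
spec:
  task_resource_requirements:
    requests:
      cpu: '1'
      memory: 1Gi
    limits:
      cpu: '4'
      memory: 4Gi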

Hmm, doesn't seem possible? awx-operator/roles/installer/tasks/migrate_schema.yml at 9718424483347c0ddd94b5eb6eff88d54226028a · ansible/awx-operator · GitHub

@Denney-tech, do you think you could point me in the right direction, please?

So I tried bumping the operator's version to 2.19.1 to get the template file with the resources specified; however, it seems the image was built incorrectly, or the wrong image was published to the registry on quay.io?

$ podman run -it --entrypoint "" <company-registry>.com/prod-docker/ansible/awx-operator:2.19.1 grep -A 20 'migration-job' 'roles/installer/templates/jobs/migration.yaml.j2'
        - name: "migration-job"
          image: '{{ _image }}'
          command:
            - awx-manage
            - migrate
            - --noinput
          volumeMounts:
            - name: {{ ansible_operator_meta.name }}-application-credentials
              mountPath: "/etc/tower/conf.d/credentials.py"
              subPath: credentials.py
              readOnly: true
            - name: "{{ secret_key_secret_name }}"
              mountPath: /etc/tower/SECRET_KEY
              subPath: SECRET_KEY
              readOnly: true
            - name: {{ ansible_operator_meta.name }}-settings
              mountPath: "/etc/tower/settings.py"
              subPath: settings.py
              readOnly: true
            {{ lookup("template", "common/volume_mounts/extra_settings_files.yaml.j2")  | indent(width=12) | trim }}
{% if bundle_ca_crt %}

First, the template file for migration jobs only specifies resources for the init container, not the main container. So you're probably getting the correct image from quay.io, just not the resource settings you were expecting.

Secondly, I've never seen the migration job get OOM-killed. While this could be remedied by setting resource limits, I would suspect the underlying k8s platform has insufficient resources for the new AWX cluster. So even if you manage to get the migration job to complete without it getting OOM-killed, the rest of the AWX pods may get OOM-killed once they start up afterwards as well.
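
If it does come down to capacity, the documented resource requirement fields cover the regular containers too, so you could at least make their requests/limits explicit on the AWX spec; the values below are only illustrative:

spec:
  web_resource_requirements:    # web container of the web deployment
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: '1'
      memory: 2Gi
  ee_resource_requirements:     # control-plane execution environment container
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi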

Got you. I was able to fix my problem by editing the migration job's container template with resource limits/requests high enough that the pod doesn't get killed.
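
For reference, the change was essentially adding a resources block to the main container in roles/installer/templates/jobs/migration.yaml.j2 inside the operator image; the values below are illustrative rather than the exact ones I used:

        - name: "migration-job"
          image: '{{ _image }}'
          command:
            - awx-manage
            - migrate
            - --noinput
          resources:        # added block; values are illustrative
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: '1'
              memory: 2Gi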

While this could be remedied by setting resource limits, I would suspect the underlying k8s platform has insufficient resources for the new AWX cluster

Our k8s admins enforce a default limit when none is specified; that was why the pod was getting OOM-killed.
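
That kind of namespace default is typically enforced with a LimitRange, something like the sketch below, which would also explain the 100Mi limit / 50Mi request that showed up on the migration pod (the actual object our admins apply may differ):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits   # illustrative name
  namespace: awx                   # placeholder namespace
spec:
  limits:
    - type: Container
      default:           # injected as the limit when a container sets none
        cpu: '1'
        memory: 100Mi
      defaultRequest:    # injected as the request when a container sets none
        cpu: 100m
        memory: 50Mi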


Thanks for taking a peek!