Hello!
I just deployed AWX 24.6.0 on Kubernetes with a fresh unmanaged PostgreSQL DB, using Operator version 2.19.0.
The operator scales the web deployment and the task deployment; however, the migration job it spawns launches a migration pod that keeps getting OOM-killed.
So I tried bumping the operator to version 2.19.1 to get the template file with resources specified, but it seems the image was built incorrectly, or the wrong image was published to the registry on quay.io?
First, the template file for migration jobs only specifies resources for the init container, not the main container. So you're probably getting the correct image from quay.io, just not the resource settings you were expecting.
Secondly, I've never seen the migration job get OOM-killed. While this could be remedied by setting resource limits, I suspect the underlying k8s platform has insufficient resources for the new AWX cluster. So even if you manage to get the migration job to complete without it being OOM-killed, the rest of the AWX pods may get OOM-killed once they start up afterwards.
Got it, I was able to fix my problem by editing the migration container template file with resource limits/requests high enough that the pod doesn't get killed.
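For reference, here is a rough sketch of the kind of `resources` block I mean, added to the migration job's main container. The actual template in awx-operator is a Jinja2 file and its container name may differ; the values below are illustrative, not recommendations:

```yaml
# Illustrative only: the rendered shape of a Kubernetes Job container
# spec with explicit requests/limits, so the cluster's defaults no
# longer apply. Tune the values to your own migration workload.
containers:
  - name: migration-job   # name is an assumption; check your template
    resources:
      requests:
        cpu: 100m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi
```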
> While this could be remedied with setting resource limits, I would suspect the underlying k8s platform has insufficient resources for the new AWX cluster
Our k8s admins enforce a default limit when none is specified; that was the reason it was getting OOM-killed.
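For anyone else hitting this: a cluster-enforced default like that typically comes from a LimitRange object in the namespace. A sketch of what such a policy might look like (name and values are illustrative):

```yaml
# Illustrative LimitRange: when a container in this namespace specifies
# no resources, the cluster injects these defaults, which can be low
# enough to get a DB migration pod OOM-killed.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:            # applied as limits when none are set
        memory: 256Mi
      defaultRequest:     # applied as requests when none are set
        memory: 128Mi
```

You can check whether one exists in your namespace with `kubectl get limitrange -n <namespace>`.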