AWX Installation Problems on a VM

I'm trying to install AWX on various hardware and I'm running into a weird issue. Full disclosure: this is my first foray into anything Kubernetes or AWX, so I'm having a little trouble understanding where to look or how to isolate the problem.

The short version: installing on physical hardware (e.g. a desktop) with Red Hat or Ubuntu works as expected, while installing on a VM running under Microsoft Hyper-V fails every time as described below. Any pointers on where to look for the cause would be greatly appreciated. It should be noted that virtualization applications have run on our Hyper-V VMs before (e.g. the Docker version of AWX 17.1.0, or Vagrant).

Things get stuck in the same state at the point where the awx-demo.yaml file is deployed with kustomize (following the instructions at https://github.com/ansible/awx-operator, or the equivalent step in other instructions available around the Internet). This has happened using both minikube and k3s for the Kubernetes layer.
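
For reference, the deploy step I mean looks roughly like this (the kustomize workflow from the awx-operator README; the 0.21.0 tag is only an example, and awx-demo.yaml is added to the resources after the operator itself is up):

cat <<'EOF' > kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: awx
resources:
  - github.com/ansible/awx-operator/config/default?ref=0.21.0
  - awx-demo.yaml
images:
  - name: quay.io/ansible/awx-operator
    newTag: 0.21.0
EOF
kustomize build . | kubectl apply -f -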

$ kubectl get pods -n awx -l "app.kubernetes.io/managed-by=awx-operator"
NAME READY STATUS RESTARTS AGE
awx-postgres-0 1/1 Running 0 52m
awx-c6855d75d-9mtb5 2/4 CrashLoopBackOff 14 (4m52s ago) 52m

The only error message I can find comes from:

$ kubectl -n awx logs deployments/awx-operator-controller-manager -c awx-manager

{"level":"error","ts":1651672235.1357942,"logger":"logging_event_handler","msg":"","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"8997747498569746568","EventData.Task":"Apply deployment resources","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/resources_configuration.yml:75","error":"[playbook task failed]"}

Thanks very much in advance for any direction that people can offer.

Actually I found one more entry in the logs that might give a clue:

"message": "Deployment does not have minimum availability.", "reason": "MinimumReplicasUnavailable", "status": "False", "type": "Available"}, {"lastTransitionTime": "2022-05-04T13:58:35Z", "lastUpdateTime": "2022-05-04T13:58:35Z", "message": "ReplicaSet \"awx-c6855d75d\" has timed out progressing.", "reason": "ProgressDeadlineExceeded", "status": "False", "type": "Progressing"}], "observedGeneration": 1, "replicas": 1, "unavailableReplicas": 1, "updatedReplicas": 1

Hi!

Wondering if we're hitting resource-limit-related issues.

What does "kubectl describe pod awx-c6855d75d-9mtb5" show?
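
It might also be worth pulling the previous logs from the crashing containers themselves, plus the namespace events, with something like this (substitute the container names shown in the describe output):

kubectl -n awx logs awx-c6855d75d-9mtb5 -c <container> --previous
kubectl -n awx get events --sort-by=.metadata.creationTimestamp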

Seth

The following is not from the situation above but from a new installation I tried. It fails in exactly the same way; the earlier attempt was with k3s, and this time I wanted to use minikube as per the AWX docs. Minikube was started as per the docs with 'minikube start --cpus=4 --memory=6g --addons=ingress'.

Name: awx-demo-7db56584fb-tz2jc
Namespace: awx
Priority: 0
Node: minikube/192.168.39.159
Start Time: Wed, 04 May 2022 16:06:34 -0400
Labels: app.kubernetes.io/component=awx
app.kubernetes.io/managed-by=awx-operator
app.kubernetes.io/name=awx-demo
app.kubernetes.io/part-of=awx-demo
app.kubernetes.io/version=21.0.0
pod-template-hash=7db56584fb
Annotations:
Status: Running
IP: 172.17.0.6
IPs:
IP: 172.17.0.6
Controlled By: ReplicaSet/awx-demo-7db56584fb
Containers:
redis:
Container ID: docker://2d6b775d940e2c9c26aedec2bc8639c4b14c4e2c9f80a2997f37208fa4127bf9
Image: docker.io/redis:latest
Image ID: docker-pullable://redis@sha256:96c3e4dfe047ba9225a7d36fc92b5a5cff9e047daf41a1e0122e2bd8174c839e
Port:
Host Port:
Args:
redis-server
/etc/redis.conf
State: Running
Started: Wed, 04 May 2022 16:06:38 -0400
Ready: True
Restart Count: 0
Environment:
Mounts:
/data from awx-demo-redis-data (rw)
/etc/redis.conf from awx-demo-redis-config (ro,path="redis.conf")
/var/run/redis from awx-demo-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k5k5k (ro)
awx-demo-web:
Container ID: docker://4886b7f7ffcaf0ce518654855bdecbece35f7689e9d6517320c27020786e6caf
Image: quay.io/ansible/awx:21.0.0
Image ID: docker-pullable://quay.io/ansible/awx@sha256:916bb21bb87586090dcfa86c2edf7d08cdde2d47e642c4a642fba6c9ee68dbfe
Port: 8052/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 127
Started: Wed, 04 May 2022 16:11:07 -0400
Finished: Wed, 04 May 2022 16:11:07 -0400
Ready: False
Restart Count: 5
Requests:
cpu: 100m
memory: 128Mi
Environment:
MY_POD_NAMESPACE: awx (v1:metadata.namespace)
UWSGI_MOUNT_PATH: /
Mounts:
/etc/nginx/nginx.conf from awx-demo-nginx-conf (ro,path="nginx.conf")
/etc/tower/SECRET_KEY from awx-demo-secret-key (ro,path="SECRET_KEY")
/etc/tower/conf.d/credentials.py from awx-demo-application-credentials (ro,path="credentials.py")
/etc/tower/conf.d/execution_environments.py from awx-demo-application-credentials (ro,path="execution_environments.py")
/etc/tower/conf.d/ldap.py from awx-demo-application-credentials (ro,path="ldap.py")
/etc/tower/settings.py from awx-demo-settings (ro,path="settings.py")
/var/lib/awx/projects from awx-demo-projects (rw)
/var/lib/awx/rsyslog from rsyslog-dir (rw)
/var/run/awx-rsyslog from rsyslog-socket (rw)
/var/run/redis from awx-demo-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k5k5k (ro)
/var/run/supervisor from supervisor-socket (rw)
awx-demo-task:
Container ID: docker://431a4acaeb3b3fb4280c3dd2266eb991df4f40afabf6504689a92ebca4dea19e
Image: quay.io/ansible/awx:21.0.0
Image ID: docker-pullable://quay.io/ansible/awx@sha256:916bb21bb87586090dcfa86c2edf7d08cdde2d47e642c4a642fba6c9ee68dbfe
Port:
Host Port:
Args:
/usr/bin/launch_awx_task.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 127
Started: Wed, 04 May 2022 16:11:07 -0400
Finished: Wed, 04 May 2022 16:11:07 -0400
Ready: False
Restart Count: 5
Requests:
cpu: 100m
memory: 128Mi
Environment:
SUPERVISOR_WEB_CONFIG_PATH: /etc/supervisord.conf
AWX_SKIP_MIGRATIONS: 1
MY_POD_UID: (v1:metadata.uid)
MY_POD_IP: (v1:status.podIP)
MY_POD_NAMESPACE: awx (v1:metadata.namespace)
Mounts:
/etc/receptor/receptor.conf from awx-demo-receptor-config (ro,path="receptor.conf")
/etc/tower/SECRET_KEY from awx-demo-secret-key (ro,path="SECRET_KEY")
/etc/tower/conf.d/credentials.py from awx-demo-application-credentials (ro,path="credentials.py")
/etc/tower/conf.d/execution_environments.py from awx-demo-application-credentials (ro,path="execution_environments.py")
/etc/tower/conf.d/ldap.py from awx-demo-application-credentials (ro,path="ldap.py")
/etc/tower/settings.py from awx-demo-settings (ro,path="settings.py")
/var/lib/awx/projects from awx-demo-projects (rw)
/var/lib/awx/rsyslog from rsyslog-dir (rw)
/var/run/awx-rsyslog from rsyslog-socket (rw)
/var/run/receptor from receptor-socket (rw)
/var/run/redis from awx-demo-redis-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k5k5k (ro)
/var/run/supervisor from supervisor-socket (rw)
awx-demo-ee:
Container ID: docker://2db8d2cd96716a545d61b1a014f4499ee80b85d3b23a55c347e47d332a58c5fb
Image: quay.io/ansible/awx-ee:latest
Image ID: docker-pullable://quay.io/ansible/awx-ee@sha256:f45a0263a9fbeddbd89bcdc6775b6f3a8fbcaf1d3ac06c9451bddc5cbed62134
Port:
Host Port:
Args:
receptor
--config
/etc/receptor/receptor.conf
State: Running
Started: Wed, 04 May 2022 16:08:13 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 64Mi
Environment:
Mounts:
/etc/receptor/receptor.conf from awx-demo-receptor-config (ro,path="receptor.conf")
/var/lib/awx/projects from awx-demo-projects (rw)
/var/run/receptor from receptor-socket (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k5k5k (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
awx-demo-application-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: awx-demo-app-credentials
Optional: false
awx-demo-secret-key:
Type: Secret (a volume populated by a Secret)
SecretName: awx-demo-secret-key
Optional: false
awx-demo-settings:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-demo-awx-configmap
Optional: false
awx-demo-nginx-conf:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-demo-awx-configmap
Optional: false
awx-demo-redis-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-demo-awx-configmap
Optional: false
awx-demo-redis-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-demo-redis-data:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
supervisor-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
rsyslog-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
receptor-socket:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
rsyslog-dir:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
awx-demo-receptor-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-demo-awx-configmap
Optional: false
awx-demo-projects:
Type: EmptyDir (a temporary directory that shares a pod’s lifetime)
Medium:
SizeLimit:
kube-api-access-k5k5k:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Normal Scheduled 4m42s default-scheduler Successfully assigned awx/awx-demo-7db56584fb-tz2jc to minikube
Normal Pulling 4m41s kubelet Pulling image "docker.io/redis:latest"
Normal Pulled 4m38s kubelet Successfully pulled image "docker.io/redis:latest" in 2.910747512s
Normal Started 4m38s kubelet Started container redis
Normal Pulling 4m38s kubelet Pulling image "quay.io/ansible/awx:21.0.0"
Normal Created 4m38s kubelet Created container redis
Normal Pulled 4m kubelet Successfully pulled image "quay.io/ansible/awx:21.0.0" in 37.635252845s
Normal Pulling 3m58s kubelet Pulling image "quay.io/ansible/awx-ee:latest"
Normal Pulled 3m5s kubelet Successfully pulled image "quay.io/ansible/awx-ee:latest" in 53.74577208s
Normal Created 3m4s kubelet Created container awx-demo-ee
Normal Pulled 3m3s kubelet Container image "quay.io/ansible/awx:21.0.0" already present on machine
Normal Started 3m3s kubelet Started container awx-demo-ee
Normal Pulled 3m2s (x2 over 3m59s) kubelet Container image "quay.io/ansible/awx:21.0.0" already present on machine
Normal Started 3m2s (x2 over 3m58s) kubelet Started container awx-demo-task
Normal Created 3m2s (x2 over 3m59s) kubelet Created container awx-demo-task
Normal Started 3m2s (x2 over 3m59s) kubelet Started container awx-demo-web
Normal Created 3m2s (x2 over 3m59s) kubelet Created container awx-demo-web
Warning BackOff 3m (x2 over 3m1s) kubelet Back-off restarting failed container
Warning BackOff 3m (x2 over 3m1s) kubelet Back-off restarting failed container

I would double the minikube RAM for starters. I think 6 GB is probably just about the absolute lower limit these days, and in your case it isn't enough.

I set these values, which work fine for me.

minikube config view

- cpus: 4
- memory: 12g
- profile: minikube
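
If it helps, those can be set with something like the commands below (memory is in MB here; I believe newer minikube releases also accept values like 12g), and the cluster needs to be recreated for the memory change to take effect:

minikube config set cpus 4
minikube config set memory 12288
minikube delete && minikube start --addons=ingress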

Phil.

Not so sure. I have AWX running, installed on a host with 8 GB of memory that is shared with other applications. I ran into the same "minimum availability" error, but I just used settings like:

postgres_resource_requirements: {}
web_resource_requirements: {}
task_resource_requirements: {}

in your custom YAML file. It's balking because the default request values exceed your available system resources, so we just force its hand and tell it that there are no requests.
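
For context, the whole custom resource then ends up looking something like this minimal sketch (the name and service_type are just the values from the demo file):

cat <<'EOF' > awx-demo.yaml
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
spec:
  service_type: nodeport
  # empty requirements: nothing is reserved up front
  postgres_resource_requirements: {}
  web_resource_requirements: {}
  task_resource_requirements: {}
EOF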

K3s is not minikube. Its minimum spec is 512 MB of RAM, not 2 GB like minikube.
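
For reference, a single-node k3s install is basically just the upstream convenience script, something like:

curl -sfL https://get.k3s.io | sh -
sudo k3s kubectl get nodes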

Yes, you can take that approach as well, but I would recommend giving it more overall RAM in the first place rather than shrinking the pod runtime configuration, which could push the problems elsewhere.

That isn't "restricting" container resources. It merely says don't reserve the CPU/memory up front; use resources as they are needed instead. The limits keyword in the custom resource is what actually limits resources; the directive in play here is requests, which means "reserve these resources in advance", so clearing it just means nothing is reserved up front.

Depending on the OP's requirements and the population of hosts being managed, it could be adequate. It's just a matter of experimentation.

Sorry, yes, I wasn't clear. We set recommended resource limits on the pods for a reason. By allowing it unbounded access, you can create other problems. You're using 8 GB, which is probably just about enough, but the more you can give it, the better.

Based on the documentation, it doesn't look like limits are being used.

https://github.com/ansible/awx-operator

Example :
web_resource_requirements: Web container resource requirements. Default: requests: {cpu: 100m, memory: 128Mi}

There is no limit of the kind you speak of; only requests are set by default.

Thanks, everyone, for the suggestions so far, but this does not seem to be a consequence of resources (near as I can tell). I have increased the VM (running on Microsoft Hyper-V) resources to 8 CPUs and 16 GB of memory, started minikube with 'minikube start --cpus=6 --memory=12g --addons=ingress', and even added

postgres_resource_requirements: {}
web_resource_requirements: {}
task_resource_requirements: {}

to awx-demo.yaml as someone suggested. The same problem persists. Something crashes while the third container is starting (i.e. only 2/4 containers reach Running) for the awx service, and the only messages in the logs that seem to point to anything amiss are:

{"level":"error","ts":1651672235.1357942,"logger":"logging_event_handler","msg":"","name":"awx","namespace":"awx","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"runner_on_failed","job":"8997747498569746568","EventData.Task":"Apply deployment resources","EventData.TaskArgs":"","EventData.FailedTaskPath":"/opt/ansible/roles/installer/tasks/resources_configuration.yml:75","error":"[playbook task failed]"}

and

…"message": "Deployment does not have minimum availability.", "reason": "MinimumReplicasUnavailable", "status": "False", "type": "Available"}, {"lastTransitionTime": "2022-05-05T16:47:18Z", "lastUpdateTime": "2022-05-05T16:47:18Z", "message": "ReplicaSet \"awx-demo-7db56584fb\" is progressing.", "reason": "ReplicaSetUpdated", "status": "True", "type": "Progressing"}], "observedGeneration": 1, "replicas": 1, "unavailableReplicas": 1, "updatedReplicas": 1}}}

Hoping that someone has a clue about what might be wrong, or where I can look to see what is crashing.

Thanks
Brad

Hi Brad,

Which version of the operator are you attempting to deploy? Does the problem still exist with the latest, 0.21.0?
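
If you're not sure which version is actually running, something like this should show the image tag of the deployed controller:

kubectl -n awx get deployment awx-operator-controller-manager -o jsonpath='{.spec.template.spec.containers[*].image}'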

Seth

Turns out the problem was with the VM environment itself. In Microsoft Hyper-V, there is a processor setting in the VM properties called "Allow migration to a virtual machine with a different processor version". Although this setting hasn't caused problems with "virtualization within virtualization" before, it did in this case. I found this after trying to stand up the Docker version of AWX, which failed partway through with an error that the processor doesn't support "x86-64-v2".

After disabling that Hyper-V option, everything works as expected.
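
In case it helps anyone else who hits this, here's a rough way to check from inside the Linux VM whether the guest CPU exposes the x86-64-v2 level (the first command assumes glibc 2.33 or newer; the Set-VMProcessor line is, as far as I know, the PowerShell equivalent of that checkbox):

# glibc's dynamic loader lists the microarchitecture levels it considers supported
/lib64/ld-linux-x86-64.so.2 --help | grep 'x86-64-v'

# or look for the individual CPU flags that x86-64-v2 roughly requires
grep -o -w -E 'pni|ssse3|sse4_1|sse4_2|popcnt|cx16|lahf_lm' /proc/cpuinfo | sort -u

# on the Hyper-V host, the checkbox can be toggled with:
#   Set-VMProcessor -VMName <vm-name> -CompatibilityForMigrationEnabled $false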