Error in RabbitMQ container of AWX pod when trying to install AWX v9.2.0 on a Kubernetes cluster

Hi All,

Hope you are doing well.

I installed AWX 9.2.0 on a single-node Kubernetes cluster (an Ubuntu machine) and it worked.

Now, when I try to install the same version on a multi-node cluster (1 master, 2 workers), the awx-0 pod goes into the CrashLoopBackOff state. It fails because of the awx-rabbit container.

Below are the logs for rabbitmq container in the awx pod:

2020-03-28 14:21:15.306 [info] <0.235.0> Running boot step pre_boot defined by app rabbit
2020-03-28 14:21:15.306 [info] <0.235.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-28 14:21:15.307 [info] <0.235.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-28 14:21:15.310 [info] <0.241.0> Memory high watermark set to 6418 MiB (6730439065 bytes) of 16046 MiB (16826097664 bytes) total
2020-03-28 14:21:15.313 [info] <0.243.0> Enabling free disk space monitoring
2020-03-28 14:21:15.313 [info] <0.243.0> Disk free limit set to 50MB
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step code_server_cache defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step file_handle_cache defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.246.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-03-28 14:21:15.316 [info] <0.247.0> FHC read buffering: OFF
2020-03-28 14:21:15.316 [info] <0.247.0> FHC write buffering: ON
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step worker_pool defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.236.0> Will use 8 processes for default worker pool
2020-03-28 14:21:15.316 [info] <0.236.0> Starting worker pool ‘worker_pool’ with 8 processes in it
2020-03-28 14:21:15.317 [info] <0.235.0> Running boot step database defined by app rabbit
2020-03-28 14:21:15.317 [info] <0.235.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@192.168.184.11 is empty. Assuming we need to join an existing cluster or initialise from scratch…
2020-03-28 14:21:15.317 [info] <0.235.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-03-28 14:21:15.317 [info] <0.235.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-03-28 14:21:15.317 [info] <0.235.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-03-28 14:21:15.317 [info] <0.235.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-03-28 14:21:17.569 [info] <0.235.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{“10.96.0.1”,443}},{inet,[inet],timeout}]}
2020-03-28 14:21:17.570 [error] <0.234.0> CRASH REPORT Process <0.234.0> with 0 neighbours exited with reason: no case clause matching {error,“{failed_connect,[{to_address,{"10.96.0.1",443}},{inet,[inet],timeout}]}”} in rabbit_mnesia:init_from_config/0 line 167 in application_master:init/4 line 138
2020-03-28 14:21:17.570 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,“{failed_connect,[{to_address,{kubernetes.default.svc”,443}},{inet,[inet],timeout}]}"} in rabbit_mnesia:init_from_config/0 line 167
{“Kernel pid terminated”,application_controller,“{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{case_clause,{error,"{failed_connect,[{to_address,{\“10.96.0.1\”,443}},{inet,[inet],timeout}]}"}},[{rabbit_mnesia,init_from_config,0,[{file,"src/rabbit_mnesia.erl"},{line,167}]},{rabbit_mnesia,init_with_lock,3,[{file,"src/rabbit_mnesia.erl"},{line,147}]},{rabbit_mnesia,init,0,[{file,"src/rabbit_mnesia.erl"},{line,114}]},{rabbit_boot_steps,‘-run_step/2-lc$^1/1-1-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,"src/rabbit_boot_steps.erl"},{line,59}]},{rabbit_boot_steps,‘-run_boot_steps/1-lc$^0/1-0-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,28}]},{rabbit_boot_steps,run_boot_steps,1,[{file,"src/rabbit_boot_steps.erl"},{line,29}]},{rabbit,start,2,[{file,"src/rabbit.erl"},{line,937}]}]}}}}}”}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{case_clause,{error,"{failed_connect,[{to_address,{"10.96.0.1",443}

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump…done
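
The failed_connect entry in the log above is a timeout while opening a TCP connection to 10.96.0.1 on port 443, so one way to narrow this down is to try the same connection from another container in the same pod. Below is a minimal Python sketch of such a check. It assumes Python 3 is available in the container it is run from; the function name can_connect and the 3-second default timeout are illustrative choices, not part of AWX or RabbitMQ.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False


# Example call, using the address from the failed_connect log entry:
#   can_connect("10.96.0.1", 443)
```

If the call also fails from a container scheduled on ubuntu-worker, that would suggest the pod network on that node cannot reach the Kubernetes API service, rather than a fault in RabbitMQ itself.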

Kindly assist in getting this error resolved.

Thanks,
Sanchit

Can you please post the logs from Kubernetes?

kubectl get events
kubectl describe pod
kubectl logs

Thanks.

kubectl get events ::

LAST SEEN TYPE REASON OBJECT MESSAGE
47m Normal Scheduled pod/ansible-tower-management Successfully assigned awx/ansible-tower-management to ubuntu-worker
47m Normal Pulled pod/ansible-tower-management Container image “ansible/awx_task:9.2.0” already present on machine
47m Normal Created pod/ansible-tower-management Created container ansible-tower-management
47m Normal Started pod/ansible-tower-management Started container ansible-tower-management
46m Normal Killing pod/ansible-tower-management Stopping container ansible-tower-management
36m Normal Scheduled pod/ansible-tower-management Successfully assigned awx/ansible-tower-management to ubuntu-worker
36m Normal Pulled pod/ansible-tower-management Container image “ansible/awx_task:9.2.0” already present on machine
36m Normal Created pod/ansible-tower-management Created container ansible-tower-management
36m Normal Started pod/ansible-tower-management Started container ansible-tower-management
36m Normal Killing pod/ansible-tower-management Stopping container ansible-tower-management
47m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
47m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
47m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
47m Normal Created pod/awx-0 Created container awx-web
47m Normal Started pod/awx-0 Started container awx-web
47m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
47m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
47m Normal Created pod/awx-0 Created container awx-celery
47m Normal Started pod/awx-0 Started container awx-celery
47m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
46m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
46m Normal Created pod/awx-0 Created container awx-rabbit
46m Normal Started pod/awx-0 Started container awx-rabbit
47m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
47m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
47m Normal Created pod/awx-0 Created container awx-memcached
47m Normal Started pod/awx-0 Started container awx-memcached
47m Warning Unhealthy pod/awx-0 Readiness probe errored: rpc error: code = Unknown desc = container not running (b2f64e5a0339a1dca0ffed37d9b9dca9e0781851c5ed19c85be862a87c808004)
46m Normal Killing pod/awx-0 Stopping container awx-web
46m Normal Killing pod/awx-0 Stopping container awx-rabbit
46m Normal Killing pod/awx-0 Stopping container awx-memcached
46m Normal Killing pod/awx-0 Stopping container awx-celery
46m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
46m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
46m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
46m Normal Created pod/awx-0 Created container awx-web
46m Normal Started pod/awx-0 Started container awx-web
46m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
46m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
46m Normal Created pod/awx-0 Created container awx-celery
46m Normal Started pod/awx-0 Started container awx-celery
45m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
45m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
45m Normal Created pod/awx-0 Created container awx-rabbit
46m Normal Started pod/awx-0 Started container awx-rabbit
46m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
46m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
46m Normal Created pod/awx-0 Created container awx-memcached
46m Normal Started pod/awx-0 Started container awx-memcached
41m Warning BackOff pod/awx-0 Back-off restarting failed container
36m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
36m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
36m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
36m Normal Created pod/awx-0 Created container awx-web
36m Normal Started pod/awx-0 Started container awx-web
36m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
36m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
36m Normal Created pod/awx-0 Created container awx-celery
36m Normal Started pod/awx-0 Started container awx-celery
36m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
36m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
36m Normal Created pod/awx-0 Created container awx-rabbit
36m Normal Started pod/awx-0 Started container awx-rabbit
36m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
36m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
36m Normal Created pod/awx-0 Created container awx-memcached
36m Normal Started pod/awx-0 Started container awx-memcached
35m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
35m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
35m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
35m Normal Created pod/awx-0 Created container awx-web
35m Normal Started pod/awx-0 Started container awx-web
35m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
35m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
35m Normal Created pod/awx-0 Created container awx-celery
35m Normal Started pod/awx-0 Started container awx-celery
35m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
34m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
34m Normal Created pod/awx-0 Created container awx-rabbit
34m Normal Started pod/awx-0 Started container awx-rabbit
35m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
35m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
35m Normal Created pod/awx-0 Created container awx-memcached
35m Normal Started pod/awx-0 Started container awx-memcached
30m Warning BackOff pod/awx-0 Back-off restarting failed container
30m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
30m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
30m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
30m Normal Created pod/awx-0 Created container awx-web
30m Normal Started pod/awx-0 Started container awx-web
30m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
30m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
30m Normal Created pod/awx-0 Created container awx-celery
30m Normal Started pod/awx-0 Started container awx-celery
29m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
29m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
29m Normal Created pod/awx-0 Created container awx-rabbit
29m Normal Started pod/awx-0 Started container awx-rabbit
29m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
29m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
29m Normal Created pod/awx-0 Created container awx-memcached
29m Normal Started pod/awx-0 Started container awx-memcached
25m Warning BackOff pod/awx-0 Back-off restarting failed container
19m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
19m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
19m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
19m Normal Created pod/awx-0 Created container awx-web
19m Normal Started pod/awx-0 Started container awx-web
19m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
19m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
19m Normal Created pod/awx-0 Created container awx-celery
19m Normal Started pod/awx-0 Started container awx-celery
19m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
19m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
19m Normal Created pod/awx-0 Created container awx-rabbit
19m Normal Started pod/awx-0 Started container awx-rabbit
19m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
19m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
19m Normal Created pod/awx-0 Created container awx-memcached
19m Normal Started pod/awx-0 Started container awx-memcached
19m Warning BackOff pod/awx-0 Back-off restarting failed container
16m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
16m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
15m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
15m Normal Created pod/awx-0 Created container awx-web
15m Normal Started pod/awx-0 Started container awx-web
15m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
15m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
15m Normal Created pod/awx-0 Created container awx-celery
15m Normal Started pod/awx-0 Started container awx-celery
15m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
15m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
15m Normal Created pod/awx-0 Created container awx-rabbit
15m Normal Started pod/awx-0 Started container awx-rabbit
15m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
15m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
15m Normal Created pod/awx-0 Created container awx-memcached
15m Normal Started pod/awx-0 Started container awx-memcached
15m Warning BackOff pod/awx-0 Back-off restarting failed container
10m Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
10m Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
10m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
10m Normal Created pod/awx-0 Created container awx-web
10m Normal Started pod/awx-0 Started container awx-web
10m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
10m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
10m Normal Created pod/awx-0 Created container awx-celery
10m Normal Started pod/awx-0 Started container awx-celery
10m Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
10m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
10m Normal Created pod/awx-0 Created container awx-rabbit
10m Normal Started pod/awx-0 Started container awx-rabbit
10m Normal Pulling pod/awx-0 Pulling image “memcached:latest”
10m Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
10m Normal Created pod/awx-0 Created container awx-memcached
10m Normal Started pod/awx-0 Started container awx-memcached
10m Warning BackOff pod/awx-0 Back-off restarting failed container
9m4s Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
9m3s Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
9m Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
9m Normal Created pod/awx-0 Created container awx-web
9m Normal Started pod/awx-0 Started container awx-web
9m Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
8m57s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
8m57s Normal Created pod/awx-0 Created container awx-celery
8m57s Normal Started pod/awx-0 Started container awx-celery
8m18s Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
8m15s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
8m15s Normal Created pod/awx-0 Created container awx-rabbit
8m43s Normal Started pod/awx-0 Started container awx-rabbit
8m54s Normal Pulling pod/awx-0 Pulling image “memcached:latest”
8m50s Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
8m50s Normal Created pod/awx-0 Created container awx-memcached
8m50s Normal Started pod/awx-0 Started container awx-memcached
8m32s Warning BackOff pod/awx-0 Back-off restarting failed container
4m16s Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
4m15s Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
4m12s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
4m12s Normal Created pod/awx-0 Created container awx-web
4m12s Normal Started pod/awx-0 Started container awx-web
4m12s Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
4m8s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
4m8s Normal Created pod/awx-0 Created container awx-celery
4m8s Normal Started pod/awx-0 Started container awx-celery
3m36s Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
3m33s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
3m33s Normal Created pod/awx-0 Created container awx-rabbit
3m33s Normal Started pod/awx-0 Started container awx-rabbit
4m4s Normal Pulling pod/awx-0 Pulling image “memcached:latest”
4m1s Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
4m1s Normal Created pod/awx-0 Created container awx-memcached
4m Normal Started pod/awx-0 Started container awx-memcached
3m45s Warning BackOff pod/awx-0 Back-off restarting failed container
3m14s Normal Scheduled pod/awx-0 Successfully assigned awx/awx-0 to ubuntu-worker
3m13s Normal Pulling pod/awx-0 Pulling image “ansible/awx_web:9.2.0”
3m8s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_web:9.2.0”
3m8s Normal Created pod/awx-0 Created container awx-web
3m8s Normal Started pod/awx-0 Started container awx-web
3m8s Normal Pulling pod/awx-0 Pulling image “ansible/awx_task:9.2.0”
3m5s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_task:9.2.0”
3m5s Normal Created pod/awx-0 Created container awx-celery
3m5s Normal Started pod/awx-0 Started container awx-celery
2m31s Normal Pulling pod/awx-0 Pulling image “ansible/awx_rabbitmq:3.7.21”
2m28s Normal Pulled pod/awx-0 Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
2m28s Normal Created pod/awx-0 Created container awx-rabbit
2m51s Normal Started pod/awx-0 Started container awx-rabbit
3m1s Normal Pulling pod/awx-0 Pulling image “memcached:latest”
2m57s Normal Pulled pod/awx-0 Successfully pulled image “memcached:latest”
2m57s Normal Created pod/awx-0 Created container awx-memcached
2m57s Normal Started pod/awx-0 Started container awx-memcached
2m43s Warning BackOff pod/awx-0 Back-off restarting failed container
48m Warning FailedScheduling pod/awx-postgresql-postgresql-0 error while running “VolumeBinding” filter plugin for pod “awx-postgresql-postgresql-0”: pod has unbound immediate PersistentVolumeClaims
48m Normal Scheduled pod/awx-postgresql-postgresql-0 Successfully assigned awx/awx-postgresql-postgresql-0 to ubuntu-worker
48m Normal Pulling pod/awx-postgresql-postgresql-0 Pulling image “docker.io/bitnami/minideb:stretch”
48m Normal Pulled pod/awx-postgresql-postgresql-0 Successfully pulled image “docker.io/bitnami/minideb:stretch”
48m Normal Created pod/awx-postgresql-postgresql-0 Created container init-chmod-data
48m Normal Started pod/awx-postgresql-postgresql-0 Started container init-chmod-data
48m Normal Pulled pod/awx-postgresql-postgresql-0 Container image “docker.io/bitnami/postgresql:11.6.0-debian-10-r5” already present on machine
48m Normal Created pod/awx-postgresql-postgresql-0 Created container awx-postgresql
48m Normal Started pod/awx-postgresql-postgresql-0 Started container awx-postgresql
48m Normal SuccessfulCreate statefulset/awx-postgresql-postgresql create Claim data-awx-postgresql-postgresql-0 Pod awx-postgresql-postgresql-0 in StatefulSet awx-postgresql-postgresql success
48m Normal SuccessfulCreate statefulset/awx-postgresql-postgresql create Pod awx-postgresql-postgresql-0 in StatefulSet awx-postgresql-postgresql successful
46m Normal SuccessfulCreate statefulset/awx create Pod awx-0 in StatefulSet awx successful
46m Normal SuccessfulDelete statefulset/awx delete Pod awx-0 in StatefulSet awx successful
36m Warning FailedCreate statefulset/awx create Pod awx-0 in StatefulSet awx failed error: The POST operation against Pod could not be completed at this time, please try again.
3m14s Normal SuccessfulCreate statefulset/awx create Pod awx-0 in StatefulSet awx successful
36m Normal SuccessfulDelete statefulset/awx delete Pod awx-0 in StatefulSet awx successful
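
The event list above is long and mostly repeats the same pull/create/start cycle, so it can help to tally only the Warning events. A small Python sketch, assuming the output of kubectl get events has been saved as text; the function name tally_warnings is made up for illustration:

```python
from collections import Counter


def tally_warnings(events_text: str) -> Counter:
    """Count Warning events per (object, reason) in `kubectl get events` text output."""
    counts = Counter()
    for line in events_text.splitlines():
        # Columns: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE (message may contain spaces).
        fields = line.split(None, 4)
        if len(fields) >= 4 and fields[1] == "Warning":
            counts[(fields[3], fields[2])] += 1
    return counts
```

Running it over the saved output and printing counts.most_common() lists the most frequent warning first.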

kubectl describe pod awx-0 -n awx ::

Name: awx-0
Namespace: awx
Priority: 0
Node: ubuntu-worker/192.168.101.245
Start Time: Sun, 29 Mar 2020 04:31:14 +1100
Labels: app=awx
controller-revision-hash=awx-7cd4d674b8
name=awx-web-deploy
service=django
statefulset.kubernetes.io/pod-name=awx-0
Annotations: cni.projectcalico.org/podIP: 192.168.184.15/32
Status: Running
IP: 192.168.184.15
IPs:
IP: 192.168.184.15
Controlled By: StatefulSet/awx
Containers:
awx-web:
Container ID: docker://0a03d868b5d9c724f2c1dac4db5b351f4d1a7d84c95633f5333a89d51c503ef6
Image: ansible/awx_web:9.2.0
Image ID: docker-pullable://ansible/awx_web@sha256:57232e6820eb1bfad3c6910bd993e1a8c0d644c0c0fcfb75a25a0b51b3d6fdec
Port: 8052/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 29 Mar 2020 04:31:20 +1100
Ready: True
Restart Count: 0
Requests:
cpu: 500m
memory: 1Gi
Environment:
Mounts:
/etc/nginx/nginx.conf from awx-nginx-config (ro,path=“nginx.conf”)
/etc/tower/SECRET_KEY from awx-secret-key (ro,path=“SECRET_KEY”)
/etc/tower/conf.d/ from awx-application-credentials (ro)
/etc/tower/settings.py from awx-application-config (ro,path=“settings.py”)
/var/run/secrets/kubernetes.io/serviceaccount from awx-token-hj4g4 (ro)
awx-celery:
Container ID: docker://c260b524f32bb4de0df21934ce19a4c6ee669b4bab5a620cbd8b234024f39947
Image: ansible/awx_task:9.2.0
Image ID: docker-pullable://ansible/awx_task@sha256:6dd8b36faecd4a522ee8dbff8c24cefff4826ff7c91da2653c5bb28bab8c1e66
Port:
Host Port:
Command:
/usr/bin/launch_awx_task.sh
State: Running
Started: Sun, 29 Mar 2020 04:31:23 +1100
Ready: True
Restart Count: 0
Requests:
cpu: 1500m
memory: 2Gi
Environment:
AWX_SKIP_MIGRATIONS: 1
Mounts:
/etc/tower/SECRET_KEY from awx-secret-key (ro,path=“SECRET_KEY”)
/etc/tower/conf.d/ from awx-application-credentials (ro)
/etc/tower/settings.py from awx-application-config (ro,path=“settings.py”)
/var/run/secrets/kubernetes.io/serviceaccount from awx-token-hj4g4 (ro)
awx-rabbit:
Container ID: docker://419fb0e3cef70683216e8c1e704953aec8b2b288c0008ef091255514c60cee75
Image: ansible/awx_rabbitmq:3.7.21
Image ID: docker-pullable://ansible/awx_rabbitmq@sha256:fdc1c4cde8c7de4192ef981aef57908363ca2f6b8bf7e7aec24873024c972d03
Ports: 15672/TCP, 5672/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 29 Mar 2020 04:35:02 +1100
Finished: Sun, 29 Mar 2020 04:35:09 +1100
Ready: False
Restart Count: 5
Requests:
cpu: 500m
memory: 2Gi
Liveness: exec [/usr/local/bin/healthchecks/rabbit_health_node.py] delay=30s timeout=10s period=10s #success=1 #failure=3
Readiness: exec [/usr/local/bin/healthchecks/rabbit_health_node.py] delay=10s timeout=10s period=10s #success=1 #failure=3
Environment:
MY_POD_IP: (v1:status.podIP)
RABBITMQ_USE_LONGNAME: true
RABBITMQ_NODENAME: rabbit@$(MY_POD_IP)
RABBITMQ_ERLANG_COOKIE: <set to the key ‘rabbitmq_erlang_cookie’ in secret ‘awx-secrets’> Optional: false
K8S_SERVICE_NAME: rabbitmq
RABBITMQ_USER: awx
RABBITMQ_PASSWORD: <set to the key ‘rabbitmq_password’ in secret ‘awx-secrets’> Optional: false
Mounts:
/etc/rabbitmq from rabbitmq-config (rw)
/usr/local/bin/healthchecks from rabbitmq-healthchecks (rw)
/var/run/secrets/kubernetes.io/serviceaccount from awx-token-hj4g4 (ro)
awx-memcached:
Container ID: docker://88eb3ed91dce786a16d64ade1504f513f7c230f48af6a36c8e5c9be47946ecc6
Image: memcached:latest
Image ID: docker-pullable://memcached@sha256:6ce1e76815f51d9be35f86b3015604134ee781ce26acb3d695ce4388123a422e
Port:
Host Port:
State: Running
Started: Sun, 29 Mar 2020 04:31:31 +1100
Ready: True
Restart Count: 0
Requests:
cpu: 500m
memory: 1Gi
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from awx-token-hj4g4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
awx-application-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-config
Optional: false
awx-nginx-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-config
Optional: false
awx-application-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: awx-secrets
Optional: false
awx-secret-key:
Type: Secret (a volume populated by a Secret)
SecretName: awx-secrets
Optional: false
rabbitmq-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-config
Optional: false
rabbitmq-healthchecks:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: awx-healthchecks
Optional: false
awx-token-hj4g4:
Type: Secret (a volume populated by a Secret)
SecretName: awx-token-hj4g4
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m22s default-scheduler Successfully assigned awx/awx-0 to ubuntu-worker
Normal Pulling 5m21s kubelet, ubuntu-worker Pulling image “ansible/awx_web:9.2.0”
Normal Pulled 5m16s kubelet, ubuntu-worker Successfully pulled image “ansible/awx_web:9.2.0”
Normal Created 5m16s kubelet, ubuntu-worker Created container awx-web
Normal Started 5m16s kubelet, ubuntu-worker Started container awx-web
Normal Pulling 5m16s kubelet, ubuntu-worker Pulling image “ansible/awx_task:9.2.0”
Normal Pulled 5m13s kubelet, ubuntu-worker Successfully pulled image “ansible/awx_task:9.2.0”
Normal Created 5m13s kubelet, ubuntu-worker Created container awx-celery
Normal Started 5m13s kubelet, ubuntu-worker Started container awx-celery
Normal Pulling 5m9s kubelet, ubuntu-worker Pulling image “memcached:latest”
Normal Started 5m5s kubelet, ubuntu-worker Started container awx-memcached
Normal Created 5m5s kubelet, ubuntu-worker Created container awx-memcached
Normal Pulled 5m5s kubelet, ubuntu-worker Successfully pulled image “memcached:latest”
Normal Started 4m59s (x2 over 5m9s) kubelet, ubuntu-worker Started container awx-rabbit
Normal Pulling 4m39s (x3 over 5m13s) kubelet, ubuntu-worker Pulling image “ansible/awx_rabbitmq:3.7.21”
Normal Created 4m36s (x3 over 5m9s) kubelet, ubuntu-worker Created container awx-rabbit
Normal Pulled 4m36s (x3 over 5m10s) kubelet, ubuntu-worker Successfully pulled image “ansible/awx_rabbitmq:3.7.21”
Warning BackOff 7s (x24 over 4m52s) kubelet, ubuntu-worker Back-off restarting failed container

kubectl logs awx-0 awx-rabbit ::

2020-03-28 17:38:08.297 [info] <0.8.0> Feature flags: list of feature flags found:
2020-03-28 17:38:08.297 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-03-28 17:38:08.327 [info] <0.235.0>
Starting RabbitMQ 3.7.21 on Erlang 22.1.7
Copyright (C) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL. See https://www.rabbitmq.com/

## RabbitMQ 3.7.21. Copyright (C) 2007-2019 Pivotal Software, Inc.

########## Licensed under the MPL. See https://www.rabbitmq.com/

########## Logs:

Starting broker…
2020-03-28 17:38:08.327 [info] <0.235.0>
node : rabbit@192.168.184.15
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : at619UOZzsenF44tSK3ulA==
log(s) :
database dir : /var/lib/rabbitmq/mnesia/rabbit@192.168.184.15

2020-03-28 14:21:15.306 [info] <0.235.0> Running boot step pre_boot defined by app rabbit
2020-03-28 14:21:15.306 [info] <0.235.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-28 14:21:15.307 [info] <0.235.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-28 14:21:15.310 [info] <0.241.0> Memory high watermark set to 6418 MiB (6730439065 bytes) of 16046 MiB (16826097664 bytes) total
2020-03-28 14:21:15.313 [info] <0.243.0> Enabling free disk space monitoring
2020-03-28 14:21:15.313 [info] <0.243.0> Disk free limit set to 50MB
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step code_server_cache defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step file_handle_cache defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.246.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-03-28 14:21:15.316 [info] <0.247.0> FHC read buffering: OFF
2020-03-28 14:21:15.316 [info] <0.247.0> FHC write buffering: ON
2020-03-28 14:21:15.316 [info] <0.235.0> Running boot step worker_pool defined by app rabbit
2020-03-28 14:21:15.316 [info] <0.236.0> Will use 8 processes for default worker pool
2020-03-28 14:21:15.316 [info] <0.236.0> Starting worker pool ‘worker_pool’ with 8 processes in it
2020-03-28 14:21:15.317 [info] <0.235.0> Running boot step database defined by app rabbit
2020-03-28 14:21:15.317 [info] <0.235.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@192.168.184.11 is empty. Assuming we need to join an existing cluster or initialise from scratch…
2020-03-28 14:21:15.317 [info] <0.235.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-03-28 14:21:15.317 [info] <0.235.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-03-28 14:21:15.317 [info] <0.235.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-03-28 14:21:15.317 [info] <0.235.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-03-28 14:21:17.569 [info] <0.235.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{“10.96.0.1”,443}},{inet,[inet],timeout}]}
2020-03-28 14:21:17.570 [error] <0.234.0> CRASH REPORT Process <0.234.0> with 0 neighbours exited with reason: no case clause matching {error,“{failed_connect,[{to_address,{"10.96.0.1",443}},{inet,[inet],timeout}]}”} in rabbit_mnesia:init_from_config/0 line 167 in application_master:init/4 line 138
2020-03-28 14:21:17.570 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,“{failed_connect,[{to_address,{kubernetes.default.svc”,443}},{inet,[inet],timeout}]}"} in rabbit_mnesia:init_from_config/0 line 167
{“Kernel pid terminated”,application_controller,“{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{case_clause,{error,"{failed_connect,[{to_address,{\“10.96.0.1\”,443}},{inet,[inet],timeout}]}"}},[{rabbit_mnesia,init_from_config,0,[{file,"src/rabbit_mnesia.erl"},{line,167}]},{rabbit_mnesia,init_with_lock,3,[{file,"src/rabbit_mnesia.erl"},{line,147}]},{rabbit_mnesia,init,0,[{file,"src/rabbit_mnesia.erl"},{line,114}]},{rabbit_boot_steps,‘-run_step/2-lc$^1/1-1-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,"src/rabbit_boot_steps.erl"},{line,59}]},{rabbit_boot_steps,‘-run_boot_steps/1-lc$^0/1-0-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,28}]},{rabbit_boot_steps,run_boot_steps,1,[{file,"src/rabbit_boot_steps.erl"},{line,29}]},{rabbit,start,2,[{file,"src/rabbit.erl"},{line,937}]}]}}}}}”}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{case_clause,{error,"{failed_connect,[{to_address,{"10.96.0.1",443}

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump…done

Kindly share the output of the commands below:

kubectl get all -n awx
kubectl -n awx get pods -o wide

I think you are running PostgreSQL inside Kubernetes rather than using an external database. Did you follow the steps for installing PostgreSQL in Kubernetes from the install guide on GitHub?

https://github.com/ansible/awx/blob/devel/INSTALL.md

Also, please share the output of:

cat inventory | grep -i -v "^#"

Hi Selvam,

The problem is not related to PostgreSQL; the PostgreSQL pod is already in the Running state. The problem is in the rabbit container (awx-rabbit) of the awx-0 pod, which keeps terminating and so sends the pod into the CrashLoopBackOff state.

kubectl get all -n awx ::

NAME                              READY   STATUS             RESTARTS   AGE
pod/awx-0                         3/4     CrashLoopBackOff   3          2m2s
pod/awx-postgresql-postgresql-0   1/1     Running            0          4m8s

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                          AGE
service/awx-postgresql            ClusterIP   10.104.27.52    <none>        5432/TCP                         4m8s
service/awx-postgresql-headless   ClusterIP   None            <none>        5432/TCP                         4m8s
service/awx-rmq-mgmt              ClusterIP   10.96.131.35    <none>        15672/TCP                        3m5s
service/awx-web-svc               NodePort    10.111.218.39   <none>        80:32299/TCP                     3m5s
service/rabbitmq                  NodePort    10.98.186.33    <none>        15672:31962/TCP,5672:31225/TCP   3m5s

NAME                                         READY   AGE
statefulset.apps/awx                         0/1     3m5s
statefulset.apps/awx-postgresql-postgresql   1/1     4m8s

kubectl -n awx get pods -o wide ::

NAME                          READY   STATUS             RESTARTS   AGE     IP               NODE            NOMINATED NODE   READINESS GATES
awx-0                         3/4     CrashLoopBackOff   3          2m45s   192.168.184.19   ubuntu-worker   <none>           <none>
awx-postgresql-postgresql-0   1/1     Running            0          4m51s   192.168.184.17   ubuntu-worker   <none>           <none>

cat inventory | grep -i -v "^#" ::

dockerhub_base=ansible
kubernetes_context=kubeawx
kubernetes_namespace=awx
pg_persistence_storageClass=standard
awx_task_hostname=awx
awx_web_hostname=awxweb
postgres_data_dir="~/.awx/pgdocker"
host_port=80
host_port_ssl=443
docker_compose_dir="~/.awx/awxcompose"
pg_username=awx
pg_password=awxpass
pg_database=awx
pg_port=5432
rabbitmq_password=awxpass
rabbitmq_erlang_cookie=cookiemonster
admin_user=admin
admin_password=password
create_preload_data=True
secret_key=awxsecret

Thanks,
Sanchit

Thanks.

Please remove the lines below, or put a # in front of each one, and then try the install again.

host_port_ssl=443 (I can see a connection error while trying to reach 443, so please comment this out with #.)
docker_compose_dir="~/.awx/awxcompose" (This is not required for a Kubernetes install, so comment it out with # as well.)

Then remove all the AWX pods and services and rerun the Ansible install playbook.

Thanks.

Hi Selvam,

We tried the changes suggested above, but without success.

Thanks,
Sanchit

Hi All,

We are also unable to install AWX on Kubernetes (multi-node). We tried version 9.3.0 first, but ran into some issues with it, such as not being able to sync the Git repo. Considering it is a major release, we went with 9.1.0 instead, but it got stuck at the Migrate database task.

Then we tried to install 9.2.0, and it is stuck at RabbitMQ with the error below:

Failed to get nodes from k8s - {failed_connect,[{to_address,{“kubernetes.default”,443}}, {inet,[inet],timeout}]}

Can you please help? It looks like an issue with the cluster_formation.peer_discovery_backend settings; something seems to be missing there. Has anyone installed 9.1.0 or 9.2.0 on Kubernetes (multi-node)?
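
To compare this error with the one reported earlier in the thread, it may help to pull the host and port out of each failed_connect entry. The Python sketch below does that with a regular expression; the names FAILED_CONNECT and failed_connect_targets are illustrative, and the pattern is only an assumption based on the log lines quoted in this thread (it tolerates straight, curly and backslash-escaped quotes, since pasted logs often have their quotes rewritten).

```python
import re
from typing import List, Tuple

# Host and port inside a {to_address,{"host",port}} tuple, with optional quoting.
FAILED_CONNECT = re.compile(r'to_address,\{[\\"“”]*([^\\"“”,{}]+)[\\"“”]*,(\d+)\}')


def failed_connect_targets(log_text: str) -> List[Tuple[str, int]]:
    """Return the distinct (host, port) pairs from failed_connect entries, in order of first appearance."""
    targets: List[Tuple[str, int]] = []
    for host, port in FAILED_CONNECT.findall(log_text):
        target = (host, int(port))
        if target not in targets:
            targets.append(target)
    return targets
```

Both reports in this thread time out on port 443 of what looks like the Kubernetes API service (10.96.0.1 in one log, kubernetes.default in the other), so the cause may lie in pod-to-service networking rather than in the peer discovery settings themselves.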

Regards,
Ajit

I did, but with an external PostgreSQL database.

Is it giving the same error? Can you please send me the logs?