RabbitMQ configuration in AWX deployment on a K8s cluster

Hello Team,

I am looking for some clarity on the RabbitMQ configuration of AWX deployed on a Kubernetes cluster.

With the default RabbitMQ configuration, the rabbitmq container fails to start. If I comment out "cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s", the container starts, but no RabbitMQ cluster is formed.

So my question is: will a RabbitMQ cluster be formed when AWX is scaled to multiple instances, or will the rabbitmq containers run standalone? If they run standalone, what is the purpose of the line above?

apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: {{ kubernetes_namespace }}
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq_definitions.json: |
    {
      "users":[{"name": "{{ rabbitmq_user }}", "password": "{{ rabbitmq_password }}", "tags": "administrator"}],
      "permissions":[
        {"user":"{{ rabbitmq_user }}","vhost":"awx","configure":".*","write":".*","read":".*"}
      ],
      "vhosts":[{"name":"awx"}],
      "policies":[
        {"vhost":"awx","name":"ha-all","pattern":".*","definition":{"ha-mode":"all","ha-sync-mode":"automatic"}}
      ]
    }
  rabbitmq.conf: |
    ## Clustering
    management.load_definitions = /etc/rabbitmq/rabbitmq_definitions.json
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc
    cluster_formation.k8s.address_type = ip
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = false
    cluster_partition_handling = autoheal
    ## queue master locator
    queue_master_locator=min-masters
    ## enable guest user
    loopback_users.guest = false
    log.file.level = debug

kubectl logs awx-0 -n awx awx-rabbit

## RabbitMQ 3.7.15. Copyright (C) 2007-2019 Pivotal Software, Inc.

########## Licensed under the MPL. See https://www.rabbitmq.com/

########## Logs:

Starting broker…
2020-01-08 12:20:05.474 [info] <0.221.0>
Starting RabbitMQ 3.7.15 on Erlang 22.0.5
Copyright (C) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL. See https://www.rabbitmq.com/
2020-01-08 12:20:05.480 [info] <0.221.0>
node : rabbit@10.140.3.115
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : at619UOZzsenF44tSK3ulA==
log(s) :
database dir : /var/lib/rabbitmq/mnesia/rabbit@10.140.3.115
2020-01-08 12:20:07.381 [info] <0.221.0> Running boot step pre_boot defined by app rabbit
2020-01-08 12:20:07.381 [info] <0.221.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-01-08 12:20:07.382 [info] <0.221.0> Running boot step rabbit_alarm defined by app rabbit
2020-01-08 12:20:07.386 [info] <0.229.0> Memory high watermark set to 3128 MiB (3280291430 bytes) of 7820 MiB (8200728576 bytes) total
2020-01-08 12:20:07.390 [info] <0.231.0> Enabling free disk space monitoring
2020-01-08 12:20:07.390 [info] <0.231.0> Disk free limit set to 50MB
2020-01-08 12:20:07.393 [info] <0.221.0> Running boot step code_server_cache defined by app rabbit
2020-01-08 12:20:07.393 [info] <0.221.0> Running boot step file_handle_cache defined by app rabbit
2020-01-08 12:20:07.393 [info] <0.234.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-01-08 12:20:07.393 [info] <0.235.0> FHC read buffering: OFF
2020-01-08 12:20:07.393 [info] <0.235.0> FHC write buffering: ON
2020-01-08 12:20:07.394 [info] <0.221.0> Running boot step worker_pool defined by app rabbit
2020-01-08 12:20:07.394 [info] <0.221.0> Running boot step database defined by app rabbit
2020-01-08 12:20:07.394 [info] <0.221.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@10.140.3.115 is empty. Assuming we need to join an existing cluster or initialise from scratch…
2020-01-08 12:20:07.394 [info] <0.221.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-01-08 12:20:07.394 [info] <0.221.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-01-08 12:20:07.394 [info] <0.221.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-01-08 12:20:07.394 [info] <0.221.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-01-08 12:20:07.395 [error] <0.220.0> CRASH REPORT Process <0.220.0> with 0 neighbours exited with reason: no match of right hand value {error,eacces} in rabbit_peer_discovery_k8s:make_request/0 line 109 in application_master:init/4 line 138
2020-01-08 12:20:07.396 [info] <0.43.0> Application rabbit exited with reason: no match of right hand value {error,eacces} in rabbit_peer_discovery_k8s:make_request/0 line 109
{“Kernel pid terminated”,application_controller,“{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{badmatch,{error,eacces}},[{rabbit_peer_discovery_k8s,make_request,0,[{file,"src/rabbit_peer_discovery_k8s.erl"},{line,109}]},{rabbit_peer_discovery_k8s,list_nodes,0,[{file,"src/rabbit_peer_discovery_k8s.erl"},{line,55}]},{rabbit_peer_discovery,discover_cluster_nodes,0,[{file,"src/rabbit_peer_discovery.erl"},{line,120}]},{rabbit_mnesia,init_from_config,0,[{file,"src/rabbit_mnesia.erl"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,"src/rabbit_mnesia.erl"},{line,144}]},{rabbit_mnesia,init,0,[{file,"src/rabbit_mnesia.erl"},{line,111}]},{rabbit_boot_steps,‘-run_step/2-lc$^1/1-1-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,"src/rabbit_boot_steps.erl"},{line,52}]}]}}}}}”}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{badmatch,{error,eacces}},[{rabbit_peer_discovery_k8s,make_request,0,

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump…done
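
For readers skimming the log above, the line that matters is the CRASH REPORT: a badmatch on {error,eacces} inside rabbit_peer_discovery_k8s:make_request/0. "eacces" is the POSIX "permission denied" error as Erlang reports it. The following is a small illustrative Python sketch that pulls the error atom and the failing function out of such a line, which may help when scanning the logs of many pods. The helper name parse_crash_reason, the regular expression, and the hint table are assumptions made for this example; they are not part of RabbitMQ or AWX.

```python
import re

# Matches: "no match of right hand value {error,<atom>} in <module>:<function>/<arity> line <n>"
CRASH_RE = re.compile(
    r"no match of right hand value \{error,(?P<reason>\w+)\} "
    r"in (?P<module>\w+):(?P<function>\w+)/(?P<arity>\d+) line (?P<line>\d+)"
)

# A few common POSIX error atoms (illustrative, not exhaustive).
POSIX_HINTS = {
    "eacces": "permission denied",
    "enoent": "no such file or directory",
    "econnrefused": "connection refused",
}


def parse_crash_reason(log_line):
    """Return a dict describing the first badmatch in log_line, or None if absent."""
    match = CRASH_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    info["arity"] = int(info["arity"])
    info["line"] = int(info["line"])
    info["hint"] = POSIX_HINTS.get(info["reason"], "unknown")
    return info


if __name__ == "__main__":
    sample = (
        "2020-01-08 12:20:07.395 [error] <0.220.0> CRASH REPORT Process <0.220.0> "
        "with 0 neighbours exited with reason: no match of right hand value "
        "{error,eacces} in rabbit_peer_discovery_k8s:make_request/0 line 109 "
        "in application_master:init/4 line 138"
    )
    print(parse_crash_reason(sample))
```

The point of the sketch is only that the failure is a permission error raised inside the peer discovery plugin, not a clustering or networking timeout.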

The rabbitmq containers will cluster, yes.

What do you mean by "the rabbitmq container fails to deploy"? I presume the error message you pasted is from the case where you removed the peer discovery backend.

-Chris

Thanks for the response, Chris.

With the default configuration (peer discovery enabled), the rabbitmq container fails to start and goes into CrashLoopBackOff.

# grep cluster roles/kubernetes/templates/deployment.yml.j2
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc
cluster_formation.k8s.address_type = ip
cluster_formation.node_cleanup.interval = 10
cluster_formation.node_cleanup.only_log_warning = false
cluster_partition_handling = autoheal

# kubectl get pods -n awx
NAME    READY   STATUS             RESTARTS   AGE
awx-0   3/4     CrashLoopBackOff   6          7m2s

# kubectl logs awx-0 -n awx awx-rabbit

## RabbitMQ 3.7.15. Copyright (C) 2007-2019 Pivotal Software, Inc.

########## Licensed under the MPL. See https://www.rabbitmq.com/

########## Logs:

Starting broker…
2020-01-08 15:19:04.045 [info] <0.221.0>
Starting RabbitMQ 3.7.15 on Erlang 22.0.5
Copyright (C) 2007-2019 Pivotal Software, Inc.
Licensed under the MPL. See https://www.rabbitmq.com/
2020-01-08 15:19:04.065 [info] <0.221.0>
node : rabbit@10.140.3.121
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : at619UOZzsenF44tSK3ulA==
log(s) :
database dir : /var/lib/rabbitmq/mnesia/rabbit@10.140.3.121
2020-01-08 15:19:05.757 [info] <0.221.0> Running boot step pre_boot defined by app rabbit
2020-01-08 15:19:05.757 [info] <0.221.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-01-08 15:19:05.758 [info] <0.221.0> Running boot step rabbit_alarm defined by app rabbit
2020-01-08 15:19:05.768 [info] <0.229.0> Memory high watermark set to 3128 MiB (3280291430 bytes) of 7820 MiB (8200728576 bytes) total
2020-01-08 15:19:05.773 [info] <0.231.0> Enabling free disk space monitoring
2020-01-08 15:19:05.773 [info] <0.231.0> Disk free limit set to 50MB
2020-01-08 15:19:05.776 [info] <0.221.0> Running boot step code_server_cache defined by app rabbit
2020-01-08 15:19:05.776 [info] <0.221.0> Running boot step file_handle_cache defined by app rabbit
2020-01-08 15:19:05.776 [info] <0.234.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-01-08 15:19:05.776 [info] <0.235.0> FHC read buffering: OFF
2020-01-08 15:19:05.776 [info] <0.235.0> FHC write buffering: ON
2020-01-08 15:19:05.777 [info] <0.221.0> Running boot step worker_pool defined by app rabbit
2020-01-08 15:19:05.777 [info] <0.221.0> Running boot step database defined by app rabbit
2020-01-08 15:19:05.777 [info] <0.221.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@10.140.3.121 is empty. Assuming we need to join an existing cluster or initialise from scratch…
2020-01-08 15:19:05.777 [info] <0.221.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-01-08 15:19:05.777 [info] <0.221.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-01-08 15:19:05.777 [info] <0.221.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-01-08 15:19:05.778 [info] <0.221.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-01-08 15:19:05.779 [error] <0.220.0> CRASH REPORT Process <0.220.0> with 0 neighbours exited with reason: no match of right hand value {error,eacces} in rabbit_peer_discovery_k8s:make_request/0 line 109 in application_master:init/4 line 138
2020-01-08 15:19:05.779 [info] <0.43.0> Application rabbit exited with reason: no match of right hand value {error,eacces} in rabbit_peer_discovery_k8s:make_request/0 line 109
{“Kernel pid terminated”,application_controller,“{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{badmatch,{error,eacces}},[{rabbit_peer_discovery_k8s,make_request,0,[{file,"src/rabbit_peer_discovery_k8s.erl"},{line,109}]},{rabbit_peer_discovery_k8s,list_nodes,0,[{file,"src/rabbit_peer_discovery_k8s.erl"},{line,55}]},{rabbit_peer_discovery,discover_cluster_nodes,0,[{file,"src/rabbit_peer_discovery.erl"},{line,120}]},{rabbit_mnesia,init_from_config,0,[{file,"src/rabbit_mnesia.erl"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,"src/rabbit_mnesia.erl"},{line,144}]},{rabbit_mnesia,init,0,[{file,"src/rabbit_mnesia.erl"},{line,111}]},{rabbit_boot_steps,‘-run_step/2-lc$^1/1-1-’,1,[{file,"src/rabbit_boot_steps.erl"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,"src/rabbit_boot_steps.erl"},{line,52}]}]}}}}}”}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,]},{‘EXIT’,{{badmatch,{error,eacces}},[{rabbit_peer_discovery_k8s,make_request,0,

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump…done

Inventory file:

# grep rabbit inventory
# These are the request and limit values for a pod’s container for task/web/rabbitmq/memcached/management.

rabbitmq_cpu_limit=1500
rabbitmq_mem_limit=4
rabbitmq_user=awx
rabbitmq_password=************
rabbitmq_erlang_cookie=cookiemonster

Am I doing something wrong? Could you please help in identifying and fixing the issue? Are there any other logs or details that might help in diagnosing the problem?

Appreciate your response.

Regards,
Vibin

Hello Chris,

This issue has been resolved.

We have SELinux enabled in our environment, and its restrictions on the container were causing cluster formation to fail.

Making the container privileged removed those restrictions and fixed the issue. Everything now looks and works fine.

- name: {{ kubernetes_deployment_name }}-rabbit
  securityContext:
    privileged: true
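
The {error,eacces} in the crash logs earlier in the thread is a "permission denied" error raised while the Kubernetes peer discovery plugin was preparing its API request. One plausible reading, stated here as an assumption rather than something confirmed in this thread, is that SELinux denied the RabbitMQ process read access to the pod's service-account files, which the plugin needs in order to talk to the Kubernetes API. The Python sketch below checks whether those files are readable by the current process. The paths are the conventional Kubernetes service-account mount locations, the helper name check_readable is hypothetical, and the script has to run in a container that has Python and is subject to the same restrictions.

```python
import os

# Conventional in-pod service-account mount (assumed; adjust if your cluster differs).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
DEFAULT_PATHS = [os.path.join(SA_DIR, name) for name in ("token", "ca.crt", "namespace")]


def check_readable(paths):
    """Map each path to 'ok', 'missing', 'denied', or an error string by trying to open it."""
    results = {}
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)  # force an actual read, not just an open
            results[path] = "ok"
        except FileNotFoundError:
            results[path] = "missing"
        except PermissionError:
            results[path] = "denied"  # the Python counterpart of Erlang's eacces
        except OSError as exc:
            results[path] = f"error: {exc.strerror}"
    return results


if __name__ == "__main__":
    for path, status in check_readable(DEFAULT_PATHS).items():
        print(f"{status:8} {path}")
```

If an entry reports "denied" on a node where SELinux is enforcing, the host's audit log is the place to confirm that the denial came from SELinux.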

Regards,
Vibin