Monitoring for AWX

Currently, I’m using AWX in K8s with Grafana and Prometheus. I started this topic to discuss the golden metrics of AWX, so that we can easily monitor its status.

First, I recommend this YouTube video, which I learned a lot from: https://www.youtube.com/watch?v=JiVE8gc8a7I&ab_channel=AnsibleAWXCommunity

Demo dashboard from the AWX repository: awx/tools/grafana/dashboards/demo_dashboard.json (devel branch of ansible/awx on GitHub)

Metric to watch: awx_database_connections_total
Number of connections to the database

  1. expr: sum by (container) (awx_database_connections_total)
  2. expr: absent(awx_database_connections_total)


These expressions help us detect database connection issues. If absent() returns 1, the metric is missing entirely, which most likely means Postgres is having a problem (or the AWX metrics endpoint is not being scraped).
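
The two expressions above could be turned into Prometheus alerting rules along these lines. This is only a sketch: the group name, alert names, `for` durations, severity labels, and the threshold of 80 connections are my assumptions, not values that come from AWX, so tune them against your own Postgres connection limit.

```yaml
# Sketch of a Prometheus rule file; names, durations, and thresholds are assumptions.
groups:
  - name: awx-database            # hypothetical group name
    rules:
      # Fires when the metric has no series at all (scrape broken or DB problem).
      - alert: AWXDatabaseMetricAbsent
        expr: absent(awx_database_connections_total)
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "awx_database_connections_total is missing"
      # Fires when one container holds an unusually high number of DB connections.
      - alert: AWXDatabaseConnectionsHigh
        expr: sum by (container) (awx_database_connections_total) > 80  # assumed threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "AWX container {{ $labels.container }} holds {{ $value }} DB connections"
```

The `for` clause keeps a single failed scrape from paging anyone; shorten it if you want faster detection.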

  1. expr: callback_receiver_event_processing_avg_seconds
    Explanation: the lag between when an event occurs in Ansible and when the user can see it

  2. expr: avg(task_manager__schedule_seconds)
    Average time the task manager takes to schedule jobs

  3. expr: probe_http_status_code{target=~"status-awx-.*"}
    Monitors the return code of the AWX web UI. We can use the blackbox exporter to expose this metric

  4. Monitor volume space usage

  5. expr: awx_system_info
    AWX version information
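
As a rough starting point, the first four items in the list above could be written as alerting rules like the following. Everything besides the metric names quoted in the list is an assumption on my side: the alert names, every threshold, the `for` durations, the `awx` namespace, and the expectation that the web UI probe ends in HTTP 200. The volume rule uses the kubelet volume metrics, which only exist if your Prometheus scrapes the kubelet.

```yaml
# Sketch only; thresholds, names, and the "awx" namespace are assumptions.
groups:
  - name: awx-golden-metrics      # hypothetical group name
    rules:
      # Item 1: job events take too long to become visible to users.
      - alert: AWXCallbackReceiverLagHigh
        expr: callback_receiver_event_processing_avg_seconds > 30   # assumed threshold
        for: 10m
        labels:
          severity: warning
      # Item 2: the task manager is slow at scheduling jobs.
      - alert: AWXTaskManagerScheduleSlow
        expr: avg(task_manager__schedule_seconds) > 10              # assumed threshold
        for: 10m
        labels:
          severity: warning
      # Item 3: the blackbox exporter probe of the web UI is not returning 200.
      - alert: AWXWebUIBadStatus
        expr: probe_http_status_code{target=~"status-awx-.*"} != 200
        for: 5m
        labels:
          severity: critical
      # Item 4: less than 10% free space on a persistent volume in the namespace.
      - alert: AWXVolumeAlmostFull
        expr: |
          kubelet_volume_stats_available_bytes{namespace="awx"}
            / kubelet_volume_stats_capacity_bytes{namespace="awx"} < 0.10
        for: 15m
        labels:
          severity: warning
```

It is probably worth watching the raw values in Grafana for a week or two before settling on thresholds, since the normal lag depends heavily on job volume.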

  6. Monitor CPU, memory, network, and I/O.
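
For the resource side, one option on K8s is to build on the cAdvisor container metrics exposed through the kubelet. The recording rules below are a sketch under the assumption that AWX runs in a namespace called `awx` and that these cAdvisor metrics are scraped in your cluster; the rule names are made up for illustration.

```yaml
# Sketch only; the "awx" namespace and the record names are assumptions.
groups:
  - name: awx-resources           # hypothetical group name
    rules:
      # CPU cores used per pod, averaged over 5 minutes.
      - record: awx:pod_cpu_usage_cores:rate5m
        expr: sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="awx", container!=""}[5m]))
      # Working-set memory per pod.
      - record: awx:pod_memory_working_set_bytes
        expr: sum by (pod) (container_memory_working_set_bytes{namespace="awx", container!=""})
      # Network traffic per pod, in bytes per second.
      - record: awx:pod_network_receive_bytes:rate5m
        expr: sum by (pod) (rate(container_network_receive_bytes_total{namespace="awx"}[5m]))
      - record: awx:pod_network_transmit_bytes:rate5m
        expr: sum by (pod) (rate(container_network_transmit_bytes_total{namespace="awx"}[5m]))
      # Disk write throughput per pod, in bytes per second.
      - record: awx:pod_fs_writes_bytes:rate5m
        expr: sum by (pod) (rate(container_fs_writes_bytes_total{namespace="awx"}[5m]))
```

The `container!=""` filter on the CPU and memory rules is there to skip the pod-level cgroup series, which would otherwise be counted on top of the per-container ones.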