opentelemetry callback with awx?

Hi,

We’re already running an OpenTelemetry Collector and Elastic APM for other purposes, and I noticed that Ansible has an opentelemetry callback (https://docs.ansible.com/ansible/latest/collections/community/general/opentelemetry_callback.html). It would be handy to get metrics on how our playbooks are running (in particular, how quickly various external APIs are responding).

The callback has Python module dependencies, though. Has anyone implemented this? I assume I just need to add them to our custom EE?

Thanks for any insight.

Howard

Hello,
You should be able to achieve this based on what you have described. There are multiple ways this could be achieved. The Ansible Runner Repo should have discussion around this.

Please let us know if you are able to get this working. We would be very curious as to the steps you took and feel that this would be helpful to other users.

AWX Team

I got a rough proof of concept working this morning. It was actually fairly quick, but it was not especially obvious how the parts fit together:

1) Build a custom EE with the following additional Python modules:

 opentelemetry-api
 opentelemetry-exporter-otlp
 opentelemetry-sdk

and make sure it also has the community.general collection in requirements.yml (that's where the callback comes from).

2) Add an ansible.cfg to your project repo(s) with the following content:

 [defaults]
 collections_paths = ./collections
 callbacks_enabled = community.general.opentelemetry

 [callback_opentelemetry]
 otel_service_name = awx
 enable_from_environment = ANSIBLE_OPENTELEMETRY_ENABLED
 hide_task_arguments = yes

3) In the AWX Job Settings, add some new environment variables:

{
"ANSIBLE_OPENTELEMETRY_ENABLED": "true",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://my-opentelemetry-collector:4317",
"OTEL_EXPORTER_OTLP_INSECURE": "true"
}

(only insecure because it's in the same namespace and not reachable externally)

4) Set up your new EE and apply it to job templates that you want to have traces for.

5) $$$ ? Well nearly. I've pasted an example trace below. You get the ansible module, start/end times, and some other info. You would get the task parameters, but I've specifically disabled those since ours can include secrets and I don't want those in the APM database.

What you don't get is the AWX job ID, job template ID, playbook filename, or anything else that tells you what the task is part of. I may try to add that myself. Our initial use case is to get min/avg/max times per task to find slow spots, so actually tracing the playbook isn't so important yet.
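
Since the pieces in steps 2 and 3 only work together, a quick sanity check can help before running a job. The following is a minimal sketch, not part of the original setup: `check_setup` is a hypothetical helper that reads the ansible.cfg text and the AWX environment JSON and reports obvious mismatches. It assumes the callback is gated by the variable named in `enable_from_environment`, as configured above.

```python
import configparser
import json


def check_setup(ansible_cfg_text, awx_env_json):
    """Return a list of problems found in the step 2 and step 3 settings.

    Hypothetical helper for illustration; it only inspects the text it is given.
    """
    problems = []
    cfg = configparser.ConfigParser()
    cfg.read_string(ansible_cfg_text)

    # callbacks_enabled is a comma-separated list of callback names.
    enabled = cfg.get("defaults", "callbacks_enabled", fallback="")
    names = [item.strip() for item in enabled.split(",")]
    if "community.general.opentelemetry" not in names:
        problems.append("callback not listed in callbacks_enabled")

    # The variable named here must be set to "true" in the job environment.
    gate = cfg.get("callback_opentelemetry", "enable_from_environment", fallback=None)
    env = json.loads(awx_env_json)
    if gate and str(env.get(gate, "")).lower() != "true":
        problems.append(f"{gate} is not set to true in the AWX environment")

    if "OTEL_EXPORTER_OTLP_ENDPOINT" not in env:
        problems.append("OTEL_EXPORTER_OTLP_ENDPOINT is missing")
    return problems


if __name__ == "__main__":
    cfg_text = (
        "[defaults]\n"
        "callbacks_enabled = community.general.opentelemetry\n"
        "[callback_opentelemetry]\n"
        "enable_from_environment = ANSIBLE_OPENTELEMETRY_ENABLED\n"
    )
    env_text = '{"ANSIBLE_OPENTELEMETRY_ENABLED": "true"}'
    for problem in check_setup(cfg_text, env_text):
        print("problem:", problem)
```

An empty result only means the two settings agree with each other; it does not prove that the collector is reachable from the job pod.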

Here is the opentelemetry-collector dump of one task in a playbook, in case anyone is interested. All tasks in the same playbook have the same Parent ID:

Span #2
Trace ID : 9ef582def41e1f0cd5fcff19a6627e81
Parent ID : 3c231f011ec1d3c8
ID : 14f3ee90573170dd
Name : Sleep a while
Kind : SPAN_KIND_INTERNAL
Start time : 2022-10-26 09:27:14.886296194 +0000 UTC
End time : 2022-10-26 09:27:44.897898494 +0000 UTC
Status code : STATUS_CODE_OK
Status message :
Attributes:
-> ansible.task.module: STRING(pause)
-> ansible.task.message: STRING(success)
-> ansible.task.name: STRING([localhost] localhost: Sleep a while)
-> ansible.task.result: INT(0)
-> ansible.task.host.name: STRING(localhost)
-> ansible.task.host.status: STRING(ok)
Events:
SpanEvent #0
-> Name: {"changed": false, "delta": 30, "echo": true, "rc": 0, "start": "2022-10-26 09:27:14.894602", "stderr": "", "stdout": "Paused for 30.0 seconds", "stop": "2022-10-26 09:27:44.894791", "user_input": ""}
-> Timestamp: 2022-10-26 09:27:44.980227528 +0000 UTC
-> DroppedAttributesCount: 0
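
For the min/avg/max use case, the start and end times in a dump like this are enough. Here is a rough sketch, assuming the span name and the two timestamps have already been pulled out of the collector output as strings. The helper names `parse_collector_time` and `summarize` are made up for this example, and the parser assumes the timestamps are always in UTC, as they are in the dump above.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def parse_collector_time(text):
    """Parse a timestamp like '2022-10-26 09:27:14.886296194 +0000 UTC'.

    Assumes UTC. Nanoseconds are truncated to microseconds because
    datetime cannot represent anything finer.
    """
    date_part, time_part = text.split()[:2]
    whole, _, frac = time_part.partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{date_part} {whole}.{micros}", "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(spans):
    """Map each span name to (min, avg, max) duration in seconds.

    `spans` is an iterable of (name, start_text, end_text) tuples.
    """
    durations = defaultdict(list)
    for name, start, end in spans:
        delta = parse_collector_time(end) - parse_collector_time(start)
        durations[name].append(delta.total_seconds())
    return {name: (min(v), mean(v), max(v)) for name, v in durations.items()}


if __name__ == "__main__":
    example = [
        (
            "Sleep a while",
            "2022-10-26 09:27:14.886296194 +0000 UTC",
            "2022-10-26 09:27:44.897898494 +0000 UTC",
        ),
    ]
    for name, (lo, avg, hi) in summarize(example).items():
        print(f"{name}: min={lo:.3f}s avg={avg:.3f}s max={hi:.3f}s")
```

In practice the APM backend can probably do this aggregation itself; the sketch is only meant to show that the span data carries what is needed.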

Awesome, thanks for sharing and outlining these steps.

AWX Team

Hello,

First of all, thank you for your proof of concept of building a custom EE image.

However, I’m running into an issue when I run it (from the job's Ansible stdout):

[WARNING]: Skipping callback 'community.general.opentelemetry', unable to load
due to: The opentelemetry-api, opentelemetry-exporter-otlp or
opentelemetry-sdk must be installed to use this plugin

It seems that my image does not pick up the changes, such as the added Python libraries, even though I built it with ansible-builder, pushed the new image to my registry, and used it as my EE.

docker run -it --rm ansible-execution-env:latest sh
sh-4.4# pip3 list | grep opentelemetry
opentelemetry-api 1.13.0
opentelemetry-exporter-otlp 1.13.0
opentelemetry-exporter-otlp-proto-grpc 1.13.0
opentelemetry-exporter-otlp-proto-http 1.13.0
opentelemetry-proto 1.13.0
opentelemetry-sdk 1.13.0
opentelemetry-semantic-conventions 0.34b0
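
One thing that may be worth ruling out (an assumption on my part, not something confirmed here) is that `pip3` and the Python interpreter that actually runs Ansible inside the image are not the same one. The sketch below tries the imports directly, so it can be run with whichever interpreter `ansible --version` reports. The module list is my guess at what is worth checking, and `check_modules` is a hypothetical helper name.

```python
import importlib
import sys

# Assumed list of modules worth checking; adjust as needed.
REQUIRED = [
    "opentelemetry.trace",
    "opentelemetry.sdk.trace",
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
]


def check_modules(names):
    """Try to import each module; map name to None on success or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Also covers a missing dependency pulled in by the module itself.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, error in check_modules(REQUIRED).items():
        print(f"{name}: {'ok' if error is None else 'FAILED: ' + error}")
```

If every import succeeds under the interpreter Ansible uses, the image contents are probably fine and the job may simply be running with a different EE than expected.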

Could you be more specific about your build? My requirements.yml looks like this: