I got a rough proof of concept working this morning. It was actually fairly quick, but it wasn't especially obvious how the parts fit together:
1) Build a custom execution environment (EE) with the following additional Python modules:
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
Also make sure it has the community.general collection in requirements.yml (that's where the callback comes from).
2) Add an ansible.cfg to your project repo(s) with the following content:
[defaults]
collections_paths = ./collections
callbacks_enabled = community.general.opentelemetry
[callback_opentelemetry]
otel_service_name = awx
enable_from_environment = ANSIBLE_OPENTELEMETRY_ENABLED
hide_task_arguments = yes
3) In the AWX Job Settings, add some new environment variables:
{
"ANSIBLE_OPENTELEMETRY_ENABLED": "true",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://my-opentelemetry-collector:4317",
"OTEL_EXPORTER_OTLP_INSECURE": "true"
}
(only insecure because it's in the same namespace and not reachable externally)
4) Set up your new EE and apply it to job templates that you want to have traces for.
5) $$$ ? Well, nearly. I've pasted an example trace below. You get the Ansible module, start/end times, and some other info. You would also get the task parameters, but I've specifically disabled those since ours can include secrets and I don't want those in the APM database.
What you don't get is the AWX job ID, job template ID, playbook filename, or anything else that tells you what the task is part of. I may try to add that myself. Our initial use case is to get min/avg/max times per task to find slow spots, so actually tracing the playbook isn't so important yet.
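Since the initial goal is min/avg/max times per task, here is a minimal Python sketch of that aggregation. It assumes the spans have already been pulled out of the APM backend as (task name, duration in seconds) pairs. The function name and the sample values are made up for illustration and are not part of AWX or the callback plugin.

```python
from collections import defaultdict
from statistics import mean


def summarize_task_durations(spans):
    """Group (task_name, duration_seconds) pairs and return min/avg/max per task."""
    durations = defaultdict(list)
    for name, seconds in spans:
        durations[name].append(seconds)
    return {
        name: {
            "min": min(values),
            "avg": mean(values),
            "max": max(values),
            "count": len(values),
        }
        for name, values in durations.items()
    }


# Hypothetical sample data, not real measurements.
sample = [
    ("Sleep a while", 30.0),
    ("Sleep a while", 32.0),
    ("Gather facts", 1.5),
]
summary = summarize_task_durations(sample)
for task, stats in sorted(summary.items()):
    print(task, stats)
```

Grouping by task name alone only works if task names are unique enough across playbooks. Without the job or playbook identifiers in the span, tasks that share a name will be lumped together.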
Here is the opentelemetry-collector dump of one task in a playbook, in case anyone is interested. All tasks in the same playbook have the same Parent ID:
Span #2
Trace ID : 9ef582def41e1f0cd5fcff19a6627e81
Parent ID : 3c231f011ec1d3c8
ID : 14f3ee90573170dd
Name : Sleep a while
Kind : SPAN_KIND_INTERNAL
Start time : 2022-10-26 09:27:14.886296194 +0000 UTC
End time : 2022-10-26 09:27:44.897898494 +0000 UTC
Status code : STATUS_CODE_OK
Status message :
Attributes:
-> ansible.task.module: STRING(pause)
-> ansible.task.message: STRING(success)
-> ansible.task.name: STRING([localhost] localhost: Sleep a while)
-> ansible.task.result: INT(0)
-> ansible.task.host.name: STRING(localhost)
-> ansible.task.host.status: STRING(ok)
Events:
SpanEvent #0
-> Name: {"changed": false, "delta": 30, "echo": true, "rc": 0, "start": "2022-10-26 09:27:14.894602", "stderr": "", "stdout": "Paused for 30.0 seconds", "stop": "2022-10-26 09:27:44.894791", "user_input": ""}
-> Timestamp: 2022-10-26 09:27:44.980227528 +0000 UTC
-> DroppedAttributesCount: 0
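
For anyone who wants to pull durations straight out of a collector dump like this one, the following is a rough Python sketch of parsing the "Start time" and "End time" values. The helper names are hypothetical, and it assumes the timestamp layout shown above (date, time with up to nine fractional digits, numeric offset, zone name). Python's datetime only keeps microseconds, so the nanoseconds are handled separately.

```python
from datetime import datetime
from decimal import Decimal


def parse_collector_timestamp(text):
    """Split a dump timestamp into (datetime at whole seconds, nanoseconds)."""
    date_part, time_part, offset, _zone = text.split()
    whole, _, frac = time_part.partition(".")
    base = datetime.strptime(f"{date_part} {whole} {offset}", "%Y-%m-%d %H:%M:%S %z")
    # Pad or trim the fractional part to exactly nine digits (nanoseconds).
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return base, nanos


def span_duration_seconds(start_text, end_text):
    """Return the span duration in seconds as a Decimal, keeping nanosecond precision."""
    start, start_ns = parse_collector_timestamp(start_text)
    end, end_ns = parse_collector_timestamp(end_text)
    whole_seconds = int((end - start).total_seconds())
    return Decimal(whole_seconds) + Decimal(end_ns - start_ns) / Decimal(10**9)


# Timestamps copied from the span above.
duration = span_duration_seconds(
    "2022-10-26 09:27:14.886296194 +0000 UTC",
    "2022-10-26 09:27:44.897898494 +0000 UTC",
)
print(duration)
```

For the span above this comes out at just over 30 seconds, which matches the 30 second pause reported in the span event.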