Data Tagging preview and testing

Background

Ansible Core 2.19 is slated to receive the Data Tagging feature we’ve been talking about and developing for several years. It’s been a long process, culminating in the largest set of changes to ansible-core since collections, including a significant overhaul of the templating engine and many other changes we’re hoping you’ll love. We’ve spent a lot of time validating countless backward compatibility scenarios, but a few of the security and templating changes may require adjustments to playbooks.

We’ve reached a point where we’re looking for community feedback and testing against pre-releases of ansible-core 2.19.

How You Can Help

Right now, we’re hoping to get feedback around how well existing playbooks and content work with these new features. We’re still working on some plugin API changes and compatibility corner cases, as well as some new ways to more formally define the boundaries of those APIs.

The feature PR has merged and is available in ansible-core >= 2.19.0b1, with continued stabilization leading up to the final 2.19 release of ansible-core.

Getting Help

Draft documentation for the new features can be found in the 2.19 porting guide, including detailed descriptions of the changes and example situations where content changes may be required.

Reporting Issues

Issues specific to these changes should be filed with the Pre-release Bug Report form.

Getting Started

There are two ways to test data tagging currently.

Installing Using Pip

Install a current ansible-core pre-release from PyPI:

$ pip install --upgrade --pre 'ansible-core~=2.19.0b1'
...
$ ansible --version
ansible [core 2.19.0b1]
...

If the installation was successful, the version output should reflect a pre-release build of 2.19.
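If you're unsure what the `~=` ("compatible release") specifier in the pip command above actually matches, here is a small sketch using the third-party `packaging` library (assumed to be installed; pip itself depends on it):

```python
# Sketch: what the 'compatible release' specifier ~=2.19.0b1 matches.
# Assumes the third-party 'packaging' library is installed.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet("~=2.19.0b1", prereleases=True)

print(Version("2.19.0b1") in spec)  # the beta itself matches
print(Version("2.19.0") in spec)    # the final 2.19.0 release matches too
print(Version("2.20.0") in spec)    # 2.20 does not; ~= caps at the 2.19 series
```

In other words, once 2.19.0 final ships, the same specifier will pick it up, which is why `--pre` is needed only while the beta is the newest matching release.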

Building Execution Environments

AAP/AWX Execution Environments can also be created with 2.19 pre-releases via ansible-builder 3.0+. Use ansible-core~=2.19.0b1 for the ansible_core → package_pip entry in the definition file, and ensure that the Python interpreter is at least Python 3.11. For example:

version: 3
images:
  base_image:
    name: registry.fedoraproject.org/fedora:41
dependencies:
  ansible_core:
    package_pip: ansible-core~=2.19.0b1
  ansible_runner:
    package_pip: ansible-runner
  python_interpreter:
    package_system: python3
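As a quick sanity check before kicking off a build, the definition above can be parsed and inspected with PyYAML (a sketch; assumes the third-party PyYAML package is installed):

```python
# Sketch: sanity-check the execution environment definition above with PyYAML
# (assumes the third-party PyYAML package is installed).
import yaml

definition = """\
version: 3
images:
  base_image:
    name: registry.fedoraproject.org/fedora:41
dependencies:
  ansible_core:
    package_pip: ansible-core~=2.19.0b1
  ansible_runner:
    package_pip: ansible-runner
  python_interpreter:
    package_system: python3
"""

data = yaml.safe_load(definition)
assert data["version"] == 3
assert data["dependencies"]["ansible_core"]["package_pip"] == "ansible-core~=2.19.0b1"
print("definition looks good")
```

With the definition saved as execution-environment.yml (ansible-builder's default filename), the actual image is then built with `ansible-builder build -t <your-tag>`.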

I would like to know: would installing the ansible-core devel branch directly from git also make use of the data tagging feature?

I am asking because most DevTools projects already have a testing pipeline job that uses the ‘devel’ branch of ansible-core.

Once the PR linked above is merged to devel, yes.

Right now, specifying refs/pull/84621/head instead of devel as the ansible-core branch to use works fine in most CI scripts I’ve seen.

(“Works fine” in the sense that it then runs against the Data Tagging PR, not in the sense that everything is just passing :wink: )


We’ve got a separate discussion around using the new functionality offered by Data Tagging

Here’s a list of changes we’ve had to make to collections so far:

If people DM me links to other PRs I’ll update this list

The above community.general PR is only part of the story; I moved a lot of changes from there out to separate PRs that could already be merged (fixing things that aren’t DT-specific but were found by DT, or in some cases working around issues with DT in ways that also work fine with existing releases). This information can also be found by looking at the rebases and the PRs linked, but here’s a complete list:

(community.general was the biggest chunk of work, but I had to modify every single collection I maintain. It also resulted in a stream of issues in ansible-core :wink: )

Yep, some of which have already resulted in tweaks to eliminate or minimize those changes… Thanks again for all the early testing and engagement! It’s much easier for us to make those kinds of improvements before the code ships than once it’s been released for a long time.

I’m trying to find out whether this also works for Zuul jobs, at least for community.vmware. And I think it does.

Maybe someone else who knows more about how Zuul works than I do could have another look at this. But I see Collecting jinja2>=3.0.0 (from ansible-core==2.19.0.dev0) in the [DNM] Test data tagging logs for the integration tests, whereas it says Collecting jinja2>=3.0.0 (from ansible-core==2.18.3.post0) in a PR without this change to Zuul. So I guess the integration tests have really been run with data tagging in place.

However, I’d like to repeat that I’m not 100% sure. This is just an educated guess.

edit: I’ve just added refs/pull/84621/head to the sanity and unit tests. This fails, but I’m not sure yet whether this has something to do with data tagging or whether I’m running into another problem with 2.19.

Regarding Zuul CI, there is also


Hi,

I have been a bit busy and out of the loop, but I did want to take a look at what this means from the perspective of ara, to see whether it breaks anything or opens up new opportunities.

My full notes and screenshots so far are here but I am giving you a condensed update here.

Everything seems to work with the current latest fallible (2025.3.11). That’s a relief :slight_smile:

ara’s integration tests are passing just the same, and I must say I like the newer exception, deprecation, and error messages being in separate fields, both in stdout and from a callback perspective. Nice.

I have tried to measure the performance improvement claim:

Most nested/recursive templating operations are now fully lazy, vastly improving performance in complex scenarios.

From a very naive benchmark that runs a debug 10,000 times on 100 hosts, the performance is about the same:

I would be willing to believe it does not exercise that code path, but I would be curious to try something that does.
Do you have an example of “nested/recursive templating” somewhere?

I have tried to find out whether something like that exists in the integration test targets from the PR, but it’s a bit hard to navigate.


My guess is that it means something like

- hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.debug:
        msg: "{{ var }}"
      vars:
        var: "{{ sub }} {{ sub }} {{ sub2 }} {{ sub2 }} {{ sub3 }} {{ sub3 }} {{ sub4 }} {{ sub4 }}"
        sub: ">{{ sub2 }} {{ sub2 }} {{ sub3 }} {{ sub3 }}<"
        sub2: "[{{ sub3 }} {{ sub3 }} {{ sub4 }}]"
        sub3: "({{ sub4 }} {{ sub4 }})"
        sub4: "{{ 42 | ansible.builtin.random }}"

But I cannot see differences between the output in devel, stable-2.18, and the DT branch.

I meant to reply with this earlier, but I guess I forgot. This is a simplified example and doesn’t really showcase some of the more complex recursive cases, but there is an improvement in speed for both, due to the same “lazier” optimization:

- hosts: localhost
  gather_facts: false
  vars:
    test:
      slow: '{{ lookup("pipe", "sleep 5") }}'
      fast: 'fast'
  tasks:
    - debug:
        msg: '{{ test.fast }}'

Something like this is more closely aligned with the comment, but in reality it is unnecessary for showcasing the change:

- hosts: localhost
  gather_facts: false
  vars:
    slow:
      static: slow
      _slow: '{{ lookup("pipe", "sleep 5") }}'
    test:
      test: '{{ slow.static }}'
  tasks:
    - debug:
        msg: '{{ test.test }}'
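As a rough mental model of what “lazy” buys you in the examples above (plain Python, not ansible-core’s actual implementation), only the values that are actually accessed get computed:

```python
# Toy illustration of lazy evaluation (NOT ansible-core internals):
# callable values are only computed when they are first accessed.
import time

class LazyDict(dict):
    def __getitem__(self, key):
        value = super().__getitem__(key)
        if callable(value):
            value = value()                  # compute on first access...
            super().__setitem__(key, value)  # ...and cache the result
        return value

def slow_template():
    time.sleep(2)  # stands in for lookup("pipe", "sleep 5")
    return "slow"

data = LazyDict(slow=slow_template, fast="fast")

start = time.monotonic()
fast_value = data["fast"]  # never touches the slow value
elapsed = time.monotonic() - start

print(fast_value, elapsed < 1)  # → fast True
```

Reading `data["fast"]` returns immediately because the expensive `slow` entry is never evaluated, which mirrors how the `debug` task above no longer pays for `test.slow`.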

Historically, the way templating worked, largely due to AnsibleJ2Vars, was to do templating immediately on full data structures the moment a template was found. This is what allows nested templates to work in ansible-core that do not work in jinja2 with its built-in functionality.

This doesn’t “just work” in jinja2:

from jinja2 import Environment

Environment().from_string('{{ foo }}').render(foo='{{ bar }}', bar='baz')

You end up with {{ bar }}.

As a result, any “slow” paths in the dependent data structure were templated early, and potentially repeatedly. With DT, the lazier evaluation is delayed until needed, and only for the paths that are needed, instead of templating the full dependent structure.
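The “template again until the result stabilizes” behavior described above can be sketched in a few lines of plain Python on top of jinja2 (a toy, not how ansible-core actually implements it):

```python
# Toy sketch of recursive templating on top of jinja2 (assumes jinja2 is
# installed): keep re-rendering until the output stops changing.
from jinja2 import Environment

def recursive_render(template_str, variables, max_passes=10):
    env = Environment()
    result = template_str
    for _ in range(max_passes):
        rendered = env.from_string(result).render(**variables)
        if rendered == result:  # nothing left to template
            break
        result = rendered
    return result

print(recursive_render('{{ foo }}', {'foo': '{{ bar }}', 'bar': 'baz'}))  # → baz
```

This eager, repeated re-rendering is exactly what caused slow paths to be evaluated early and potentially many times; lazy evaluation instead defers each value until something actually reads it.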


Today I ran the playbook we use to configure our environment (dozens of tasks using modules from several collections, although mainly community.vmware) with fallible. Only one problem there, with a when:.

Looks more like a broken conditional on our side (expected because of your changes) than a bug.

I’ll have to investigate this further. Anyway, until I open a bug report count this as a “works for me” :slightly_smiling_face:


I’ve somehow lost my ability to update this post to reflect the current suggested testing steps against the 2.19.0 beta instead of fallible, and the shortcut to the pre-release issue template; I’m working on getting that sorted. Meantime: replace fallible[compat] with ansible==2.19.0b1 in the instructions above.

Hello, maybe it’s the wrong place for this, but what is data tagging? Can you please explain it to me? I searched for it but found nothing. Thanks

@Jonny You can find info about data tagging in the PR description at: Templating overhaul, implement Data Tagging by nitzmahone · Pull Request #84621 · ansible/ansible · GitHub

The PR preview build for the 2.19 porting guide also explains a lot: Ansible-core 2.19 Porting Guide — Ansible Core Documentation

Thank you! I have also found both links… Unfortunately I don’t understand it :melting_face:

Hello, I’m a bit like Jonny. Personally, as a simple playbook author but not a module developer, what is the impact for me? Besides, I see the ‘deprecated’ tag example, but we have seen these kinds of warnings for a long time (about deprecated module parameters, for example) when running playbooks, so it’s not clear yet what it changes for us :slight_smile:

Data tagging is mostly a revamp of the templating system.

For playbook authors the main features would be:

  • Better error information: details should be much clearer and more precise.
  • Stronger data types: templating now does a much better job of preserving data types, with fewer numbers converted to strings.
  • Stricter conditionals: some things previously passed as true/false when they should have been templating errors.
  • More secure templating: Ansible switched from ‘trusted by default’ to ‘untrusted by default’, which also allows us to remove a lot of the processing/filtering/tagging we did to prevent vulnerabilities.
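To make the “stronger data types” point concrete, here is a hypothetical playbook sketch (variable names invented) that inspects a templated value’s type with the ansible.builtin.type_debug filter; under the behavior described above, the integer should survive templating rather than coming out as a string:

```yaml
# Hypothetical sketch illustrating type preservation (not from the docs)
- hosts: localhost
  gather_facts: false
  vars:
    port: 8080
    copied_port: "{{ port }}"
  tasks:
    - ansible.builtin.debug:
        msg: "copied_port is of type {{ copied_port | ansible.builtin.type_debug }}"
```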

Developers get a lot of the above too, but DT also signals the start of a new way of writing the internal APIs: clearly marking what is internal (_internal prefixed/subdir), and documenting and making explicit which APIs are available to which plugins.
