Data tagging playground

In case you heard about data tagging in the past and are wondering when we’ll finally have it, there is some news: there is now a public PR in ansible/ansible with the current WIP implementation: [WIP] Templating overhaul, implement Data Tagging by nitzmahone · Pull Request #84621 · ansible/ansible · GitHub

This is highly experimental and far from done (as far as I understand), and will take some time to complete. It’s also a massive change, and there are still quite a few known issues (see for example the changelog fragments included in the PR). For that reason, please refrain from commenting in that PR unless absolutely necessary - it’s easier to ask somewhere else first whether the comment is appropriate (like in this thread, generally in the forum, or on Matrix or IRC).

Anyway, I want to start this thread with a little test module that uses data tagging to mark a return value as deprecated:

#!/usr/bin/python
from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.datatag import AnsibleTagHelper
from ansible.module_utils.datatag.tags import Deprecated

def main():
    m = AnsibleModule(argument_spec=dict())
    m.exit_json(
        a='normal result',
        b=AnsibleTagHelper.tag('deprecated result', Deprecated(msg="Yo, this is deprecated!", removal_version="2.3.4")),
    )

if __name__ == '__main__':
    main()

When registering the result and using the deprecated return value, you get:

TASK [Output result B] ***********************************************************************************************************************************************************************
[WARNING]: Deprecation warnings can be disabled by setting `deprecation_warnings=False` in ansible.cfg.
[DEPRECATION WARNING]: Yo, this is deprecated! This feature will be removed in version 2.3.4.
Origin: /path/to/playbook.yml:13:14

11     - name: Output result B
12       debug:
13         msg: "{{ result.b }}"
                ^ column 14

ok: [localhost] => {
    "msg": "deprecated result"
}

If you don’t use the deprecated return value, no deprecation message will be shown. (See my gist for a playbook and full result.)

Please note that right now the Deprecated tag does not allow specifying the collection’s name (as opposed to the module.deprecated() call). I’m sure this will get added (it should be pretty simple to add actually, the hard work is all the other functionality and machinery tagging needs).

It should also be pretty simple to add a module_utils file to a collection that wraps the data tagging imports in try/except, providing convenience functionality to tag data if the user runs ansible-core 2.19+ and to skip tagging on older versions. That will allow collections to use this feature once it’s released, without having to wait until all ansible-core versions supported by the collection have it (which tends to take a few years longer).
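A minimal sketch of such a compatibility helper could look like the following. It assumes the `deprecate_value` import from `ansible.module_utils.datatag` that is mentioned later in this thread; the import path may still change before release. The names `HAS_DATATAGGING` and `tag_deprecated` are made up for illustration.

```python
# Hypothetical module_utils helper for a collection (all names here are illustrative).
try:
    # Assumption: only importable on ansible-core versions that ship data tagging.
    from ansible.module_utils.datatag import deprecate_value as _deprecate_value
    HAS_DATATAGGING = True
except ImportError:
    _deprecate_value = None
    HAS_DATATAGGING = False


def tag_deprecated(value, msg, removal_version=None):
    """Tag value as deprecated if data tagging is available, else return it unchanged."""
    if not HAS_DATATAGGING:
        return value
    return _deprecate_value(value, msg, removal_version=removal_version)
```

A module would then call something like `module.exit_json(b=tag_deprecated('deprecated result', 'Use a instead', removal_version='2.0.0'))` and behave the same on old and new ansible-core versions, just without the warning on old ones.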

Please use this thread to discuss this, share ideas, tests, etc.!


One observation: module parameters never seem to be tagged. (At least the Deprecated tag isn’t sent in.)

You can test whether a variable x is tagged with Deprecated with Deprecated in AnsibleTagHelper.tag_types(x).

If you’re interested in the details of the Deprecated tag, you can do something like [(tag.msg, tag.removal_version, tag.removal_date) for tag in AnsibleTagHelper.tags(x) if isinstance(tag, Deprecated)].

In plugins you have access to more tags; see ansible.utils.datatag.tags for details. Currently there are AnsibleSourcePosition, VaultedValue, TrustedAsTemplate, NotATemplate, and _EncryptedSource (that one is for internal use only).

I tried to submit a PR for ansible-core with the deps feature we use in community.general and I was told that it should wait for data tagging. Any idea how dependency management is going to be impacted by data tagging? I am not going to munch through that PR with 800+ files just to find out.

In fact, applying agile mindset here, a PR that big is quite a risk. Why is a change that big being pushed forward? Wouldn’t it be easier to manage a number of smaller changes rather than one big one? Just wondering.

It likely isn’t, but if you look at how many parts of ansible-core are touched by the data tagging PR, I think they want to merge only absolutely necessary things until data tagging is merged. Apparently they already sometimes have to spend days rebasing to resolve conflicts when something is merged to devel.

In fact, applying agile mindset here, a PR that big is quite a risk. Why is a change that big being pushed forward? Wouldn’t it be easier to manage a number of smaller changes rather than one big one? Just wondering.

No idea, I’ve wondered about that as well… A guess of mine is that development of this feature took so much exploration that at some point it was easier to create a gigantic PR instead of trying to split it up. But :person_shrugging:

(Data tagging has been announced for several years now, and always got delayed. I think I heard first mentions of the idea for it 5-6 years ago, back when everything was still in ansible/ansible. Context back then was deprecation of a return value of a module.)

Well, that is frustrating to some degree - it tells me that Ansible development is largely frozen until this mammoth hatches out of its egg. :person_shrugging:

ansible-core :slight_smile: Ansible is more than ansible-core… But yeah, that definitely seems to be the case.

If you’re curious what communication from the module to the controller looks like: it’s still JSON, with special elements used to carry the tags:

{
  "a": "normal result",
  "b": {
    "value": "deprecated result",
    "tags": [
      {
        "msg": "Yo, this is deprecated!",
        "removal_version": "2.3.4",
        "__ansible_type": "Deprecated"
      }
    ],
    "__ansible_type": "_AnsibleTaggedStr"
  },
  "invocation": {
    "module_args": {}
  }
}

So basically if you get a dict with a __ansible_type key, then this dict encodes a value that can have tags, and you have to decode it according to what __ansible_type’s value is (in many cases, it’s just taking what’s in value, like in this example). The protocol also allows transporting some Python objects like dates, datetimes, or times via JSON.

This information also allows non-Python modules to return tags, or receive tags (if that will ever happen - I hope there will be some no_log tag that’s passed to modules, but :person_shrugging:).
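To illustrate the decoding described above, here is a rough sketch that unwraps such a payload with plain Python. It only handles the shape shown in the example (a dict with `__ansible_type`, `value` and optionally `tags`); other `__ansible_type` values, such as the date and time encodings, are not covered. The function name `decode` is made up for illustration and is not part of ansible-core.

```python
import json

TYPE_KEY = "__ansible_type"


def decode(node, path=(), found=None):
    """Return (plain_value, tags); tags maps a path tuple to its list of raw tag dicts."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        if TYPE_KEY in node and "value" in node:
            # Tagged wrapper: remember the tags, continue with the wrapped value.
            tags = node.get("tags", [])
            if tags:
                found[path] = tags
            return decode(node["value"], path, found)[0], found
        return {k: decode(v, path + (k,), found)[0] for k, v in node.items()}, found
    if isinstance(node, list):
        return [decode(v, path + (i,), found)[0] for i, v in enumerate(node)], found
    return node, found


payload = json.loads("""
{
  "a": "normal result",
  "b": {
    "value": "deprecated result",
    "tags": [{"msg": "Yo, this is deprecated!", "removal_version": "2.3.4",
              "__ansible_type": "Deprecated"}],
    "__ansible_type": "_AnsibleTaggedStr"
  },
  "invocation": {"module_args": {}}
}
""")
plain, tags = decode(payload)
print(plain["b"])                # deprecated result
print(tags[("b",)][0]["msg"])    # Yo, this is deprecated!
```

A non-Python module that wants to return a tagged value would produce the same wrapper dict in its JSON output.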


See Data Tagging preview and testing.

This does indeed look interesting. Looking through the PR I didn’t find any other (useful) tags that could be used. I was hoping for a tag to mark a value as sensitive to prevent it from ever showing up in logs – is something like this planned?

I would hope so, but so far: no idea… Right now the public API doesn’t even allow querying whether a tag is set; you can only add tags. So far there’s only “trusted” (on controller) and “deprecated” (controller and modules).

My guess is (still) that the core team first wants to get the feature out before starting to add more tags. But I don’t know if anything is planned so far… Maybe @nitzmahone has some insights to share? :slight_smile:

Yes, a SensitiveData tag was part of the original PoC that I demoed at a contrib summit a few years ago, but with the huge swath of Ansible’s surface area that the feature touches (and the raft of other changes it exposed a need for), we had to drop that one from the initial release.

I really hope we’ll have time to get back to it, because IMO it’s the most compelling use case for data tagging. It’s not hard to make it work in the happy path, but (as with most things in Ansible due to its “organically grown” nature) there are lots of weird corner cases where it can’t work securely/correctly/quickly without more significant rework of the guts.

Our concern with keeping it in the initial release was that, given the inherent security implications of such a feature, it’ll be an endless source of data disclosure CVEs if we can’t work most of those problems out before it ships. Since data tagging’s template trust model inversion just got rid of one of those, you can probably understand our reticence to write a new one.

This project has also been delayed way too many times already - we really want folks to start reaping some of the other benefits! Plus, if you haven’t had the pleasure of regularly rebasing a > 850 file divergent branch of ansible-core for over two years, well, you just don’t know what a good time looks like :wink: .


This project has also been delayed way too many times already - we really want folks to start reaping some of the other benefits!

Makes sense, I was mainly curious. I think updating the existing preview forum post to include a bit more information about potential planned tags might help to gather more interest as well.

Plus, if you haven’t had the pleasure of regularly rebasing a > 850 file divergent branch of ansible-core for over two years, well, you just don’t know what a good time looks like :wink: .

Well, obviously not for ansible-core, but I had my fair share of running large side-branches for multiple years in other projects. Nothing that I plan to do again anytime soon :smiley:

The interface changed BTW compared to the example in my original post. On the controller side (plugins), you can use

from ansible.template import trust_as_template, is_trusted_as_template

to mark strings as trusted (my_string_variable = trust_as_template(my_string_variable)) or test whether a string variable is trusted (if is_trusted_as_template(my_string_variable): ...).
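For plugins that need to work both with and without data tagging, a guarded import along these lines might be enough. This is only a sketch: the fallback treats every string as trusted, on the assumption that older ansible-core versions have no trust tag to set or check, and the helper name `make_template` is made up for illustration.

```python
try:
    # Assumption: these names only exist on ansible-core versions with data tagging.
    from ansible.template import trust_as_template, is_trusted_as_template
except ImportError:
    def trust_as_template(value):
        # Fallback: nothing to tag on older versions, return the string unchanged.
        return value

    def is_trusted_as_template(value):
        # Fallback: there is no trust tag to check, so report the string as trusted.
        return True


def make_template(expression):
    """Hypothetical helper: build a Jinja2 expression string and mark it as trusted."""
    return trust_as_template("{{ %s }}" % expression)
```

A lookup or action plugin could then hand `make_template('inventory_hostname')` to the templar without caring which ansible-core version it runs on.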

In module_utils there’s

from ansible.module_utils.datatag import deprecate_value, native_type_name

where deprecate_value allows deprecating return values (module.exit_json(value=deprecate_value(value, "This value is deprecated", removal_version="2.0.0"))) and native_type_name simplifies type names (type(my_variable) can have funky names like _AnsibleTaggedStr, whereas native_type_name(my_variable) or native_type_name(type(my_variable)) will give back things like str).
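If a collection wants to use native_type_name in error messages while still supporting older ansible-core versions, a fallback like the following could work. This is a sketch: the fallback simply returns the type's `__name__`, which is enough on versions that have no tagged types, and `describe` is a made-up helper name.

```python
try:
    # Assumption: only importable on ansible-core versions with data tagging.
    from ansible.module_utils.datatag import native_type_name
except ImportError:
    def native_type_name(type_or_value):
        # Fallback: without data tagging there are no tagged subclasses to unwrap.
        if isinstance(type_or_value, type):
            return type_or_value.__name__
        return type(type_or_value).__name__


def describe(value):
    """Hypothetical helper for error messages that mention a value's type."""
    return "expected a dict, got %s" % native_type_name(value)
```

For example, `describe("foo")` gives a message that mentions `str`.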