Core-2.19 templating changes - preview and testing

:waving_hand:

I will echo bcoca and say that I doubt flowerysong meant to come off as rude but I see how it could have been interpreted that way.

I cannot speak for every change or feature but if nothing else I recently looked into the performance improvement claims: Core-2.19 templating changes - preview and testing - #33 by rfc2549

The improvement will depend on how playbooks are written and the scale at which they run but there seem to be very tangible performance gains.

About what this change brings for end-users: it makes templating more logical and removes random strange things that happen in some cases (but not in others), which are both confusing and require strange workarounds. Unfortunately this also breaks some things folks have been doing for a long time, and breaks some (ab-)uses of this implicit “magic”.

For example, sometimes Ansible will parse string variables that contain content that looks like a Python data structure, like some JSON. This can be really annoying and is hard to work around, but there are also playbooks which “just worked” because they implicitly used these conversions. Example: Improve tests by felixfontein · Pull Request #227 · ansible-collections/community.sops · GitHub

(I had another example involving unwanted JSON conversion, but that’s in a private repository in a company I no longer work at, so I can’t share it.)
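To make the difference concrete, here is a minimal sketch of converting such a string on purpose instead of relying on the implicit conversion. The variable names (raw_payload, payload) are made up for illustration, and I have not run this against every core version, so treat it as an assumption about how content is best written going forward rather than as the official porting recipe:

```yaml
- name: Parse a JSON string explicitly instead of relying on implicit conversion
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical input: a plain string that merely looks like a data structure
    raw_payload: '{"name": "example", "ports": [80, 443]}'
  tasks:
    - name: Convert the string to a data structure on purpose
      ansible.builtin.set_fact:
        payload: "{{ raw_payload | from_json }}"

    - name: Use the parsed data
      ansible.builtin.debug:
        msg: "First port: {{ payload.ports[0] }}"
```

Written this way, the playbook does not depend on whether the templating engine decides to parse the string for you.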

The new features also pointed out quite a few bugs in playbooks/roles (especially when not using ansible-lint). My “favorite” ones are accidental dictionaries like in Fix/improve tests by felixfontein · Pull Request #9859 · ansible-collections/community.general · GitHub.

Overall I think these changes are really great for end-users, content developers, and plugin/module developers alike, but unfortunately they break some existing content. In return they will ensure that new content has fewer hacks, workarounds, and bugs, and that new users in particular have a much smoother experience when templating. (The same goes for old-time users writing new stuff, if you never liked patterns that only worked because of some of these quirks.)


(I personally would have called it ansible-core 3.0.0 as well, but I have no influence on that decision, and by now I guess that ship has sailed…)


Well, I’ve been doing quite a bit of Ansible over the years, and I kind of haven’t seen any really unexpected behaviors with templating per se.

We witnessed performance regressions where Ansible was resolving the whole chain on each task, which led to very slow execution, and that indeed was unobvious and hard to debug. But I’m not sure if there’s consensus on whether it has improved or not after all…

And while I could compare the execution time of CI, we have not started the work of adopting 2.19, as too many dependencies are broken at this point, and fixing all our roles and some modules would take a while as well.

The only extremely annoying example I know of, and I am not actually sure if it was fixed or not, is when you have to pass a real integer to a module via a variable. The example below fails, as the module receives a string due to the need to have quotes around the variable:

- name: Get info about VLAN
  vars:
    vlan: 100
  maas.maas.vlan_info:
    cluster_instance: "eth0"
    fabric_name: "fabric-0"
    vid: "{{ vlan | int }}"
  register: _present_vlan

But if I don’t use a variable and just put in a literal int, it works:

- name: Get info about VLAN
  maas.maas.vlan_info:
    cluster_instance: "eth0"
    fabric_name: "fabric-0"
    vid: 100
  register: _present_vlan

But I am not really sure if this was the thing which was fixed in 2.19 either…
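One way to check this on a given version, independent of the maas.maas collection, is to look at the type the templated value ends up with. This is only a sketch: the fact name templated_vid is made up, and my understanding (not verified on every release) is that the result depends on whether native Jinja types are in effect, which on older releases is controlled by the jinja2_native setting:

```yaml
- name: Check which type a templated integer ends up with
  hosts: localhost
  gather_facts: false
  vars:
    vlan: 100
  tasks:
    - name: Store the templated value as a fact
      ansible.builtin.set_fact:
        # Hypothetical fact name, same expression as in the failing task
        templated_vid: "{{ vlan | int }}"

    - name: Show the resulting type
      ansible.builtin.debug:
        msg: "templated_vid is of type {{ templated_vid | type_debug }}"
```

If this reports a string type, the module receives a string as well; if it reports an integer type, the quoting around the expression is no longer the problem.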

And about 3.0: I totally think it must be 3.0, or at the bare minimum 2.18 should be made some kind of LTS like 2.9 was.

Though 2.10 had waaaaay fewer breaking changes (almost none?) compared to 2.19, which is shattering a whole ecosystem that had been working with very minimal (almost cosmetic) changes for almost a decade (2.0 was released in 2016).
And doing a major release once in a decade is also quite acceptable…


It is more likely that we move to calver, as most other projects in the ecosystem are doing, than release a 3.0, which could also get confused with the Ansible community package … people still ask me why there are two 2.10s and what ‘ansible-base’ is (my fault, I put it in as a placeholder and people ran with it …).


Um, can you please name some examples of projects in the ecosystem using calver?

Calver can be a viable option for Ansible Platform versioning, but not for ansible-core, imo. Unless the end goal is to make ansible-core unusable for a regular audience without the platform subscription.

Here are some, not going to do all:

I’m unclear how changing the versioning scheme makes subscriptions required. Afaik, people have been using Ubuntu on calver for years w/o paying them 1c. Also, the above have used calver w/o issues for some time now.


It seems to be popular among several projects on Ansible Collaborative - Ansible ecosystem, like ansible-lint, ansible-dev-environment, ansible-dev-tools, ansible-creator, molecule, ansible-navigator, tox-ansible, vscode-ansible.

Why is that?


Ah, ok, I guess I was thinking of calver more in the OpenStack way (i.e. YYYY.mm). And that somehow feels like much more cryptic versioning, as all projects inside OpenStack still use plain semver when being published to PyPI.

That is why I based my comment on the assumption that calver usually leaves no real room for backports, and treated the initiative as constantly rolling releases (like CentOS Stream, rather than Ubuntu with point releases on top of calver).

So if there would be no backport capabilities and no “stable” branches - it would make core kinda unusable.

But now looking at examples and how they are organized - I agree that this does not really matter, as core approach would be kept intact.

So thanks for pointing me to examples and I agree that my previous comment was wrong in light of this.


@flowerysong
I ain’t reading all that

My post is long, I know: the original was much longer and was trimmed down.
It is up to you to restrict yourself to the leaf and miss the tree and the forest behind it.



@bcoca

Thanks for your answer.
You’re right about the clarification that the removal concerned using tests as filters, limited only to tests, in v2.5: I had to trim the original message and went overboard on this topic.

For the shell/command warn option, to be on the same page here: I don’t disagree about why it was removed. Most of the time the option had to be present in the tasks, so it made little sense in the end.
Where I disagree is how the removal was applied: by raising a blocking/critical error when the “warn” option was still present in the code of the tasks and roles, and halting the execution immediately.
It should instead have been ignored, either silently or with a warning message, and execution should have moved on.

About ansible_managed, I’ve read multiple stances on it, which blurs what the real take is here.
There is also nothing about it in the v2.19 devel porting guide.
If you could add a note about it in the guide, it would be helpful.

As for the new trust model, it is not that I don’t care: I do, but in due time.
The main problem is the reality of upgrading the ansible-core version. Having multiple controllers means crawling from the controller managing the dev and staging environments up to the controller for production. Multiply this by datacenters (or by country, if not both).
That is where the compatibility between the old and the new Ansible versions will be the most decisive point.

The lower the compatibility, the longer the delay between each controller upgrade, with production being the last one.
Meanwhile, the playbooks, roles, and other stuff might not be able to stay frozen for very long, which becomes a problem when the versions are not compatible.

This is not to disregard the importance of Ansible’s internal security, but it is a tool which literally has the power of life and death over the servers, appliances, environments, and anything else it connects to.
You can even revert the security configuration of most targets in a single pass so that a basic user with full rights can connect with a blank or insecure password.
At this point, the subject of trust lies outside Ansible (aside, of course, from the possibility of intercepting Ansible while it runs on the target nodes).
It falls on how to secure the access to Ansible and its resources (the controllers, inventories, roles, playbooks, git server, …). It is more a question of human processes than anything.

I stumbled on the ansible-core v2.19 changes a little by accident. I usually skim over a release once it is out, if not only once the .1 fixing the last bugs is out. Even that is uncommon compared to colleagues, who start looking only when it’s time to upgrade.
So months or a year later.
RHEL7 is not supported anymore but still around, and RHEL8 is still in maintenance support up to 2029. For managing both, ansible-core v2.16 is the last version with python 3.6 support and also python 2.7 as a bonus for target nodes (the reference is the out-of-the-box version without installing anything else, so python 3.6 for RHEL8).
It is highly possible the bulk of the reports about 2.19 changes will only start to appear later, when you’ll be working on ansible-core v2.24 or around.
Much too late for anything.

That said, I will spend time this weekend testing this thoroughly.

@noonedeadpunk

Regarding templating performance, if you are doing some heavy parsing of hostvars[], there’s a long-standing trick, in case you don’t already know about it.
Copy hostvars to a variable and parse only that variable.
This helps because hostvars contains the original definitions from the inventory, not the final values, so they are recalculated on each access, even in the template module.

Something like:

{% for loopHost in my_role_inventory_list_of_hosts |default([]) |flatten |sort %}
{%   set hostvarsCache = hostvars[loopHost] %}

[{{ loopHost }}]
param1 = {{ hostvarsCache.property1 }}
param2 = {{ hostvarsCache.property2 }}
(...)
paramN = {{ hostvarsCache.propertyN }}

{% endfor %}

It doesn’t change much for a few calls with a single node, but for a single template centralizing the information of multiple nodes, the gain is significant.
Some templates went down to under 20 sec while they were taking minutes before.
The config files for a Prometheus server with all known hosts and targets are a good candidate for this.
Note that the example restricts the cache to the host of the current loop iteration, but you can cache hostvars entirely, with the tradeoff of greatly increasing memory consumption on the Ansible controller.

Using this, I didn’t see any real change in the execution time for the most intensive templates when upgrading from Ansible v2.9 to core v2.16.

This trick works as long as you don’t add any variable definition with higher precedence than the existing ones, which can change at each play/block/role/task, which is why we recalculate every time.

As for ‘making things an error’, I point you at our deprecation policy of 4 versions … which might not be enough for you if you are still running 2.9 but it should be if running 2.16, which we believe is reasonable. If we get enough data to say otherwise we would revisit the deprecation cycle, as we have done in the past (from 2 versions to 4). Also note that we have slowed down our release cadence, both because the project is more mature and to align with both Python and OS release cycles.

It looks like it was possible to use {{ item }} in the name of a task with a loop in 2.18. But in 2.19, this seems to result in a warning now (fix vmware_folder_template_from_vm tests).

Does anyone know if this is expected, maybe because this has already been wrong but went undetected in previous releases, or if it’s a bug?

I don’t think using {{ item }} in a task’s title ever made much sense, since the title is printed once for all loop iterations. Maybe in some older versions the title was printed for the first loop item, and so it was possible, but that doesn’t sound like something you should rely on - and it’s also nothing that sounds very helpful (using parts of one iteration in the title), unless you abuse loops for fetching a single piece of data.


It’s always been wrong, it would just silently not work so it was harder to spot it in the output.

TASK [Test {{ item }}] *********************************************************
ok: [localhost] => (item=a) => 
    msg: Hello world!
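If the goal is to see something item-specific in the output, a sketch of the usual alternative is to keep the task name static and set a per-item label through loop_control. The task and label text here are just illustrative:

```yaml
- name: Print a greeting for every entry
  ansible.builtin.debug:
    msg: "Hello {{ item }}!"
  loop:
    - a
    - b
  loop_control:
    # The label replaces the default (item=...) text shown for each iteration
    label: "entry {{ item }}"
```

The name is then printed once without any templating, and each iteration line carries its own label.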

Coming back after running multiple tests with core 2.19b and opening some issues
(Note: there is a limit of 2 links per post, so I had to remove the other direct links to the issues)

First things first

1)

Also, I doubt they meant to be rude, it is probably the comment born from already being tired and looking at work they don’t want nor need to do.

I will echo bcoca and say that I doubt flowerysong meant to come off as rude but I see how it could have been interpreted that way.

In my experience, it is more than being rude.
This usually happens because they saw something that ticked them off enough to warrant an answer, after they had initially decided you were not worth their time.
Using “I’m not reading all this” as an excuse just to answer this small detail.
Days later and not being tired will not change this, the initial post will never be read.


2)

@bcoca
As for ‘making things an error’, I point you at our deprecation policy of 4 versions … which might not be enough for you if you are still running 2.9 but it should be if running 2.16, which we believe is reasonable. If we get enough data to say otherwise we would revisit the deprecation cycle, as we have done in the past (from 2 versions to 4). Also note that we have slowed down our release cadence, both because the project is more mature and to align with both Python and OS release cycles.

Again, it is not about deprecating the warn parameter. It had to go, on this we agree.
Nor is it about the deprecation policy.

It is about the fact that the 2 main choices to handle the removal of the warn parameter were:
a) raising an immediate error when the parameter is found in a command or shell task.
b) ignoring the parameter, and printing a warning message in the execution (or printing nothing).

The Ansible team, if not you, chose a), without a care about the fallout. Granted, the fallout was not excessive, but for the first time it hard-broke compatibility between Ansible versions, namely 2.14 vs 2.13 and lower, with no alternative common to all versions.
The problem is that such a door was opened, which it shouldn’t have been. Because if it is done once, it will be done again. That is precisely what is happening with the 2.19 data tagging, at a much larger scope.

I would like to point out that, for example, the venerable grep command has the -y parameter, which is obsolete but kept for compatibility, as stated in its man page. cURL also has some behavior related to certificates that is kept for compatibility.


3)

As of now with core 2.19.0b6, given my habit of not trusting the usual automatic type conversion of any language, a large part of the Ansible magic altered by 2.19 has a limited impact on the roles I’ve built over the years.
As said, my own results must not be taken as a reference.
Case in point, some colleagues have much, much more trouble, sometimes without understanding why, and are far from being done.

I also didn’t see much of any speed improvement.
I’m waiting for the next beta containing the fix for the profile_tasks regression (#85331) before comparing the numbers more precisely.

But I also have some roles unable to complete at all due to errors with 2.19. Even while using the _ANSIBLE_TEMPLAR_SANDBOX_MODE environment variable.
Another problem is that the most advanced capabilities of my roles will not survive with ansible 2.19 without this templar sandbox mode parameter. Mostly the generic “serial by group” and the special inventory parsing making it much more natural.
And that’s a serious problem, because as of now, this is a temporary workaround not here to last (for now).


4) Extremely serious question about core 2.19 changes :
When preparing the changes for core 2.19, did the Ansible team at any point ask themselves whether a rollback might be required? Or at the minimum, whether they might have to hit the brakes hard?

Given the precedent with the warn parameter, the evasion around it, and some borderline, if not unprofessional, answers I got on the issues I opened for core 2.19b, I personally don’t think so.
In fact, I’ve lost all trust and expect another ansible-core release to hard-break another major part very soon.

As I’m not one to use those words lightly :

  • issue #85336 ipv4 (and others from ansible.netcommon) not usable with the short name
    I understand the fact it might not be ansible-core, but hey, not everybody is privy to the subtleties of the repo separations, especially for a collection in the ansible.* namespace.
    Second, “not my backyard, go search somewhere in the building” is not an answer. This is not a community collection, but an ansible.* namespace collection, a major part of Ansible itself. As stated on its readme, supported by Red Hat.
    I would have gladly taken being pointed to the correct repo and asked to reopen the issue there.

  • issue #85333 --one-line parameter deprecated ?
    Unannounced change, and I’m still baffled by the answer. Especially for a module, if I’m not mistaken, less than 100 lines comments included, existing since (nearly) the start of Ansible.

  • And the madness of ansible_managed
    You really expect people editing all their inventories because you refuse to be bothered to keep it in a restricted form in ansible.cfg, if not making it hard-coded internally ?

This is not to undermine the work you’ve done; Ansible has become a major player that soundly ticked all the right boxes from the start.
But seriously, the situation now with core 2.19 really shows the disconnect and the contempt of a dev team in too deep, adamant to punch through the target at hand because of deadlines, without a care for anybody or for the consequences, while being aware of them.
Coming from the production side, that sort of situation is typical. I just would never have imagined seeing this on Ansible itself.

And in this, you are also forgetting you have real customers. Notably Red Hat customers, for anything related to Ansible Tower / AAP.
I am not sure the folks at Red Hat will accept taking flak happily.
Mostly because they will be facing people telling them to fix it, without taking no for an answer: they are paying a very hefty sum for the support.
For them, core is just one cog in the larger machine of Ansible AAP.
Being told they must rework a non-negligible part of the playbooks, roles and tasks they have built, which were working correctly for years before 2.19, due to a change in a cog will not be accepted.
I wonder if some will be pissed enough to go through a fork.

You might be interpreting more into this than there actually is. For longer posts folks often tend to only glance over them instead of reading them in detail. If someone wants to do that, why shouldn’t they do it? Not everyone has enough spare time now or later, or enough energy to read long posts. And if you glance over a post and spot something you want to comment on while glancing over, it is generally totally fine to do that. (I’ve also done that in the past.) Interpreting that as bad intentions is usually a bad idea, since it’s often not what happened.

The first answer by a member of the core team to your issue included a link to the PR which started the transition, which is part of the repository for ansible.netcommon. That is the repository you have been looking for.

A deprecation is an announcement. So this is not an unannounced change, but itself an announcement.

Since AAP seems to focus on Execution Environments, I guess the solution for AWX/AAP users is to use EEs with older ansible-core versions for content that doesn’t work with 2.19+.
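As a rough sketch of what that could look like, an ansible-builder (schema version 3) definition can pin ansible-core below 2.19. The base image name is a placeholder I made up, and the exact version range is an assumption; the image you pick needs to ship a Python version supported by the pinned ansible-core:

```yaml
# execution-environment.yml
version: 3
images:
  base_image:
    # Hypothetical image name, replace with a base image you actually use
    name: registry.example.com/base/python-base:latest
dependencies:
  ansible_core:
    # Stay on the last release before the 2.19 templating changes
    package_pip: "ansible-core>=2.18,<2.19"
  ansible_runner:
    package_pip: ansible-runner
```

Content that has not been ported yet can then run in this EE, while already ported content uses an EE with 2.19 or later.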