role argument validation for arbitrary keys

So I was eager to try out the new role argument validation which is in the pipeline for 2.11 (https://docs.ansible.com/ansible/devel/user_guide/playbooks_reuse_roles.html#role-argument-validation), but immediately ran into an issue, and thought it would be good to discuss.

The issue is the ability to handle arbitrary keys within dictionaries. For example if you have a variable such as:
users:
john:
id: 1000
shell: /bin/bash
marcy:
id: 1001
shell: /bin/zsh

There is currently no way to document this as the keys are arbitrary (user names), and the spec requires all keys to be explicitly declared.
Now I did originally open this as an issue on GitHub (https://github.com/ansible/ansible/issues/74001), however it was closed without much discussion other than “not supported, too difficult, not planned”. But I’m not quite following that. I may not have a complete understanding of the code, but it looks like it boils down to the _get_unsupported_parameters() function, which looks like it’d be rather easy to support a pre-defined/constant spec key (e.g. “*”) to act as a catch-all for unknown keys. Now this wouldn’t be able to declare the key type (e.g. string/int/bool/…), but for that, maybe some per-type constants would address that (e.g. “*string”/“*int”).

So can someone explain why this, or something like it can’t be done? Or why it’s too difficult?

Thanks

-Patrick

Patrick,

Thanks for reaching out on the mailing list. We came across that issue during our biweekly triage meeting where we don’t have time to respond in detail since we are just trying to evaluate all the issues in the queue.

The first problem with allowing arbitrary keys in the argument spec is it goes against the intention of how argument spec validation in Ansible is designed to work. Namely, for each key specified, run the value through check_type_[something] to validate and coerce the data to the correct type.

Argument spec validation in Ansible is not meant to be a generic schema validation mechanism, but rather a somewhat strict input validation system that provides a stable set of defined variables a module can depend on as input. Argument spec validation takes care of defining all parameters in the spec to at least a value of None or provides a default value if specified.

As you pointed out, allowing arbitrary keys means we also need to have a system for handling those. If we allow arbitrary keys and assume the default type (raw) they would pass validation, but may contain completely incorrect data (check_type_raw just returns the passed in value).

To do this properly, we would have add features to the argument spec to allow for setting the default intended type, or some other behavior, so the appropriate checker can be selected for keys that are not explicitly declared. Adding toggles that change behavior is a pattern I try to avoid (“‘type’: ‘dict’ sometimes allow arbitrary keys, but sometimes doesn’t, depending on if have the ‘allow_undefined_keys=True’ parameter set and don’t forget to also set 'default_type_for_undefined_params=‘str’” as a hypothetical example).

This would also allow bogus data in, such as key names with typos. This could confuse and frustrate the user when the modules/role passes validation but they later see errors for undefined variables or default values being used when they thought they had specified a value.

I’m sure there are probably some issues with setting defaults and possibly aliases as well. Most of the challenges would be in _validate_argument_values(), _validate_argument_types(), or _validate_sub_spec().

While it is possible to change argument spec validation to allow arbitrary keys for dictionaries (anything is possible in software), it goes against the design intention of the feature in Ansible and would most likely have a lot of odd ripple effects both in the implementation and in bugs it would introduce.

A better way to get close to accomplishing what you want would be to use ‘type’: ‘list’ with ‘elements’: ‘dict’, as mentioned in the issue. You would have to change the input data from a dict to a list, but it would allow validation of the structure of each element in the list and avoid the problem of arbitrary keys in a dictionary.

{
‘users’: {
‘type’: ‘list’,
‘elements’: ‘dict’,
‘options’: {
‘name’: {‘type’: ‘str’, ‘required’: True},
‘id’: {‘type’: ‘int’},
‘home’: {‘type’: ‘path’, ‘required’: True},
}
},
}

Thanks for the response.

Adding toggles that change behavior is a pattern I try to avoid

That’s why I was more leaning towards a form of catch-all key.

This would also allow bogus data in, such as key names with typos.

If the role’s interface uses a fixed list of keys, and they define the role as accepting any key, then yes, it’s going to error. But that’s not because the validation is bad. It’s because the user misconfigured it.
The purpose behind the idea is when the keys themselves are values. For use when the role is going to iterate over it, not look up specific keys.

A better way to get close to accomplishing what you want would be to use ‘type’: ‘list’ with ‘elements’: ‘dict’, as mentioned in the issue.

While not terrible, in my view this misrepresents the data. When you’re using a dictionary, you’re saying: “Here’s a collection of things, each with a unique identifier / index.” When you’re using a list, you’re saying “Here’s a list of things, where order likely matters, and they all will be processed without unique constraints.”

That’s why I was more leaning towards a form of catch-all key.

I think the main issue with this, as the code stands today, is that many of the functions that perform the validation compare provided keys to those keys in the spec to decide whether or not further work is needed (setting defaults, handling aliases, etc.). All that logic would have to be reworked in order to allow arbitrary keys (again, this is possible but the level of effort required does not seem to justify the change).

While not terrible, in my view this misrepresents the data.

I agree. It’s the only disadvantage to this approach, especially when you need a dictionary and all the behaviors inherent to that type. Most of the time in Ansible, a list of dicts is sufficient to accomplish the same goal. But I do understand the drawbacks.