Readable error message for all humans

Greetings,

Someone at work brought this one to me, and I thought I would put the question out there and see what others do/think about this.

We have a deployment tool which early on transformed itself into a local development environment management tool as well (it provisions a VM according to the configuration and requirements of a project, which can be modified at any time using a configuration file). It works fantastically well, but unlike system managers, developers don’t want to care about error cases. So for required configuration, we check the data wherever a default is not possible, and print out a human-readable error with some details. However, it sometimes happens that the failure is due to a bug in the playbook, or to some manual modifications a user has made on their machine, and so on.

My question would be: is there a proper pattern for printing human-readable errors aimed at a customer rather than at someone doing deployments and operations for a living? I am thinking of pushing the tool itself towards less and less technical people (for all sorts of reasons), so for me it would be nice if we had a way to say, “This error should never happen, contact operations” or “This may be caused by a network connectivity problem. Check your internet connection, and please try again” when you try to download something and it fails. I can imagine that the ability to create generic error messages would also come in handy.

Cheers!

Hi, there is no single pattern for system failure causes. Systems can fail in many ways and for many reasons. However, you can take a statistical approach: analyze the most common errors caused by user configuration or usage, and create a mapping from those errors to possible remedies or workarounds. Make sure, though, that you do not put too much trust in your guess at an error's cause, and do not hide any useful details. You may have historical indications that an error was caused by user misconfiguration when it could actually be a bug. So, I would suggest always having your tool create a detailed error report for your system engineers, regardless of the error.
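To make the mapping idea concrete, here is a minimal sketch in Python. The patterns, the hint texts and the `explain` function are all assumptions made up for illustration; they do not come from any existing tool. Whatever the guess is, the raw error is kept in the report for the engineers.

```python
import re

# Hypothetical mapping from patterns in the raw error text to user-facing hints.
# Both the patterns and the wording are illustrative assumptions.
KNOWN_CAUSES = [
    (re.compile(r"timed out|connection refused|name resolution", re.I),
     "This may be caused by a network connectivity problem. "
     "Check your internet connection, and please try again."),
    (re.compile(r"permission denied", re.I),
     "The tool could not access a file or directory. Check its permissions."),
]

# Shown when no known cause matches.
FALLBACK_HINT = "This error should never happen, contact operations."


def explain(raw_error):
    """Return (hint for the user, full report for the engineers)."""
    hint = FALLBACK_HINT
    for pattern, message in KNOWN_CAUSES:
        if pattern.search(raw_error):
            hint = message
            break
    # The raw error is always preserved, so a wrong guess hides nothing.
    report = "HINT: " + hint + "\nRAW: " + raw_error
    return hint, report
```

A caller would show the first element of the returned pair to the user and log or send the second to operations. The order of `KNOWN_CAUSES` matters, since the first matching pattern wins.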

I’m not sure how this relates to Ansible specifically.

If you can phrase this in terms of improving Ansible error messages in ways that would make better sense for non-technical users, I’m interested in the discussion.

Maybe something like:

  - name: "Some task"
    errorMessage: "This task might have failed because of bad network connectivity"
    curl: […]

Or something like that.

I have no idea what format would be nice. But I am thinking that it could be nice to list at least some of the potential causes of the error which are known at the time of writing the role or playbook.

So, if I understand you correctly, you are proposing a way to output supplementary messages as hints about what may have gone wrong when playbook/role tasks fail, and possibly what can be done to overcome the error. I think this could be useful for helping users recover from playbook/role-specific error conditions that the playbook/role writer can guess, but the module writer cannot. Do you have any thoughts on the presentation format? P.S. The camel-cased “errorMessage” is surely not a good name for this. I would prefer something like ‘error_hints’ or ‘failure_hints’ that takes a list of strings.

We will not be doing this, by the way.

Hum, what do you mean? That it is a bad format, a bad idea overall, or that it will need to come from the open-source community?

As for the format, I don’t really care. I can try to think of something better.

I don’t think it’s a very effective idea for Ansible, when there are often thousands of things that could produce a failure. We will share the failure message, but the “why” is something that humans should decipher.

Obviously we cannot have a meaningful message for all errors, but I think it would be nice to offer something in the ballpark of “you might want to check the following things on your system”. For my use case, I think that would be a start.

BTW, this might not be the most elegant solution for your case, but you could write ‘debug’ tasks with a conditional to run only on failure of preceding tasks to output the meaningful message you want.

Seems like this idea won’t fly very far. Too bad.

Let me rephrase the question then: with Ansible in its current state, is there a way to get the error as a data object, from which a parent program would be able to decide how to either handle or present the error?

What do you mean by “parent program”?

Ansible already returns JSON data from modules, and callback plugins are available.
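As a rough illustration of what a parent program could do with that JSON, here is a minimal Python sketch. It assumes the failure result carries the `failed` and `msg` keys that modules commonly return on failure. The sample string is hand-written rather than captured Ansible output, and `present` is a hypothetical helper, not part of Ansible.

```python
import json


def present(result_json):
    """Turn a JSON result string into a message for a non-technical user."""
    result = json.loads(result_json)
    if not result.get("failed"):
        return "OK"
    msg = result.get("msg", "")
    lowered = msg.lower()
    # Very rough guess at the cause; the original message is kept otherwise.
    if "timed out" in lowered or "connection" in lowered:
        return ("This may be caused by a network connectivity problem. "
                "Check your internet connection, and please try again.")
    return "This error should never happen, contact operations. Details: " + msg


# Hand-written sample, assumed to resemble a failed module result.
sample = '{"failed": true, "msg": "Request failed: connection timed out"}'
print(present(sample))
```

How the parent program gets hold of the JSON (a callback plugin, or some wrapper around the run) is left open here. The sketch only covers the decision step once the data is available.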

Ah, did not know, I’ll take a look.