Re-run only failed tasks

Hi all,

I would like to discuss whether Ansible should support re-running only the failed tasks. This is different from the approach suggested here:
http://docs.ansible.com/playbooks_loops.html#do-until-loops

It is similar to how we would run unit tests and only re-run the failed tests during development.

Here is the feature request. It was closed, but I would still like to discuss this topic further:
https://github.com/ansible/ansible/issues/8896

Thanks,
Kien

I’ve thought about this a bunch, but it’s really hard. Many of our tasks require data from previous tasks, and often those previous tasks will “succeed” in one run only to have a later dependent task fail. Figuring out which ones to re-run there is nearly impossible.

Since Ansible is designed to support idempotence, we make sure that we can re-run any of our playbooks at will. Tasks that have already completed will finish quickly as ‘unchanged’, and only the tasks that haven’t run yet will make new changes. Trying to bake something into Ansible to only re-run failed tasks is probably going to cause too many gotchas to be really useful.
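To make the dependency problem concrete, here is a toy Python simulation. It is not Ansible code: the two task names, the fake `state` dictionary standing in for a server, and the `ctx` dictionary standing in for registered variables are all hypothetical. The second task consumes a value produced by the first in the same run, so replaying only the failed task has nothing to read, while a full idempotent re-run recomputes the value and reports the first task as unchanged.

```python
from typing import Callable, Dict, List, Tuple

# A hypothetical "task": takes the per-run context (akin to registered
# variables) and returns True if it changed something on the fake server.
Task = Tuple[str, Callable[[Dict[str, str]], bool]]


def make_tasks(state: Dict[str, str]) -> List[Task]:
    """Build two toy tasks operating on a fake 'server' state dict."""

    def create_user(ctx: Dict[str, str]) -> bool:
        changed = "user" not in state
        state["user"] = "deploy"            # same end state on every run
        ctx["user_home"] = "/home/deploy"   # "registered" for later tasks
        return changed

    def write_config(ctx: Dict[str, str]) -> bool:
        # Depends on data produced by the previous task in the same run.
        path = ctx["user_home"] + "/app.conf"
        changed = state.get("config") != path
        state["config"] = path
        return changed

    return [("create user", create_user), ("write config", write_config)]


def run(tasks: List[Task]) -> Dict[str, str]:
    """Run tasks in order with a fresh context; stop at the first failure."""
    ctx: Dict[str, str] = {}
    results: Dict[str, str] = {}
    for name, fn in tasks:
        try:
            results[name] = "changed" if fn(ctx) else "ok"
        except KeyError:  # missing input from an earlier task
            results[name] = "failed"
            break
    return results


# "create user" already succeeded in an earlier run; "write config" failed.
server: Dict[str, str] = {"user": "deploy"}
tasks = make_tasks(server)
# Re-running only the failed task: its input was never computed this run.
print(run([t for t in tasks if t[0] == "write config"]))
# {'write config': 'failed'}
# Full idempotent re-run: "create user" reports ok, the config is written.
print(run(tasks))
# {'create user': 'ok', 'write config': 'changed'}
```

The full re-run costs one extra (unchanged) task here, but it is the only variant that can succeed without tracking which earlier tasks each failed task depends on.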

For your own setup, you could make use of --start-at-task if you really know where you can skip ahead to.

-jlk

Hi Jesse,

Thanks for your response. I was looking at the implementation of --start-at-task, and it looks like it just reads the task name:
https://github.com/ansible/ansible/blob/7ffa9cecaef12a17a5fc5053938a6dfbf7171c23/lib/ansible/callbacks.py#L607

Would it then be possible to write the names of the failed tasks to a file and, with a flag such as --failed-tasks-only, parse that file at the same place where start_at is handled now?
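A rough sketch of this proposal in plain Python, not Ansible internals. `--start-at-task` is modelled here as skipping tasks until the first one whose name matches exactly, which is a simplification of the linked callback code; the proposed `--failed-tasks-only` mode is modelled as keeping only tasks whose names appear in a file written by the previous run. The one-name-per-line file format and all function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set


def select_start_at(task_names: List[str], start_at: str) -> List[str]:
    """Skip tasks until the first one whose name equals start_at."""
    for i, name in enumerate(task_names):
        if name == start_at:
            return task_names[i:]
    return []  # no match: nothing to run


def save_failed(path: Path, failed: Iterable[str]) -> None:
    """Record failed task names, one per line (hypothetical file format)."""
    path.write_text("".join(f"{name}\n" for name in failed))


def load_failed(path: Path) -> Set[str]:
    """Read the failed task names back; blank lines are ignored."""
    return {line for line in path.read_text().splitlines() if line.strip()}


def select_failed_only(task_names: List[str], failed: Set[str]) -> List[str]:
    """Keep only previously failed tasks, preserving playbook order.

    Task names need not be unique, so every task sharing a failed
    task's name is selected as well.
    """
    return [name for name in task_names if name in failed]
```

Both selectors filter purely by name: neither knows which skipped tasks a failed task depends on, which is the concern Jesse raises above.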

Thanks,
Kien

Rather than discussing the previous ticket (our reasons there still hold), let’s discuss the use case a bit first so we can get a better understanding.

What is the task you are running and why do you need to rerun it?

That may lead to some modelling suggestions.

Michael,

It’s mostly for development. It’s like unit testing: you write tests, expect some to fail, and then re-run only the failed tests until everything is green.

The same thing applies to Ansible tasks. I might have 100 tasks, and only 1 or 2 of them fail. Now I have to re-run all 100 tasks just to check whether I have fixed the 2 that failed. It would be awesome if I only had to run the 1 or 2 failed tasks to quickly verify the fix during development. We actually spend a lot of time developing these playbook tasks to get them right.

Well, the problem is that if you just re-run the failed parts, you won’t be validating that the previous steps can run cleanly a second time, right? That is why running them again makes sense: it will just go over the server policy and check that everything is up to date.

I understand what you are saying about targeting specific parts of the config, and I like tagged roles quite a lot for that kind of thing.

Some people like --start-at-task, which sounds like it will do what you want: start at that particular point. I don’t use it myself, though.

Hi Michael,

I think I’m OK with --start-at-task for now. If task #50 out of 100 fails, I would already cut the runtime roughly in half.

My point to Jesse was that, since --start-at-task is already implemented, supporting something like --start-failed-task-only doesn’t seem very complicated.

Thanks,

Right now the retry file doesn’t record this; it just produces a “--limit @filename.yml” type file. If it did, it might be more straightforward to make this an option, but we’d need something like a --retry-file flag.
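As a sketch of what a richer retry file could record, here is a small Python example. The JSON layout, the `hosts` and `failed_tasks` field names, and the helper functions are all hypothetical and not an existing Ansible format; JSON is used only to stay within the standard library. The idea is to keep the host list that `--limit @filename` consumes today and add, per host, the name of the task that failed, since different hosts may fail on different tasks.

```python
import json
from typing import Dict, List


def build_retry(failures: Dict[str, str]) -> str:
    """Serialize {host: failed task name} into a hypothetical retry document."""
    hosts = sorted(failures)
    doc = {
        # What the current retry file already captures: the failed hosts.
        "hosts": hosts,
        # Extra data a failed-tasks-only rerun would need.
        "failed_tasks": {host: failures[host] for host in hosts},
    }
    return json.dumps(doc, indent=2)


def hosts_for_limit(retry_text: str) -> List[str]:
    """Return just the host list, i.e. what --limit @file uses today."""
    return json.loads(retry_text)["hosts"]


def tasks_to_rerun(retry_text: str) -> List[str]:
    """Distinct failed task names, in first-seen order over sorted hosts."""
    seen: List[str] = []
    for task in json.loads(retry_text)["failed_tasks"].values():
        if task not in seen:
            seen.append(task)
    return seen
```

For example, failures of `{"web2": "write config", "web1": "write config", "db1": "create user"}` would give the host list `db1, web1, web2` and the task list `create user, write config`.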