Relaunch a failed job automatically in AWX

I can’t seem to find any mechanism for this. What I need is for a failed job to be relaunched automatically, the same way you can relaunch it manually in the UI.

We run AWX in Kubernetes on AWS, and sometimes jobs fail because the container is killed or for some other transient reason. If the job were simply relaunched, it would succeed.

What I am looking for is some flag or setting to relaunch the job on failure.

We’ve solved this by using a workflow template with a “run on fail” step that reruns the task we know can fail.
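For anyone who wants to define that kind of workflow in code rather than in the UI, here is a rough sketch using the `awx.awx.workflow_job_template` module. It assumes the `awx.awx` collection is installed and that controller credentials are supplied through the environment. The names `flaky-job`, `flaky-job-with-retry`, and the `Default` organization are hypothetical placeholders, not anything from our setup, so check the node schema against the collection docs for your version.

```yaml
# Sketch only: template, workflow, and organization names are hypothetical.
- name: Build a workflow that reruns a job template once on failure
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create the retry workflow
      awx.awx.workflow_job_template:
        name: flaky-job-with-retry        # hypothetical workflow name
        organization: Default             # hypothetical organization
        state: present
        workflow_nodes:
          # First attempt: on failure, hand off to the second node.
          - identifier: first-attempt
            unified_job_template:
              name: flaky-job             # hypothetical existing job template
              type: job_template
              organization:
                name: Default
            related:
              success_nodes: []
              failure_nodes:
                - identifier: second-attempt
              always_nodes: []
          # Second attempt: same template, reached only via the failure path.
          - identifier: second-attempt
            unified_job_template:
              name: flaky-job
              type: job_template
              organization:
                name: Default
            related:
              success_nodes: []
              failure_nodes: []
              always_nodes: []
```

Because the second node is a separate job launch, this covers the case where the first job's container is killed outright, which in-playbook retries cannot.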

Other ideas could involve blocks with rescue statements, or an event-based trigger that reruns the job by its job number, but these seem to be more hassle than they’re worth. Curious what others have to say, as we’ve run into similar issues with inventory syncs and certain systems where a rerun of the problem task fixes the issue 90% of the time.
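To make the block/rescue idea concrete, a minimal sketch might look like the following. The script path `/usr/local/bin/flaky-step` and the 10-second pause are made-up placeholders; the point is only that the `rescue` section repeats the failing task once.

```yaml
# Sketch only: the command path is a hypothetical stand-in for the flaky task.
- name: Retry a flaky step once via block/rescue
  hosts: all
  gather_facts: false
  tasks:
    - name: Run the flaky step, with one rescue attempt
      block:
        - name: First attempt
          ansible.builtin.command:
            cmd: /usr/local/bin/flaky-step
      rescue:
        # Runs only if a task in the block above failed.
        - name: Wait briefly before trying again
          ansible.builtin.pause:
            seconds: 10

        - name: Second attempt
          ansible.builtin.command:
            cmd: /usr/local/bin/flaky-step
```

If the second attempt also fails, the play fails as usual, so this gives exactly one extra try per block.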

Best regards,

Joe

This caught my eye because I had just approved a colleague’s commit involving:

  register: mw_gitlab_gitlab_restart_result
  retries: 3
  delay: 10
  until: mw_gitlab_gitlab_restart_result.rc == 0

But if your issue is unexpected termination of the AWX container running that task (or that block, if you’re trying block/rescue), then that’s not going to help.
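For context, those retry keywords attach to a single task. A fuller sketch is below; the `command` line is a hypothetical placeholder, since the snippet above doesn't show which module the real task uses. Only the four retry-related keys come from the actual commit.

```yaml
# Sketch only: the command is hypothetical; the retry keys match the snippet above.
- name: Restart a service, retrying until it exits cleanly
  hosts: all
  gather_facts: false
  tasks:
    - name: Restart the service
      ansible.builtin.command:
        cmd: /usr/local/bin/restart-service   # hypothetical command
      register: mw_gitlab_gitlab_restart_result
      retries: 3        # retry up to 3 times after the first failure
      delay: 10         # wait 10 seconds between attempts
      until: mw_gitlab_gitlab_restart_result.rc == 0
```

The retries happen inside the same playbook run, so they only cover a task that fails and returns. If the container running the job is killed, the whole run goes with it and nothing is left to retry.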

We have a few “fragile points” as well, particularly in our post-commit pipelines on Jenkins. It’s just frequent enough to be annoying but not so annoying as to be intolerable, so nobody has taken the time to understand what’s actually failing.

Putting all my blathering aside, the re-run on failure workflow outlined by @trippinnik above is probably your best next step to get this working. After that, try to figure out what’s killing your containers, because working around this symptom won’t fix the problem.
