Ansible playbook sporadically hangs when run from cron

Hi All,

I have a playbook that backs up EC2 instances in AWS every night.
It runs on an EC2 instance (CentOS Linux release 7.2.1511 Core) hosted in AWS in the same region as the backed-up instances.
The playbook uses only localhost (see below) and AWS modules. No SSH is used to access other instances.

`
- hosts: localhost
  connection: local
  gather_facts: no
`

All the backup jobs run fine from the command line. They typically finish within 2 to 15 minutes.
All the logs are written to /var/log/ansible/ansible.log/ansible.log

The cron job looks like this:

`
# Run backup job
00 20 * * * (pushd $ANSIBLE_DIR && ansible-playbook backup.yml -e @server.json -vv) >/dev/null 2>&1
`

It runs fine almost all the time, but occasionally a job hangs during the night and stays resident, consuming both memory and CPU.
I checked the logs (both Ansible and /var/log/messages) and could not find any evidence (error messages) of why this happens.

I’m using the development branch (2.2.0), but the same happens with the latest stable release, 2.0.2.0.

Could you please advise how to troubleshoot this type of issue?
How can I find the task where it gets stuck?

Regards,
Constantin

I'd like to know why you need Ansible for this; running the actual
backup script from cron seems like a more direct way. But you may have
your reasons, so to debug this, I would get rid of the ">/dev/null
2>&1" in your cron job to get more output. That way you would get an
e-mail for each job run.

Also, I would include some connectivity tests in the playbook, just to
make sure the network is not the problem.
Also, a cron job just pinging the target before the actual job might
be an idea.
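
A minimal sketch of such a pre-flight check, assuming bash and coreutils `timeout` are available on the cron host. The function name `check_endpoint` is hypothetical, and the host in the usage comment is only an example; substitute the endpoint for your own region.

```shell
#!/usr/bin/env bash
# Pre-flight reachability check (illustrative sketch, not part of the playbook).

# check_endpoint HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened within the timeout,
# non-zero if the connection is refused, unresolvable, or times out.
check_endpoint() {
    local host="$1" port="$2" secs="${3:-5}"
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # coreutils `timeout` bounds how long the attempt may take.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example use before the backup run (region is an assumption):
#   if ! check_endpoint ec2.us-east-1.amazonaws.com 443; then
#       echo "EC2 endpoint unreachable, skipping backup" >&2
#       exit 1
#   fi
```

This only shows that a TCP connection can be opened; it says nothing about whether the API calls themselves succeed.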

Johannes

Thanks for the advice,

I use Ansible because it is simpler and easier (and better documented) for implementing a backup strategy with a specific retention policy.

I just tested it: if I remove the redirection, I get the successful playbook runs by e-mail. No errors; everything is fine.

If the process gets stuck, I wouldn’t get an e-mail notification anyway, because the cron job hasn’t finished and is still running, right?
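
One possible way around that, as a sketch only: run the job under coreutils `timeout`, so a hung run is terminated and cron gets an exit status and output to mail. The function name `run_with_deadline` and the 30-second grace period are assumptions for illustration, not something from the playbook.

```shell
#!/usr/bin/env bash
# Deadline wrapper (illustrative sketch).

# run_with_deadline LIMIT COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout`. LIMIT uses timeout's duration
# syntax (e.g. 30m, 1h). If the command is still running after LIMIT it
# gets SIGTERM, then SIGKILL 30 seconds later if it has not exited.
run_with_deadline() {
    local limit="$1"; shift
    local rc=0
    timeout -k 30 "$limit" "$@" || rc=$?
    # timeout exits with 124 when the limit was hit, or 137 (128+9)
    # when it had to fall back to SIGKILL.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "$(date '+%F %T') job exceeded ${limit}: $*" >&2
    fi
    return "$rc"
}

# Example (paths and limit are assumptions):
#   cd "$ANSIBLE_DIR" && run_with_deadline 1h ansible-playbook backup.yml -e @server.json -vv
```

Because the message goes to stderr and the job now ends, cron would include it in its e-mail as long as the output is not redirected to /dev/null. This does not show which task was stuck; for that, the last lines of the Ansible log at the time of the kill are the place to look.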

Are you suggesting that I test the reachability of the AWS endpoints?

Regards,
Constantin