I am trying to automate “jail” creation under FreeBSD, which includes building the OS from source and is rather lengthy. I’m using async with “polling” for those tasks:
- name: build world
  shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ build_conf }} make buildworld > /tmp/build.log 2>&1
  async: 45
  poll: 30

- name: install world
  shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ install_conf }} make installworld DESTDIR={{ disk_mount_point }} >> /tmp/build.log 2>&1
  poll: 30

- name: make distribution
  shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ install_conf }} make distribution DESTDIR={{ disk_mount_point }} >> /tmp/build.log 2>&1
  poll: 30
However, every time I hit “build kernel” my builder VM crashes. When I run the same task manually (whether under screen or not), I have no such issue.
Has anybody seen something similar in the wild or experienced similar issues?
Nothing that would identify the culprit. I’m re-running the playbook at the moment to test some other aspects of it; if anything pops up, I’ll post here.
The thing is, Ansible just “sits there” after the VM has rebooted itself (naturally, since there’s a “poll interval” involved), so it’s not as if Ansible is crashing on the controller end. I’m just wondering whether something is leaking memory on the host side. buildworld produces a lot of output, though I did redirect it to a file precisely to avoid that. Not sure what else could be causing it.
Likely not an Ansible issue, though I’m not sure how running it through Ansible would be any different.
See above. I don’t have any hard evidence one way or the other, but the indirect evidence suggests that something about Ansible is a factor: I ran the exact same commands both over a straight SSH session and under “screen”, and in neither case was there a problem.
Question: I haven’t looked at the code (yet), but given the polling I assume the Python script on the host side runs the command as a subprocess with redirected output, etc. Could there be a memory leak due to the significant number of polls over that time? (Build time is about 1-2 hours on that box.)
Am I using the right strategy for this? Since I can’t truly fire-and-forget that task, should I forgo “poll” completely? My only worry is that intermittent network issues might terminate the task, and “poll” may prevent that. Am I correct here?
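To make that concrete, the alternative I have in mind is roughly the following (an untested sketch on my part; the 7200-second async timeout, the retry count and the registered variable names are placeholders, not values I’ve settled on): fire the build off with poll: 0 and then wait on it with async_status.

- name: build world (fire and forget)
  shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ build_conf }} make buildworld > /tmp/build.log 2>&1
  async: 7200        # allow up to 2h before the job is considered dead
  poll: 0            # do not block here
  register: buildworld_job

- name: wait for build world to finish
  async_status: jid={{ buildworld_job.ansible_job_id }}
  register: buildworld_result
  until: buildworld_result.finished
  retries: 240       # 240 checks x 30s delay = 2h
  delay: 30

As I understand it, the build itself then runs detached on the remote side, so a dropped connection should not kill it, though the play could still fail at one of the status checks.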
"
Question: I didn’t look at the code (yet) however due to the polling I’m assuming python script on the host side will be running sub-process with redirected outputs etc. could there be a memory leak due to a significant number of polls within that time (build time is about 1-2h on that box). "
Shouldn’t be the case – poll options don’t really save previous results.
I should say, though (this is unrelated), that Ansible is not meant to be a build system. The norm is to use something like Jenkins to produce build artifacts and then have Ansible deploy them. If a deploy process is taking 1-2 hours, that sounds really strange to me, and I think there may be better ways to optimize things. That said, I haven’t seen a two-hour build since an unoptimized Java compile back in ~2004, so it’s been a while, and you might have other reasons or I might not be understanding the use case.
If you can dig more and find out what’s up, I’d be very interested in findings.
The worst thing has happened: my last re-run went through just fine.
I don’t like “inconsistent”, but I have no idea what is happening or why. I did update the playbook, but not the section that was “crashing”. I’ll dig deeper the next time the problem surfaces. I’ll probably be re-running the playbook within the next week, so that should give it a chance to crash again.
Jenkins won’t help as far as I know (I could be wrong), as I would still need to fire off the build as part of the playbook and wait for its artifacts to be produced.
For posterity: I’ve tracked down the issue. It was FreeBSD’s UFS journaling that was crashing things; it looks like Ansible was able to thrash the system hard enough to expose issues with that filesystem feature. After disabling it, things seem to be running as expected.
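In case anyone hits the same thing and wonders where the knob is: assuming the journaling in question is UFS soft-updates journaling (SU+J) rather than gjournal, the flag can be turned off with tunefs. The device name below is only an example, and tunefs will only change the flag while the filesystem is unmounted or mounted read-only (single-user mode for the root filesystem):

# example device only; check `mount` / `gpart show` for the real one
tunefs -j disable /dev/ada0p2
# print the filesystem flags afterwards to confirm
tunefs -p /dev/ada0p2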