FreeBSD buildworld/buildkernel crashes host OS??

Hi,

I am trying to automate “jail” creation under FreeBSD, which includes building the OS from sources - a rather lengthy process.

I’m using “polling” while doing it:

  - name: build world
    shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ build_conf }} make buildworld > /tmp/build.log 2>&1
    async: 45
    poll: 30

  - name: install world
    shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ install_conf }} make installworld DESTDIR={{ disk_mount_point }} >> /tmp/build.log 2>&1
    poll: 30

  - name: make distribution
    shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ install_conf }} make distribution DESTDIR={{ disk_mount_point }} >> /tmp/build.log 2>&1
    poll: 30

  - name: build kernel
    shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ build_conf }} make buildkernel DESTDIR={{ disk_mount_point }} KERNCONFIG={{ kernel_config }} >> /tmp/build.log 2>&1
    poll: 30

  - name: install kernel
    shell: executable=/bin/sh chdir=/usr/src SRCCONF={{ install_conf }} make installkernel DESTDIR={{ disk_mount_point }} KERNCONFIG={{ kernel_config }} >> /tmp/build.log 2>&1
    poll: 30
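A side note on the snippet above: in Ansible, `async` is the maximum allowed runtime in seconds, and `poll` only has an effect when `async` is set, so `async: 45` looks far too low for a build that takes 1-2 hours. A sketch of how the first task could be written in YAML dict syntax with a more realistic ceiling (the timeout values here are assumptions, not tested against this playbook):

```yaml
- name: build world
  shell: SRCCONF={{ build_conf }} make buildworld > /tmp/build.log 2>&1
  args:
    executable: /bin/sh
    chdir: /usr/src
  async: 10800   # maximum runtime in seconds; 3 hours is an assumed upper bound
  poll: 30       # controller checks the job status every 30 seconds
```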

However, every time I hit “build kernel”, my builder VM crashes. When running the same task manually, I have no such issue (whether using screen or not).

Has anybody seen something similar in the wild or experienced similar issues?

Any logs available?

Likely not an ansible issue, though I’m not sure how it would be different.

> Any logs available?

Nothing that would identify the culprit. I’m re-running the playbook at the moment to test some other aspects of it. If anything pops up, I’ll post it here.

The thing is, Ansible just “sits there” while the VM has rebooted itself (naturally, since there’s a “poll interval” involved), so it’s not as though Ansible is crashing on the controller end. I’m just wondering whether something is leaking memory on the host side. buildworld produces a lot of output; however, I did pipe it to a file to avoid exactly that. I’m not sure what else could be causing it.

> Likely not an ansible issue, though I’m not sure how it would be different.

See above. I don’t have any hard evidence one way or the other; however, indirect evidence suggests that something about Ansible is affecting it. I ran the exact same commands via a straight SSH session and via “screen” - in both cases, no problem.

Question: I didn’t look at the code (yet); however, due to the polling, I’m assuming the Python script on the host side will be running a sub-process with redirected outputs, etc. Could there be a memory leak due to the significant number of polls within that time (build time is about 1-2 hours on that box)?

Am I using the right strategy for this? Since I can’t go fully async with that task, should I forgo “poll” completely? My only worry is that intermittent network issues might terminate the task, and “poll” may prevent that. Am I correct here?
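One way to guard against connection drops without blocking on `poll` is Ansible’s fire-and-forget pattern: start the job with `poll: 0`, then check on it with `async_status`. A sketch, with assumed timeout values, using the same `build_conf` variable as the playbook above:

```yaml
- name: build world (fire and forget)
  shell: SRCCONF={{ build_conf }} make buildworld > /tmp/build.log 2>&1
  args:
    executable: /bin/sh
    chdir: /usr/src
  async: 10800        # assumed 3-hour ceiling for the build
  poll: 0             # return immediately, leave the job running remotely
  register: build_job

- name: wait for buildworld to finish
  async_status:
    jid: "{{ build_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 360        # 360 retries * 30 s delay = 3 hours
  delay: 30
```

Because the build keeps running on the remote host between status checks, a transient SSH failure during one check should not kill it.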

"
Question: I didn’t look at the code (yet) however due to the polling I’m assuming python script on the host side will be running sub-process with redirected outputs etc. could there be a memory leak due to a significant number of polls within that time (build time is about 1-2h on that box). "

That shouldn’t be the case - the poll options don’t really save previous results.
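For illustration (this is the general async-job pattern, not Ansible’s actual code): the long-running job writes its state to a small status file, and each poll simply re-reads that file, so nothing accumulates between polls:

```python
# Illustrative sketch only -- not Ansible internals. A background job
# overwrites a tiny JSON status file; each poll re-reads it fresh, so
# polling many times over 1-2 hours does not hold on to old results.
import json
import os
import tempfile

def write_status(path, finished, rc=None):
    """What the background wrapper does: overwrite the status file."""
    with open(path, "w") as f:
        json.dump({"finished": finished, "rc": rc}, f)

def poll_once(path):
    """What one poll does: read the status file and return its contents."""
    with open(path) as f:
        return json.load(f)

status_file = os.path.join(tempfile.mkdtemp(), "job_status.json")
write_status(status_file, finished=False)           # job starts
assert poll_once(status_file)["finished"] is False  # a poll sees "running"
write_status(status_file, finished=True, rc=0)      # job completes
result = poll_once(status_file)
print(result["finished"], result["rc"])             # True 0
```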

I should say, though (this is unrelated), that Ansible is not meant to be a build system. The norm is to use something like Jenkins to produce build products and then have Ansible deploy the artifacts from it. If a deploy process is taking 1-2 hours, that sounds really strange to me, and I think there may be better ways to optimize things. That being said, I haven’t seen a 2-hour build since an unoptimized Java compile back in ~2004, so it’s been a while, and you might have other reasons, or I might not be understanding the use case.

If you can dig more and find out what’s up, I’d be very interested in findings.

The worst thing has happened - my last re-run went through just fine. :wink:

I do not like “inconsistent”, but I have no idea what is happening or why. I did update the playbook, but not the section that was “crashing”. I’ll dig deeper the next time the problem surfaces. I’ll probably be re-running the playbook within the next week, so that should give it a chance to crash again. :slight_smile:

Jenkins won’t help as far as I know (I could be wrong), as I would still need to fire off the build as part of the playbook and wait for its artifacts to be produced.

For posterity: I’ve tracked down the issue - it was FreeBSD’s UFS “journaling” that was crashing things. It looks like Ansible was able to thrash the system well enough to expose issues with that filesystem feature. After disabling it, things seem to be running as expected.
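For anyone hitting the same thing: soft updates journaling (SU+J) on UFS can be toggled with tunefs(8). A sketch of the commands involved - the device name is a placeholder, and the filesystem must be unmounted or mounted read-only (e.g. from single-user mode) when changing the flag:

```sh
# Inspect the current UFS tuning flags (device path is a placeholder):
tunefs -p /dev/ada0p2

# Turn off soft updates journaling on that filesystem:
tunefs -j disable /dev/ada0p2
```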

Exciting!