Hi,
we’re experiencing problems running AWX jobs, apparently due to memory exhaustion. The VM has a decent amount of memory (16 GB) and our jobs ran well for quite some time. Right now, though, there is roughly a 50:50 chance that a job fails: the output just stops, with no error in the details of the job output. When I look at the system, I can see that there’s almost no memory left:
top - 07:26:54 up 2 days, 22:48, 3 users, load average: 0.00, 0.04, 0.07
Tasks: 289 total, 1 running, 287 sleeping, 0 stopped, 1 zombie
%Cpu(s): 1.1 us, 0.2 sy, 0.0 ni, 98.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 16002.2 total, 572.8 free, 11127.4 used, 4302.0 buff/cache
MiB Swap: 4096.0 total, 4082.4 free, 13.6 used. 4404.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4452 u032359 20 0 3227024 3.0g 21540 S 0.0 19.5 0:21.45 uwsgi
4453 u032359 20 0 2733744 2.6g 22308 S 0.0 16.5 0:34.42 uwsgi
4454 u032359 20 0 1689112 1.6g 17964 S 0.0 10.2 0:08.28 uwsgi
4450 u032359 20 0 1192644 1.1g 21752 S 0.0 7.1 0:05.32 uwsgi
4451 u032359 20 0 806172 782648 17308 S 0.0 4.8 0:03.96 uwsgi
841 root 20 0 1394412 514280 91092 S 8.9 3.1 277:29.50 k3s-server
4488 u032359 20 0 175364 145524 9556 S 0.0 0.9 1:46.60 awx-manage
4475 u032359 20 0 678812 141732 17236 S 0.0 0.9 4:41.08 awx-manage
214943 u032359 20 0 682804 138636 11212 S 0.0 0.8 0:00.65 awx-manage
214881 u032359 20 0 682872 138616 11212 S 0.0 0.8 0:00.68 awx-manage
4489 u032359 20 0 167396 137672 9548 S 0.0 0.8 1:50.24 awx-manage
215195 u032359 20 0 682688 136520 9352 S 0.0 0.8 0:00.39 awx-manage
215273 u032359 20 0 682428 136324 9348 S 0.0 0.8 0:00.25 awx-manage
4476 u032359 20 0 155472 134056 17168 S 0.0 0.8 1:54.50 awx-manage
4490 u032359 20 0 163384 132588 9544 S 0.0 0.8 1:50.01 awx-manage
4491 u032359 20 0 163064 132340 9512 S 0.0 0.8 1:52.48 awx-manage
4447 u032359 20 0 210160 118848 17620 S 0.0 0.7 1:55.35 awx-manage
4486 u032359 20 0 153708 108940 6328 S 0.0 0.7 3:49.42 awx-manage
4446 u032359 20 0 195720 105356 17296 S 0.0 0.6 1:19.55 daphne
3409 999 20 0 213128 84176 82252 S 0.0 0.5 0:02.74 postgres
2656 65532 20 0 799396 72800 55004 S 0.0 0.4 1:32.21 traefik
485 root 19 -1 145496 69596 68420 S 0.0 0.4 0:04.88 systemd-journal
1118 root 20 0 768336 61364 30612 S 0.0 0.4 31:43.44 containerd
3596 u032359 20 0 750832 47664 28832 S 0.0 0.3 19:25.71 metrics-server
3021 root 20 0 754780 43756 30116 S 0.0 0.3 8:12.12 coredns
3091 isarnet 20 0 742924 39912 22044 S 0.0 0.2 7:31.85 ansible-operato
4057 u032359 20 0 2368144 36252 21900 S 1.0 0.2 13:31.51 receptor
521 root rt 0 289460 27244 9076 S 0.0 0.2 0:22.01 multipathd
3927 u032359 20 0 29316 25156 9300 S 0.0 0.2 0:47.68 supervisord
2668 root 20 0 734716 24912 18076 S 0.0 0.2 1:39.18 local-path-prov
3981 u032359 20 0 29032 24760 9068 S 0.0 0.2 0:58.72 supervisord
Any hints on how to troubleshoot this further? It seems like the uwsgi processes consume way too much memory. AWX is running on k3s:
awx-operator: 0.23.0
k3s version v1.23.6+k3s1 (418c3fa8)
go version go1.17.5
AWX 21.2.0
Jobs are using a dynamic inventory from Nautobot.
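To put a rough number on the uwsgi usage, the RES column of the top output above can be summed per command. This is a minimal Python sketch, not part of AWX: the helper names `res_to_kib` and `sum_res_by_command` are made up for illustration, and it assumes top's default layout (RES is the sixth column, in KiB unless suffixed with m, g or t). Since top rounds the scaled values, the total is only approximate.

```python
from collections import defaultdict

# Multipliers for top's scaled memory suffixes; bare numbers are KiB.
_UNITS = {"m": 1024, "g": 1024**2, "t": 1024**3}


def res_to_kib(value):
    """Convert a top RES field such as '782648' or '3.0g' to KiB."""
    suffix = value[-1].lower()
    if suffix in _UNITS:
        return float(value[:-1]) * _UNITS[suffix]
    return float(value)


def sum_res_by_command(top_lines):
    """Sum resident memory (KiB) per COMMAND from top process rows."""
    totals = defaultdict(float)
    for line in top_lines:
        fields = line.split()
        # Skip the header and anything that is not a process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        totals[fields[11]] += res_to_kib(fields[5])
    return dict(totals)


# The five uwsgi rows copied from the top output above.
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "4452 u032359 20 0 3227024 3.0g 21540 S 0.0 19.5 0:21.45 uwsgi",
    "4453 u032359 20 0 2733744 2.6g 22308 S 0.0 16.5 0:34.42 uwsgi",
    "4454 u032359 20 0 1689112 1.6g 17964 S 0.0 10.2 0:08.28 uwsgi",
    "4450 u032359 20 0 1192644 1.1g 21752 S 0.0 7.1 0:05.32 uwsgi",
    "4451 u032359 20 0 806172 782648 17308 S 0.0 4.8 0:03.96 uwsgi",
]

totals = sum_res_by_command(sample)
print(f"uwsgi RES total: {totals['uwsgi'] / 1024**2:.1f} GiB")
```

For the five uwsgi rows this comes to roughly 9 GiB, i.e. more than half of the VM's 16 GB.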
Thanks,
Andreas