Amazon EC2 inventory sync takes a huge time for each `Deleted host` when

Hello,

We’ve been using AWX with an EC2 inventory for a while (currently AWX 3.0.1.0, but we went through pretty much all previous versions).

We have Overwrite and Update On Launch enabled in the update options of the source.

It works impeccably when there are very few changes in the infrastructure. The problem appears when we renew all the spot fleets (for example) and the number of hosts it needs to clean up is large (hundreds). Then the first inventory update after the change takes hours, blocking all other operations… (the following inventory syncs go well).

The debug output shows that the whole inventory is loaded in about 5 seconds, but then deleting hosts takes a long time (even the first two deletions are roughly 40 seconds apart).


5.724 INFO Loaded 291 groups, 117 hosts…
5.899 DEBUG Deleted host “xxx”
45.383 DEBUG Deleted host “xxx”
71.704 DEBUG Deleted host “xxx”

1131.550 INFO Inventory import completed for xxxxxxx in 1129.7s
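
To quantify where the time goes, the gaps between consecutive `Deleted host` messages can be computed from the debug output. The following is only a minimal sketch: it assumes each line starts with the elapsed time in seconds followed by the log level, as in the excerpt above, and `deletion_gaps` is a hypothetical helper name, not part of AWX.

```python
import re

# Assumed line shape: "<elapsed seconds> <LEVEL> <message>", as in the excerpt above.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\w+)\s+(.*)$")


def deletion_gaps(lines):
    """Return the seconds elapsed between consecutive 'Deleted host' messages."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(3).startswith("Deleted host"):
            stamps.append(float(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


sample = [
    "5.724 INFO Loaded 291 groups, 117 hosts…",
    "5.899 DEBUG Deleted host “xxx”",
    "45.383 DEBUG Deleted host “xxx”",
    "71.704 DEBUG Deleted host “xxx”",
]

for gap in deletion_gaps(sample):
    print(f"{gap:.3f}s between deletions")
```

Running this over the full debug output would show whether every deletion is slow or only some of them.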

My question is: is there any way to perform the deletion in a decoupled way? Something like eventual consistency (I can live with a wrong count of failed hosts/total hosts for a while, as long as the jobs can get the new inventory).

The way I handle it now is to uncheck Overwrite to unblock the pipeline and keep running jobs, and, when I can, manually run an inventory sync with a huge timeout. But it’s far from ideal, since I never know when the infrastructure is going to change enough to cause trouble…

Thanks for any kind of ideas to address it.
Regards,

This is a major concern of ours. Scaling of large inventory imports is very important.

In order to mimic your setup, I found an old inventory import, a few months old, performed from the AWS infrastructure I have to test with, which has a lot of machines experiencing significant churn. The counts are roughly 500 groups and 200 hosts, and they vary frequently. This seems comparable to what you had. I turned the verbosity to DEBUG, turned on Overwrite, and ran a new import.

Here’s a screenshot with the hostnames cropped off the screen and a search applied to get the relevant host deletion messages. The import took less than 10 minutes, but that is still vastly slower than it would be with INFO level logging. If you want your own case resolved, my main advice would be to turn off DEBUG level verbosity, because stdout processing will consume the vast majority of resources. From my experiment, however, I can’t replicate what you saw. It is concerning that you said it goes fast with no changes, but slow with a lot of changes. In my testing, a large fraction of the hosts were deleted in the import, and the deletions still ran normally. If you have any other significant details about what might be different, it would be helpful to know. Thanks!

Alan
github: AlanCoding

This might also depend on the speed of the host you are running AWX on.