AWX Job cleanup SQL

Job Cleanup degrades AWX service.

  • I currently have 120 days of retention on events, and inventoryupdateevent is quite large, with a few hundred million rows.
    Running the job cleanup from the GUI degrades AWX for running and querying jobs. I noticed the following SQL in PostgreSQL. Is there a safe way to clean up inventoryupdateevent directly in PostgreSQL? I'm not sure what would happen to AWX if the drop is done directly in Postgres. Your insight would be much appreciated! Using AWX 22.1.10.
```
DROP TABLE main_jobevent_20231104_22
```

Hi there,

We would not recommend dropping tables directly from your database. I can't say it wouldn't work once, but it's more likely than not to cause issues within AWX.
In my opinion, your best option would be to lower the retention time of your events from 120 days to something shorter so there isn't such a build-up.
I would also suggest scheduling the management job to run automatically and more frequently (maybe even daily, depending on your load) so that each cleanup run is less overwhelming for the system.
Regardless, deleting data directly from the database is not recommended. The best way to keep AWX stable and avoid interrupting its functionality is to give the management job less to chew through on a more consistent basis.
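To make the "less to chew through" point concrete, here is a back-of-the-envelope sketch. It assumes each cleanup run leaves exactly the configured retention window behind and that event volume is roughly even from day to day; `days_to_delete` is a hypothetical helper, not part of AWX.

```python
def days_to_delete(previous_retention: int, new_retention: int, days_since_last_run: int) -> int:
    """Rough width (in days) of the data window a single cleanup run must delete.

    Simplifying assumptions: the previous run left exactly `previous_retention`
    days of data, and event volume is roughly constant per day.
    """
    oldest_data_age = previous_retention + days_since_last_run
    return max(0, oldest_data_age - new_retention)


print(days_to_delete(30, 30, 1))    # daily run, steady 30-day retention -> 1
print(days_to_delete(30, 30, 7))    # weekly run, steady 30-day retention -> 7
print(days_to_delete(120, 30, 1))   # one-shot change from 120 to 30 days -> 91
```

Under these assumptions a daily run at a steady retention only removes about one day of events, while cutting retention in one step forces a single run to remove roughly 90 days' worth.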

Also, to round this out for anyone reading this thread who might not know: you can run the cleanup management job from the command line.
You can do so with the awx-manage command (see "16. The awx-manage Utility" in the Ansible AWX community documentation), and you can clean up your activity stream the same way.
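As a rough sketch of the command-line route, the Python below only builds and prints the awx-manage invocations unless told to execute them. It assumes awx-manage is on the PATH where the script runs (for example, inside an AWX task or web container) and that your version accepts the `--days` and `--dry-run` options of `cleanup_jobs` and `cleanup_activitystream`; confirm with `awx-manage cleanup_jobs --help`. `cleanup_commands` and `run` are hypothetical helper names.

```python
import shlex
import subprocess


def cleanup_commands(days: int, dry_run: bool = True) -> list[list[str]]:
    """Build awx-manage invocations for job and activity-stream cleanup."""
    cmds = [
        ["awx-manage", "cleanup_jobs", f"--days={days}"],
        ["awx-manage", "cleanup_activitystream", f"--days={days}"],
    ]
    if dry_run:
        # --dry-run is assumed to report what would be removed without deleting anything
        for cmd in cmds:
            cmd.append("--dry-run")
    return cmds


def run(cmds: list[list[str]], execute: bool = False) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run(cleanup_commands(30))  # print-only by default
```

Starting with a dry run shows how much a given `--days` value would remove before committing to it.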

Hope this helps,
-AWX Team

If you already have other logging tools such as Loki, ELK, or Splunk, you can set the retention days to a lower value. In my production environment I set only 28 days, but the PostgreSQL volume is still about 11 GB and keeps growing slowly day by day.

Thanks for the replies.

I ended up reducing the retention period gradually, which limited the impact on users. A single big change from 120 → 30 days made the job fail with a timeout.
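The gradual approach can be sketched as a series of cleanup runs, each lowering retention by a fixed step. The post does not say which steps were used, so the 10-day step here is an assumption, and `retention_steps` and `gradual_cleanup` are hypothetical helpers. By default the script only prints `awx-manage cleanup_jobs` commands; the same `days` values could instead be entered one at a time when launching the cleanup management job from the GUI.

```python
import shlex
import subprocess


def retention_steps(current_days: int, target_days: int, step: int) -> list[int]:
    """Retention values to apply one cleanup run at a time (e.g. 120 -> 30 by 10)."""
    if step <= 0 or target_days > current_days:
        raise ValueError("need step > 0 and target_days <= current_days")
    steps = list(range(current_days - step, target_days, -step))
    steps.append(target_days)  # always finish exactly on the target
    return steps


def gradual_cleanup(current_days: int, target_days: int, step: int, execute: bool = False) -> None:
    for days in retention_steps(current_days, target_days, step):
        cmd = ["awx-manage", "cleanup_jobs", f"--days={days}"]
        print(shlex.join(cmd))
        if execute:
            # Runs one after another, so each run only removes the next slice of old data.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    gradual_cleanup(120, 30, 10)  # prints 9 commands, from --days=110 down to --days=30
```

Smaller steps mean more runs but a shorter, lighter delete for each one, which is the trade-off that avoided the timeout here.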