Hi group,
We are running AWX 15.0.1 in production. It is a 10-node cluster running in Kubernetes.
Recently we started seeing some jobs fail (seemingly more often when multiple instances of the same job run in parallel), with an error saying that the project update failed.
[…]
File "/var/lib/awx/venv/awx/lib/python3.6/site-packages/awx/main/tasks.py", line 1542, in run
    raise AwxTaskError.TaskError(self.instance, rc)
Exception: project_update 586629 (failed) encountered an error (rc=2), please see task stdout for details.
I took a look at the AWX task logs and found:
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["/usr/bin/git", "fetch", "--tags", "origin", "refs/heads/:refs/remotes/origin/"], "msg": "Failed to download remote objects and refs: From https:///scm/\n ! [rejected] -> origin/ (non-fast-forward)\n"}
In my experience, this error usually occurs when someone force-pushes to the repo in question and the AWX project does not have the “delete on update” option enabled.
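To illustrate what I mean by the force-push case, here is a minimal sketch of the fast-forward check. It is a toy model with a made-up commit graph, not git's actual implementation; the function names and the commit IDs (`A`, `B`, `C`, `B2`) are hypothetical. The `force` flag stands in for a forced update (a leading `+` in the refspec, or `--force`).

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def ref_update_allowed(parents: Dict[str, List[str]], old_tip: str,
                       new_tip: str, force: bool = False) -> bool:
    """A ref update is a fast-forward when the old tip is an ancestor of the new tip."""
    return force or is_ancestor(parents, old_tip, new_tip)


# Hypothetical history: A <- B <- C was fetched earlier; the branch was then
# rewritten to A <- B2 by a force push.
parents = {"A": [], "B": ["A"], "C": ["B"], "B2": ["A"]}

normal_update = ref_update_allowed(parents, "B", "C")                 # fast-forward
after_force_push = ref_update_allowed(parents, "C", "B2")             # rejected
forced_update = ref_update_allowed(parents, "C", "B2", force=True)    # accepted
```

In this model the local checkout still has `C` as its remote-tracking tip, so the rewritten tip `B2` is not a descendant of it and the update is refused unless it is forced or the checkout is deleted first.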
Where it gets interesting is that the project update that failed uses the master branch, whereas the failure reported therein, from Bitbucket, references the development branch. There is, however, another project in AWX that points to the development branch.
Is it possible that these two projects are conflicting with each other somehow?
The project for our master branch has “update revision on launch” disabled, but “clean” and “delete on update” enabled.
The other project, which uses the development branch, has “update revision on launch” enabled and “delete on update” disabled. I turned on “delete on update” for this project just in case it helps, but we haven’t had enough time to judge whether it has.
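In case it helps anyone compare the two setups, here is a rough sketch of how the projects pointing at the same repo on different branches could be listed. It assumes the project records have already been pulled from the AWX API as dicts with `name`, `scm_url`, `scm_branch`, `scm_update_on_launch` and `scm_delete_on_update` fields. The project names and the repo URL below are placeholders, not our real ones, and `find_shared_repo_projects` is just a helper made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_repo_projects(projects: List[dict]) -> Dict[str, List[str]]:
    """Map each SCM URL used with more than one branch to the project names using it."""
    by_url = defaultdict(list)
    for project in projects:
        by_url[project["scm_url"]].append(project)

    shared = {}
    for url, group in by_url.items():
        branches = {p["scm_branch"] for p in group}
        if len(branches) > 1:
            shared[url] = sorted(p["name"] for p in group)
    return shared


# Placeholder records mirroring the settings described above.
REPO_URL = "https://bitbucket.example.com/scm/team/app.git"  # hypothetical URL
projects = [
    {"name": "app-master", "scm_url": REPO_URL, "scm_branch": "master",
     "scm_update_on_launch": False, "scm_delete_on_update": True},
    {"name": "app-development", "scm_url": REPO_URL, "scm_branch": "development",
     "scm_update_on_launch": True, "scm_delete_on_update": False},
]

shared = find_shared_repo_projects(projects)
```

This only shows which projects share a repo URL. Whether they can actually interfere with each other's checkouts is exactly what I am unsure about.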
Thanks in advance for any help.