Project Sync Failures

Hello all,

We did a major upgrade of AWX from 15.0.1 to 23.1.0.
To do so, I dumped the database, created a new PostgreSQL cluster, database, and user, and granted that user access to the database.
Afterwards, I used the AWX Operator in OpenShift 4 to create an AWX instance.

After a lot of back and forth, I got AWX up and running, and I can log in just as I could in the old environment (using credentials stored in AD).

Sending curl requests to add a host to an inventory works, and navigating the UI seems fine, but my projects aren’t syncing reliably.
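For reference, the kind of inventory call that does work can be expressed against the AWX v2 API. This is a minimal Python sketch, not the exact curl command used here: it assumes the `/api/v2/inventories/<id>/hosts/` endpoint and a personal access token sent as a Bearer header, and the base URL, inventory ID, host name, and token are hypothetical placeholders. It only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_add_host_request(base_url: str, inventory_id: int, host_name: str,
                           token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a host in an AWX inventory.

    Assumes the AWX v2 endpoint /api/v2/inventories/<id>/hosts/ and a
    Bearer token; every argument is a hypothetical placeholder.
    """
    url = f"{base_url.rstrip('/')}/api/v2/inventories/{inventory_id}/hosts/"
    body = json.dumps({"name": host_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_host_request("https://awx.example.com", 42, "web01.example.com", "TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```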

Occasionally a sync succeeds, but most sync attempts fail, and the logs aren’t much help.

Here is the complete job output:

Enter passphrase for /var/tmp/awx_89945_1pdcl45c/artifacts/89945/ssh_key_data:
Identity added: /var/tmp/awx_89945_1pdcl45c/artifacts/89945/ssh_key_data (awx@awxpoc-web-8f97cb7fc-khxrp)

PLAY [Update source tree if necessary] *****************************************

TASK [Update project using git] ************************************************

The logs from the task pod:

2023-09-26 16:06:24,921 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 controller node chosen
2023-09-26 16:06:24,921 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 execution node chosen
2023-09-26 16:06:25,066 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 waiting
2023-09-26 16:06:25,421 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 pre run
2023-09-26 16:06:25,440 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 preparing playbook
2023-09-26 16:06:25,512 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 running playbook
2023-09-26 16:06:25,537 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 work unit id received
2023-09-26 16:06:25,577 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 work unit id assigned
2023-09-26 16:06:25,812 INFO [-] awx.main.wsrelay Producer 10.129.6.253-schedules-changed has no subscribers, shutting down.
2023-09-26 16:06:31,119 INFO [11066f637064479eb13a7e75a49b5e86] awx.main.commands.run_callback_receiver Starting EOF event processing for Job 89949
2023-09-26 16:06:31,123 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 post run
2023-09-26 16:06:31,417 INFO [11066f637064479eb13a7e75a49b5e86] awx.analytics.job_lifecycle projectupdate-89949 finalize run
2023-09-26 16:06:31,423 WARNING [11066f637064479eb13a7e75a49b5e86] awx.main.dispatch project_update 89949 (failed) encountered an error (rc=None), please see task stdout for details.
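To see how far a failing update gets, the lifecycle stages for a single job can be pulled out of the task-pod log. A minimal Python sketch, assuming the log format shown above; the regular expressions are fitted to these lines only, not to any documented AWX log schema.

```python
import re
from typing import List, Optional, Tuple

# Fitted to lines like:
# "... awx.analytics.job_lifecycle projectupdate-89949 running playbook"
LIFECYCLE_RE = re.compile(
    r"awx\.analytics\.job_lifecycle\s+(?P<job>\S+)\s+(?P<stage>.+?)\s*$"
)

# Fitted to the final dispatcher warning, e.g.
# "... awx.main.dispatch project_update 89949 (failed) encountered an error (rc=None), ..."
DISPATCH_RE = re.compile(
    r"awx\.main\.dispatch\s+\S+\s+(?P<id>\d+)\s+\((?P<status>\w+)\).*\(rc=(?P<rc>[^)]*)\)"
)


def lifecycle_stages(log_text: str, job: str) -> List[str]:
    """Return, in log order, the lifecycle stages recorded for one job."""
    stages = []
    for line in log_text.splitlines():
        match = LIFECYCLE_RE.search(line)
        if match and match.group("job") == job:
            stages.append(match.group("stage"))
    return stages


def dispatch_result(log_text: str, job_id: int) -> Optional[Tuple[str, str]]:
    """Return (status, rc) from the dispatcher warning for job_id, or None."""
    for line in log_text.splitlines():
        match = DISPATCH_RE.search(line)
        if match and int(match.group("id")) == job_id:
            return match.group("status"), match.group("rc")
    return None
```

Run against the excerpt above, this shows job 89949 reaching `work unit id assigned` before `post run`, with the dispatcher reporting `(failed)` and `rc=None`.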

This is what I see on the web pod:

2023-09-26 16:08:02,219 INFO [afd1ae5ddf624e02a06e123b25ad9b19] awx.analytics.job_lifecycle projectupdate-89950 created
10.130.5.200 - - [26/Sep/2023:16:08:02 +0000] "POST /api/v2/projects/527/update/ HTTP/1.1" 202 2265 "https://awx.apps.ocpazt001.csx.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "10.92.172.184"
[pid: 244|app: 0|req: 27/3147] 10.130.5.200 () {70 vars in 2450 bytes} [Tue Sep 26 16:08:01 2023] POST /api/v2/projects/527/update/ => generated 2265 bytes in 456 msecs (HTTP/1.1 202) 15 headers in 635 bytes (1 switches on core 0)

If I rsh into the task or web pod, I can run git clone successfully.

I also tried creating a project with new credentials, but the issue is the same.

Thanks,

Shawn

Using Chrome Developer Tools, I ran a sync and captured the following:

General Headers:

Request URL: https://awx.apps.ocpazt001.csx.com/api/v2/projects/527/update/
Request Method: POST
Status Code: 202 Accepted
Remote Address: 100.64.1.69:443
Referrer Policy: strict-origin-when-cross-origin

Request Headers:

Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Content-Length: 0

Cookie:
CFCLIENT_TCIS=issch%3D0%23issm%3D0%23security%5Faccess%3D3%2E1%2C1%2E1%2C12%2E2%2C14%2E1%2C4%2E1%2C5%2E2%2C6%2E1%2C8%2E1%2C10%2E1%2C13%2E1%2C7%2E1%2C9%2E1%23racf%3DJ8683%23company%3D2139%23username%3DSingh%2C%20Radesh%23ruserid%3D94164%23sectaccess%3D1%2C0%2C1%2C1%2C2%2C1%2C1%2C1%2C1%2C1%2C0%2C2%2C1%2C1%2C%23jobtype%3D9%23archive%3Dfalse%23ishd%3D0%23securitysections%3D%23notify%5Frefresh%5Frate%3D300%23orglevel%3D800%23assigngroup%3D4332%2C31232%2C32192%2C19828%2C22569%2C4334%2C32172%2C35652%2C4471%23department%3D2311%23team%3D1441%23eivr%3D%23position%3DJ868301%23; CFGLOBALS=urltoken%3DCFID%23%3D3180505%26CFTOKEN%23%3D47387534%23lastvisit%3D%7Bts%20%272022%2D06%2D21%2012%3A21%3A19%27%7D%23hitcount%3D47%23timecreated%3D%7Bts%20%272022%2D05%2D31%2010%3A58%3A52%27%7D%23cftoken%3D47387534%23cfid%3D3180505%23; _ga=GA1.1.1849331890.1666123379; com.silverpop.iMAWebCookie=e28825bd-04a1-e128-6d28-ec6498017779; _ga_BL8HZZJ5X4=GS1.1.1680100901.2.0.1680100901.0.0.0; _fbp=fb.1.1693919665758.1258355006; _ga_58T88XBVN1=GS1.1.1693933404.10.1.1693933404.60.0.0; 2d18f267facdf29e764fe65056416803=ecddd3bdebb6688bb8aaeffa00169634; userLoggedIn=true; awx_sessionid=1ppd0stpfenm8o1y35s2xb5uqxe5a2ts; csrftoken=NKdsvTELFbhMSumG0aDfJLEw3im0HCDj

Host: awx.apps.ocpazt001.csx.com
Origin: https://awx.apps.ocpazt001.csx.com
Referer: https://awx.apps.ocpazt001.csx.com/
Sec-Ch-Ua: "Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
X-Csrftoken: NKdsvTELFbhMSumG0aDfJLEw3im0HCDj

Response Headers:

Access-Control-Expose-Headers: X-API-Request-Id
Allow: GET, POST, HEAD, OPTIONS
Content-Language: en
Content-Length: 2266
Content-Type: application/json
Date: Tue, 26 Sep 2023 16:27:02 GMT
Location: /api/v2/project_updates/89960/
Server: nginx
Session-Timeout: 7200
Set-Cookie: awx_sessionid=1ppd0stpfenm8o1y35s2xb5uqxe5a2ts; expires=Tue, 26 Sep 2023 18:27:02 GMT; HttpOnly; Max-Age=7200; Path=/; SameSite=Lax
Vary: Accept, Accept-Language, Origin, Cookie
X-Api-Node: awxpoc-web-8f97cb7fc-khxrp
X-Api-Product-Name: AWX
X-Api-Product-Version: 23.1.0
X-Api-Request-Id: 3a0474daad7d4f6ea3368ab60cd6e50b
X-Api-Time: 0.198s
X-Api-Total-Time: 0.373s

Response:
{
  "project_update": 89960,
  "id": 89960,
  "type": "project_update",
  "url": "/api/v2/project_updates/89960/",
  "related": {
    "created_by": "/api/v2/users/82/",
    "modified_by": "/api/v2/users/82/",
    "credential": "/api/v2/credentials/72/",
    "unified_job_template": "/api/v2/projects/527/",
    "stdout": "/api/v2/project_updates/89960/stdout/",
    "project": "/api/v2/projects/527/",
    "cancel": "/api/v2/project_updates/89960/cancel/",
    "scm_inventory_updates": "/api/v2/project_updates/89960/scm_inventory_updates/",
    "notifications": "/api/v2/project_updates/89960/notifications/",
    "events": "/api/v2/project_updates/89960/events/"
  },
  "summary_fields": {
    "organization": {
      "id": 7,
      "name": "CPSE",
      "description": "Echo team"
    },
    "project": {
      "id": 527,
      "name": "ap_azure-automation_using_awxnewkey",
      "description": "New SSH Key",
      "status": "pending",
      "scm_type": "git",
      "allow_override": false
    },
    "credential": {
      "id": 72,
      "name": "awxnewkey",
      "description": "",
      "kind": "scm",
      "cloud": false,
      "kubernetes": false,
      "credential_type_id": 2
    },
    "unified_job_template": {
      "id": 527,
      "name": "ap_azure-automation_using_awxnewkey",
      "description": "New SSH Key",
      "unified_job_type": "project_update"
    },
    "created_by": {
      "id": 82,
      "username": "j8683",
      "first_name": "Radesh",
      "last_name": "Singh"
    },
    "modified_by": {
      "id": 82,
      "username": "j8683",
      "first_name": "Radesh",
      "last_name": "Singh"
    },
    "user_capabilities": {
      "delete": true,
      "start": true
    }
  },
  "created": "2023-09-26T16:27:02.231149Z",
  "modified": "2023-09-26T16:27:02.254825Z",
  "name": "ap_azure-automation_using_awxnewkey",
  "description": "New SSH Key",
  "local_path": "_527__ap_azure_automation_1033183318_am",
  "scm_type": "git",
  "scm_url": "git@github.com:csx-technology/ap_azure-automation.git",
  "scm_branch": "main",
  "scm_refspec": "",
  "scm_clean": false,
  "scm_track_submodules": false,
  "scm_delete_on_update": false,
  "credential": 72,
  "timeout": 0,
  "scm_revision": "",
  "unified_job_template": 527,
  "launch_type": "manual",
  "status": "pending",
  "execution_environment": null,
  "failed": false,
  "started": null,
  "finished": null,
  "canceled_on": null,
  "elapsed": 0.0,
  "job_args": "",
  "job_cwd": "",
  "job_env": {},
  "job_explanation": "",
  "execution_node": "",
  "result_traceback": "",
  "event_processing_finished": false,
  "launched_by": {
    "id": 82,
    "name": "j8683",
    "type": "user",
    "url": "/api/v2/users/82/"
  },
  "work_unit_id": null,
  "project": 527,
  "job_type": "check",
  "job_tags": "update_git,install_roles,install_collections"
}
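The 202 response only confirms that the update was queued (`status` is `pending`); the outcome has to be read back from the job's `url`. A minimal polling sketch in Python, assuming a caller-supplied `fetch_json` helper (hypothetical: any HTTP client that GETs an AWX API path and decodes the JSON will do) and that `successful`, `failed`, `error`, and `canceled` are the terminal job statuses. The `?format=txt` variant of the `stdout` link is assumed to return the job output as plain text.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which an AWX job no longer changes (assumed terminal set).
TERMINAL_STATUSES = {"successful", "failed", "error", "canceled"}


def wait_for_update(fetch_json: Callable[[str], Dict[str, Any]],
                    launch_response: Dict[str, Any],
                    poll_seconds: float = 2.0,
                    max_polls: int = 150) -> Dict[str, Any]:
    """Poll a project update until it reaches a terminal status.

    `fetch_json` is a hypothetical, caller-supplied function that GETs a
    relative AWX API path and returns the decoded JSON body.
    """
    url = launch_response["url"]  # e.g. "/api/v2/project_updates/89960/"
    for _ in range(max_polls):
        job = fetch_json(url)
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"{url} did not finish after {max_polls} polls")


def stdout_txt_url(job: Dict[str, Any]) -> str:
    """Plain-text stdout link, built from the job's related["stdout"] path."""
    return job["related"]["stdout"] + "?format=txt"
```

Injecting `fetch_json` keeps the sketch independent of any particular HTTP library or authentication scheme.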

I’m still looking for the link, but I came across a post from a user who upgraded to v22 and seemed to have trouble syncing projects.
As I recall, they killed the task pod and were then able to sync.
I just did the same, and I can now sync some projects.

I don’t know how long it will work before I need to kill the pod again, but at least it appears to be a workaround.

Shawn

OK, I found the link; however, it was about a different error, one I saw in Chrome’s Developer Tools while running a sync:
https://github.com/ansible/awx/issues/13978

Near the end of that issue, a poster mentions restarting the task container.

I’ll update this thread with:

  1. Whether the “band-aid” continues to work.
  2. Any additional information.

Shawn

PLAY [Update source tree if necessary] *****************************************

TASK [Update project using git] ************************************************

Is that the full stdout you see in the UI for that project update? When you relaunch it, does it consistently fail on that same task? Also, does the project update end in a ‘Failed’ or ‘Error’ status?
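The fields that answer these questions are already part of the project update resource (`status`, `failed`, `job_explanation`, and `result_traceback` all appear in the response captured earlier in the thread). A small Python sketch that condenses them, assuming the JSON returned by `/api/v2/project_updates/<id>/`. As a general rule, `failed` indicates the playbook ran and reported a failure, while `error` tends to point at a problem launching or running the job itself, so the distinction narrows down where to look.

```python
from typing import Any, Dict


def summarize_update(job: Dict[str, Any]) -> str:
    """Condense a finished project update's JSON into a few readable lines."""
    status = job.get("status", "unknown")
    lines = [f"project update {job.get('id')}: status={status}, failed={job.get('failed')}"]

    explanation = job.get("job_explanation")
    if explanation:
        lines.append(f"explanation: {explanation}")

    traceback_text = (job.get("result_traceback") or "").strip()
    if traceback_text:
        # Keep only the last line, which usually names the exception raised.
        lines.append(f"traceback tail: {traceback_text.splitlines()[-1]}")

    return "\n".join(lines)
```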

AWX Team

Apologies, I found a solution:
after I increased the limits and quotas, the issue went away.

Shawn