Issue with Vault Signed SSH in 21.0.0

Has anyone using Vault Signed SSH Machine credentials seen an issue with 21.0.0? I have an instance running 20.1.0 that works fine, but I just deployed a new instance running 21.0.0. When I execute a job using the signed Machine credentials, I don't see the “Certificate added: …” line indicating the key was signed by Vault, and the connection is refused. I know Vault itself is signing the keys properly, because signing through the vault ssh CLI works fine.

Replying to add additional details from my troubleshooting…

On my v20.1.0 instance, using the same custom EE image, if I run one of my simple templates against a test box, the first few lines of the job output are:


Enter passphrase for /runner/artifacts/330395/ssh_key_data:
Identity added: /runner/artifacts/330395/ssh_key_data (jbouse@jbouse-MBP16.lan)
Certificate added: /runner/artifacts/330395/ssh_key_data-cert.pub (vault-approle-7b0ecb9e24638474813385e9f848a40ae9a40f055bde812d8b4530c5c6433cea)
/usr/local/lib/python3.8/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
No config file found; using defaults

This is what I’ve come to expect when using signed SSH machine credentials: it enters the passphrase to unlock the private key, adds the private key identity, and then adds the signed certificate returned from Vault. However, on my v21.0.0 instance, using the same custom EE image, running the same template against a test box produces only:


Enter passphrase for /runner/artifacts/195/ssh_key_data:
Identity added: /runner/artifacts/195/ssh_key_data (jbouse@jbouse-MBP16.lan)
/usr/local/lib/python3.8/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
No config file found; using defaults
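The missing line is the key clue. ssh-add tries to load a certificate from the private key's path with -cert.pub appended, and it prints “Certificate added: …” only when it can load that file; in the 20.1.0 run that file was /runner/artifacts/330395/ssh_key_data-cert.pub. A minimal diagnostic sketch, assuming you can inspect the job's private data directory while the job runs (cert_path_for_key and has_signed_cert are hypothetical helpers, not part of AWX or ansible-runner):

```shell
# Sketch: check whether ssh-add would find a certificate for a private key.
# ssh-add tries "<private key path>-cert.pub" and prints "Certificate added: ..."
# only when it can load that file. cert_path_for_key and has_signed_cert are
# hypothetical helpers, not part of AWX or ansible-runner.
cert_path_for_key() {
  # Path where ssh-add looks for the certificate belonging to this key
  printf '%s-cert.pub\n' "$1"
}

has_signed_cert() {
  # Exit 0 only if the certificate file exists and is non-empty; this does not
  # check that the certificate actually matches the key (ssh-add does that).
  local cert
  cert="$(cert_path_for_key "$1")"
  [ -s "$cert" ]
}
```

Running has_signed_cert against /runner/artifacts/195/ssh_key_data while the 21.0.0 job executes would show whether a certificate was ever written next to the key, which separates "Vault was never asked to sign" from "a certificate was written but rejected".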

It appears that AWX isn’t even attempting to have Vault sign the key: I don’t see the “Certificate added: …” line in the output, and even with the logging verbosity turned up to the maximum there is no indication of a failed signing attempt. I know that both the AppRole login and the policy work to sign the key, because I created a payload.json file containing the role_id and secret_id and then ran the following to validate:

VAULT_ADDR=https://vault.example.com \
VAULT_TOKEN=$(curl -s --request POST --data @payload.json https://vault.example.com/v1/auth/approle/login | jq -r .auth.client_token) \
vault ssh -mode=ca -role=ssh-client-signer -private-key-path=path/to/key -public-key-path=path/to/key.pub user@test.example.com

I’m prompted for the passphrase, and then I’m connected to the server. The signed key is the only way I can be connected, since there are no other keys configured in the ~/.ssh/authorized_keys file for that user on the server.
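To check the signing step on its own, separate from the SSH login, the certificate can be requested explicitly and written next to the private key, which is where ssh and ssh-add look for it (compare ssh_key_data-cert.pub in the 20.1.0 output). A sketch, assuming the SSH secrets engine is mounted at ssh (the default -mount-point for vault ssh), the role is ssh-client-signer as in the command above, and VAULT_ADDR and VAULT_TOKEN are exported; sign_key_with_vault is a hypothetical helper:

```shell
# Sketch: request a signed SSH certificate from Vault and store it at
# "<private key>-cert.pub", where ssh and ssh-add look for it.
# ASSUMPTIONS: secrets engine mounted at "ssh" (vault ssh's default
# -mount-point), role "ssh-client-signer", and VAULT_ADDR / VAULT_TOKEN
# already exported. sign_key_with_vault is a hypothetical helper.
sign_key_with_vault() {
  local private_key="$1"
  local mount="${2:-ssh}"
  local role="${3:-ssh-client-signer}"
  local cert="${private_key}-cert.pub"
  local tmpfile

  # Write to a temp file first so a failed request never leaves an empty cert
  tmpfile="$(mktemp "${cert}.XXXXXX")" || return 1

  # -field=signed_key prints only the certificate from the sign endpoint
  if ! vault write -field=signed_key "${mount}/sign/${role}" \
      public_key=@"${private_key}.pub" > "$tmpfile"; then
    rm -f "$tmpfile"
    return 1
  fi

  mv "$tmpfile" "$cert"
  printf '%s\n' "$cert"
}
```

After sign_key_with_vault path/to/key, running ssh-keygen -L -f path/to/key-cert.pub shows the certificate's Key ID, principals, and validity window, and ssh -i path/to/key user@test.example.com picks up the certificate automatically.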

I’ve compared the awx/main/credential_plugins/hashivault.py code between the two versions and can’t see any difference that would cause this behavior. I also just rebuilt the custom EE image with the latest version and am using it on both AWX instances, so there is no difference there. I’m running out of ideas on where to look for the breakage and am hoping someone else has experienced this and either has a fix or can help get one in quickly.

I believe there was a regression here related to some rework of the task manager: https://github.com/ansible/awx/pull/12122
The fix was included in today's release; please try it out and let us know if it resolves your issue.

-John

Thanks, John. I had already opened https://github.com/ansible/awx/issues/12311 and helped validate the fix in the 21.1.0 release. I’ve now updated all three of my AWX deployments to this release.