Ansible fails to use renewed SSH certificates when ansible_ssh_executable is used

Hello,

Reposting from https://github.com/ansible/ansible/issues/82808.

We use time-bound SSH certificates to connect to target virtual machines. The expiry time is set to 65 minutes. We have written a custom ansible_ssh_executable that generates the SSH certificates and sets up the SSH config so that Ansible can use them to talk to the target VM.

This all works fine for short-running jobs. For long-running jobs (over 65 minutes), Ansible fails no matter how I handle the errors in ansible_ssh_executable. The wrapper does check for certificate expiry and renews the certificate before making an SSH call, but Ansible seems to ignore the renewed certificate and keep using the old one (my guess is that it's cached somewhere in memory?).

I know this is an edge case, hence I am seeking the Ansible community's guidance here.

Thanks,
Harsha

The problem seems to stem from the fact that your script is outputting information to stdout, confusing the attempts to determine the remote temp dir:

<1.1.1.1> SSH: EXEC ./az-ssh-wrapper.py -vvv -o ControlMaster=no -o ControlPersist=60s -o ConnectionAttempts=5 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o 'ControlPath="/Users/openai/.ansible/cp/7166efea50"' 1.1.1.1 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<1.1.1.1> (0, b’[\n {\n “cloudName”: “AzureCloud”,\n “homeTenantId”: “abcd-efgh2”,\n “id”: “abcd-efgh”,\n “isDefault”: true,\n “managedByTenants”: [\n {\n “tenantId”: “abcd-efgh”\n }\n ],\n “name”: “rnd-ne-gob”,\n “state”: “Enabled”,\n “tenantId”: “abcd-efgh2”,\n “user”: {\n “name”: “abcd-efgh”,\n “type”: “servicePrincipal”\n }\n },\n {\n “cloudName”: “AzureCloud”,\n “homeTenantId”: “abcd-efgh2”,\n “id”: “abcd-efgh2”,\n “isDefault”: false,\n “managedByTenants”: [\n {\n “tenantId”: “abcd-efgh”\n },\n {\n “tenantId”: “abcd-efgh2”\n }\n ],\n “name”: “Airline-Data-Platform”,\n “state”: “Enabled”,\n “tenantId”: “abcd-efgh2”,\n “user”: {\n “name”: “abcd-efgh”,\n “type”: “servicePrincipal”\n }\n },\n {\n “cloudName”: “AzureCloud”,\n “homeTenantId”: “abcd-efgh2”,\n “id”: “abcd-efgh2”,\n “isDefault”: false,\n “managedByTenants”: [\n {\n “tenantId”: “abcd-efgh2”\n }\n ],\n “name”: “des-tpesafeiac-we”,\n “state”: “Enabled”,\n “tenantId”: “abcd-efgh2”,\n “user”: {\n “name”: “abcd-efgh”,\n “type”: “servicePrincipal”\n }\n }\n]\n/Users/openai/infra-projects/awx-az-creds/az_ssh_config/all_ips/id_rsa already exists.\nOverwrite (y/n)? /home/abcd-efgh\n’, b’WARNING: The command requires the extension ssh. It will be installed first.\nGenerated SSH certificate /Users/openai/infra-projects/awx-az-creds/az_ssh_config/all_ips/id_rsa.pub-aadcert.pub is valid until 2024-03-12 04:59:19 PM in local time.\nWARNING: /Users/openai/infra-projects/awx-az-creds/az_ssh_config/all_ips contains sensitive information (id_rsa, id_rsa.pub, id_rsa.pub-aadcert.pub). Please delete it once you no longer need this config file.\n******************************************************************************************\n* \n \n*****************************************************************************************\n’)

/Users/openai/infra-projects/awx-az-creds/az_ssh_config/all_ips/id_rsa already exists.\nOverwrite (y/n)?

Ultimately, the string Overwrite (y/n)? /home/abcd-efgh is selected as the remote tmp dir, so when the mkdir runs, it fails:

mkdir -p “echo Overwrite (y/n)? /home/abcd-efgh/.ansible/tmp

/bin/sh: command substitution: line 0: syntax error near unexpected token `('
/bin/sh: command substitution: line 0: `echo Overwrite (y/n)? /home/abcd-efgh/.ansible/tmp '
mkdir: cannot create directory ‘’: No such file or directory
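
To illustrate why the prompt text ends up in the path, here is a simplified sketch. It assumes the home directory is taken from the last line of whatever the `echo ~` command returned on stdout; `pick_home_dir` is a hypothetical name for illustration, not Ansible's actual function.

```python
def pick_home_dir(stdout: str) -> str:
    """Return the last line of stdout, which is expected to be the expanded '~'."""
    lines = stdout.strip().splitlines()
    return lines[-1] if lines else ""


# Tail of the stdout from the log above: the prompt has no trailing newline,
# so it is glued onto the line that holds the real home directory.
polluted = "id_rsa already exists.\nOverwrite (y/n)? /home/abcd-efgh\n"

# What stdout should look like when the wrapper prints nothing of its own.
clean = "/home/abcd-efgh\n"

print(repr(pick_home_dir(polluted)))
print(repr(pick_home_dir(clean)))
```

With the polluted output, the selected value still contains the prompt text, and the parentheses in it are what the shell later rejects.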

FWIW, a large chunk of JSON is also being printed. You need to make sure that your wrapper writes nothing to stdout beyond what ssh itself returns.
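
One way to keep stdout clean is sketched below, under some assumptions: `RENEW_CMD` is a hypothetical placeholder for whatever command renews your certificate, and `run_quietly` and `main` are illustrative names, not taken from your script. The idea is to capture everything the helper prints and forward it to stderr, give the helper an empty stdin so an interactive prompt cannot wait for input or consume data meant for ssh, and only then hand over to the real ssh.

```python
import os
import subprocess
import sys

# Hypothetical placeholder for the certificate renewal step; not from the thread.
RENEW_CMD = ["./renew-cert.sh"]


def run_quietly(cmd):
    """Run a helper so that nothing it prints reaches this process's stdout."""
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # prompts see EOF instead of reading our stdin
        stdout=subprocess.PIPE,     # capture the helper's stdout ...
        stderr=subprocess.STDOUT,   # ... together with its stderr
        text=True,
        check=False,
    )
    # Keep the helper's output available for debugging, but on stderr only.
    sys.stderr.write(result.stdout)
    return result.returncode, result.stdout


def main(argv):
    rc, _ = run_quietly(RENEW_CMD)
    if rc != 0:
        return rc
    # Replace this process with the real ssh, passing Ansible's arguments through,
    # so that ssh's output is the only thing written to stdout.
    os.execvp("ssh", ["ssh", *argv])
```

In a wrapper script you would finish with something like `sys.exit(main(sys.argv[1:]))`. Whether the renewal step should overwrite an existing key non-interactively depends on the tool you use, so that part is left to the placeholder command.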

Hi Matt,

Thanks a lot for your response. You are indeed correct: the error was caused by extra output spilling into stdout. I made some changes and it now works as expected.

As for the JSON chunk being printed, I was able to resolve that as well.

Once again, many thanks…