Issue deploying playbook through GCP IAP behind a bastion host using ssh config and ProxyCommand

TL;DR

We have an ssh configuration issue that prevents us from deploying using ansible-playbook through GCP Identity Aware Proxy to an internal host behind a bastion host. We have requested help from GCP, but we have reached the point where they can’t help us any further, and they suggested we contact you here.

Long form

In order to improve our security posture, we have enabled Identity Aware Proxy against our Compute Engine VMs (for SSH connections).
As a result, we are no longer able to deploy using ansible-playbook, as ansible ssh is unable to connect to the hosts to deploy to.

Infrastructure

Our infrastructure is as follows:

[project-a]  | [project-b]
-------------|-------------
bastion-host | backend-host

Where:

  • bastion-host is the only host that is SSH-accessible from public internet, but is protected with IAP;
  • backend-host is SSH-accessible internally via VPC-peering between project-a and project-b, but is also protected internally with IAP.

Prior to enabling IAP, we were able to connect to backend-host without any issue using ssh-add ~/.ssh/google_compute_engine to forward our ssh key through the tunnel; not anymore.

We want to deploy to backend-host using ansible-playbook.
Given the structure of the network, we need to ssh-hop through bastion-host first to reach backend-host.

Ansible setup

Our ansible is set up as follows:

ansible.cfg

[ssh_connection]
ssh_args = -F ssh.config -C -o ControlMaster=auto -o ControlPersist=360s -o ConnectTimeout=30

ssh.config

Host bastion.example.com
   HostName <PUBLIC IP>
   IdentityFile ~/.ssh/google_compute_engine
   ServerAliveInterval 60
   ProxyCommand gcloud compute ssh bastion --project "project-a" --zone "northamerica-northeast1-a" --tunnel-through-iap --impersonate-service-account="gce-bastion-prod@project-a.iam.gserviceaccount.com"
   StrictHostKeyChecking no
   UserKnownHostsFile=/dev/null
   RequestTTY force                   # Force TTY allocation for interactive sessions
   LogLevel DEBUG3

Host backend.example.com
   HostName backend.example.com
   ProxyJump bastion.example.com
   ProxyCommand gcloud compute ssh backend --project "project-b" --zone "northamerica-northeast1-a" --impersonate-service-account="gce-backend-prod@project-b.iam.gserviceaccount.com"
   StrictHostKeyChecking no
   ServerAliveInterval 60
   UserKnownHostsFile=/dev/null
   RequestTTY force                   # Force TTY allocation for interactive sessions
   LogLevel DEBUG3

Important note

This issue is about the ssh.config and how to configure ansible to connect to the backend-host; the above proxy commands, when executed from the command line, successfully log in to the backend-host.

Error & logs

When running

ansible-playbook Playbook.yml -l bastion -e environ=production

The ProxyCommand for host bastion.example.com is executed, and the connection is established, but then errors out:

...
WARNING: This command is using service account impersonation. All API calls will be executed as [gce-bastion-prod@project-a.iam.gserviceaccount.com].
debug1: kex_exchange_identification: banner line 0: Linux bastion 4.19.0-27-cloud-amd64 #1 SMP Debian 4.19.316-1 (2024-06-25) x86_64
debug1: kex_exchange_identification: banner line 1: 
debug1: kex_exchange_identification: banner line 2: The programs included with the Debian GNU/Linux system are free software;
debug1: kex_exchange_identification: banner line 3: the exact distribution terms for each program are described in the
debug1: kex_exchange_identification: banner line 4: individual files in /usr/share/doc/*/copyright.
debug1: kex_exchange_identification: banner line 5: 
debug1: kex_exchange_identification: banner line 6: Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
debug1: kex_exchange_identification: banner line 7: permitted by applicable law.
-bash: line 1: $'SSH-2.0-OpenSSH_9.6\\r': command not found
Connection timed out during banner exchange
Connection to UNKNOWN port 65535 timed out

I can’t seem to figure it out; I’ve checked the internet for proposed solutions to $'SSH-2.0-OpenSSH_9.6\\r': command not found; Connection timed out during banner exchange and they didn’t apply to my issue. There were also alternative, more complex answers that I have found online:

[1] google cloud platform - Ansible GCP IAP tunnel - Stack Overflow
[2] ssh - With Ansible, is it possible to connect connect to hosts that are behind Cloud IAP (Identity-Aware Proxy) in GCP? - Unix & Linux Stack Exchange

but to be honest, I’m afraid to introduce complexity that I don’t understand and as a result if each solution doesn’t work right away, I won’t know what’s wrong, and it’ll be like throwing darts in the dark :confused:

Can you help me out with this SSH configuration issue? :pray:
Thank you very much :bowing_man:

I remember seeing something similar to this, which turned out to be caused by banners that confused things downstream.
Try experimenting with '--quiet' in the 'gcloud compute ssh' stanzas.

Dick

Hi Dick!

Thanks for your answer!

If it is that simple, I will be really happy. Unfortunately at this time, our VM instance is on an older OpenSSH version prior to -q actually removing the banners (tried --quiet, -- -q, --quiet -- -q to no effect).

I will have to upgrade the OS before I can upgrade the version of OpenSSH. I’ll get on it and let you know when it’s done. Expect an answer by Tue Oct 22nd, 2024, 9PM EDT.

Have a great week-end!
Philippe

Hi Dick,

After upgrading all packages on the bastion host and upgrading to OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.14 4 Jun 2024, I have attempted the following configurations to no avail:

  • --quiet in gcloud compute ssh stanzas
  • -q in flags passed to ssh by gcloud compute ssh after --
  • -q in [ssh_connection] ssh_args config point in ansible.cfg
  • -o LogLevel=error in [ssh_connection] ssh_args config point in ansible.cfg
  • -o LogLevel=error in flags passed to ssh by gcloud compute ssh after --

In all cases the banner is still present.
Furthermore I modified /etc/ssh/ssh_config to add

Host *
  ...
  Banner none

and restarted ssh.service using sudo systemctl restart ssh.service, and the Banner is still there.

Is there anything else I could do to attempt to remove the banner?
Any other possible source of the error you can remember?


On another note, I’m considering implementing the solution proposed on this SO answer. I won’t be working on this in the next few days, but I’ll keep you updated.

Cheers,
Philippe

Another idea to attempt:

Remove all LogLevel stanzas for the Hosts, and instead add this at the very end of the file:

Host *
LogLevel QUIET

Dick

Hi there!

Sorry for the late answer, I was on vacation. I found a solution!
Turns out the issue is not about the banner or logs - it’s because the ssh connection is not forwarded back to the parent process. In order to fix this, the command must pass the flag -W %h:%p to the ssh connection.
Below is a complete solution to connect to an internal host through a bastion host when protected with Google Cloud Identity Aware Proxy:

ssh command


ssh -F "./ssh.config" -C -o ControlMaster=auto -o ControlPersist=20 -o PreferredAuthentications=publickey -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no -o ConnectTimeout=20 orca-postgrest

ssh.config file

Host *
  ControlPath ~/.ssh/sockets/%r@%h-%p

Host bastion.example.com
  IdentityFile ~/.ssh/google_compute_engine
  ProxyCommand gcloud compute ssh bastion --project="project-a" --zone=northamerica-northeast1-a --tunnel-through-iap -- -W %h:%p
  UserKnownHostsFile ~/.ssh/google_compute_known_hosts
  RequestTTY force

Host backend.example.com
  HostName <INTERNAL IP ADDRESS>
  IdentityFile ~/.ssh/google_compute_engine
  ProxyJump bastion.example.com
  RemoteCommand gcloud compute ssh backend --project="project-b" --zone=northamerica-northeast1-a
  RequestTTY force
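
Before attempting a real connection, it can help to check how ssh resolves this file for a given host alias. The sketch below relies on ssh's -G option, which prints the effective configuration and exits without connecting. The function name show_effective_ssh_config and the choice of keys to filter are assumptions made for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the options ssh would actually use for a host alias.
# $1: path to the ssh config file, $2: host alias as written in that file.
show_effective_ssh_config() {
  # -G makes ssh print the resolved configuration and exit without connecting.
  ssh -F "$1" -G "$2" \
    | grep -Ei '^(hostname|proxyjump|proxycommand|remotecommand|requesttty|controlpath) '
}

# Example call (assumes ./ssh.config exists in the current directory):
#   show_effective_ssh_config ./ssh.config backend.example.com
```

If the output for backend.example.com shows a proxycommand where a proxyjump was expected, or the reverse, the Host block is probably not being applied the way it was intended.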

Explanations

  • The flags -o ControlMaster=auto -o ControlPersist=20 and -o ControlPath="/tmp/ansible-ssh-orca-postgrest-22-iap" passed to SSH, together with the Host * ControlPath ~/.ssh/sockets/%r@%h-%p setting in the ssh.config file, speed up the connection by reusing a control socket, as explained in the previously mentioned StackExchange post.

  • The flags -o PreferredAuthentications=publickey -o KbdInteractiveAuthentication=no -o PasswordAuthentication=no passed to SSH force public-key authentication, which is the primary means of connection used by the gcloud compute ssh command, instead of user:pass authentication.

  • Connecting to the bastion is relatively simple: with the ProxyCommand above, ssh establishes the connection through the gcloud compute ssh command, which enables tunnelling through IAP. The -W %h:%p flags after -- are passed straight through to the underlying ssh, which then forwards its standard input and output to the target host and port, so the traffic goes back to the parent process. Removing them prevents ssh from recovering the socket and results in -bash: line 1: $'SSH-2.0-OpenSSH_9.6\r': command not found, after which the connection hangs. For a reason I don’t fully understand, HostName <EXTERNAL IP> is not necessary for connecting to the bastion from the local computer. It seems to be necessary only on an intermediate node when proxy jumping.

  • Connecting to the backend host is a bit more complex. First, ssh must connect to the bastion host as a proxy jump for the backend host. Once on the server, ssh needs to open the socket to backend before running the remote command on bastion. This was one of the main roadblocks: it was not possible to run the RemoteCommand before opening the connection to backend, and it was not possible to specify HostName backend directly, because the name backend is only recognized through the gcloud compute ssh command (the host cannot be discovered on the network by its Google Compute Engine VM instance name unless an alias is created in /etc/hosts on the machine or in the network's internal DNS). By specifying HostName <INTERNAL IP ADDRESS>, ssh can obtain the socket, after which the RemoteCommand is executed. The RemoteCommand connects to the backend host from the bastion host, and RequestTTY force ensures that a TTY is returned to the caller. Without RequestTTY force, the connection is established, but it is impossible to send commands to the remote backend host.
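
The "command not found" error from the third point can be reproduced locally without any GCP access. This is only a sketch of the mechanism, assuming the missing -W leaves a login shell on the other end of the ProxyCommand: the ssh client starts by sending its identification line, and a shell reading that line on stdin tries to run it as a command. The function name banner_into_shell is made up for this example.

```shell
#!/usr/bin/env bash
# Pipe an SSH client identification line into a shell. This is roughly what
# happens when the ProxyCommand ends in a login shell instead of a byte relay.
banner_into_shell() {
  # LC_ALL=C keeps the error message in English; "|| true" ignores the shell's
  # non-zero exit status so the demo itself does not abort under "set -e".
  printf 'SSH-2.0-OpenSSH_9.6\r\n' | LC_ALL=C bash 2>&1 || true
}

banner_into_shell
# bash should report that $'SSH-2.0-OpenSSH_9.6\r' is a "command not found".
```

With -W %h:%p in place, the same bytes reach the sshd on the target instead of a shell, so the normal SSH handshake can proceed.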


There you go!

I still have to try this with ansible itself, but from what I gathered, pointing the ssh_connection.ssh_executable config option at a bash script that adds the flags used here should give the same result. I’ll keep you in the loop.
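
A minimal sketch of what such a wrapper script could look like follows. It has not been tested against ansible here. The file name ssh-iap.sh, the SSH_BIN override and the function name iap_ssh are assumptions made for illustration. The wrapper only prepends the public-key options from the ssh command above and leaves -F and the control options to ssh_args in ansible.cfg.

```shell
#!/usr/bin/env bash
# ssh-iap.sh (hypothetical name): wrapper intended for Ansible's ssh_executable setting.

# Binary to run; overridable (an assumption of this sketch) so the wrapper can
# be exercised with a stub instead of the real ssh.
SSH_BIN="${SSH_BIN:-ssh}"

iap_ssh() {
  # Prepend the public-key-only options from the post. ssh keeps the first
  # value it obtains for each option, so these win over later duplicates.
  "$SSH_BIN" \
    -o PreferredAuthentications=publickey \
    -o KbdInteractiveAuthentication=no \
    -o PasswordAuthentication=no \
    "$@"
}

# Ansible calls the executable with arguments; with none, do nothing.
if [ "$#" -gt 0 ]; then
  iap_ssh "$@"
fi
```

To try it, the script would be made executable and referenced as ssh_executable = ./ssh-iap.sh under [ssh_connection] in ansible.cfg, next to the existing ssh_args line.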

Cheers!
Philippe
