Can't run SSH against a network device (Extreme SLX switch)

Just to clarify: it’s the delegate_to: localhost keyword that tells Ansible to install the pip package on the controller instead of on the nodes.


@jbericat @ptn

Thanks a lot for your assistance.

I think I have to set ‘import_modules’ to its default value, which is True, in order to make a bit more progress.

I have also followed your suggestion and, for now, changed my playbook to install ansible-pylibssh:

- name: Install python package
  ansible.builtin.pip:
    name: ansible-pylibssh
  delegate_to: localhost

Now at least I can see the TCP/SSH packets going from my AWX to my jump host, but the playbook still does not run successfully. From the logs below, I can see that the correct SSH library (ansible-pylibssh) is used, and also the correct Ansible module (community.network.slxos). The detailed logs are below:

ansible-playbook [core 2.15.5]
  config file = /runner/project/ansible.cfg
  configured module search path = ['/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/requirements_collections:/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible-playbook
  python version = 3.9.17 (main, Aug  9 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True
Using /runner/project/ansible.cfg as config file
SSH password: 
setting up inventory plugins
Loading collection ansible.builtin from 
host_list declined parsing /runner/inventory/hosts as it did not pass its verify_file() method
Parsed /runner/inventory/hosts inventory source with script plugin
Loading collection community.network from /runner/requirements_collections/ansible_collections/community/network
Loading callback plugin default of type stdout, v2.0 from /usr/local/lib/python3.9/site-packages/ansible/plugins/callback/default.py
Loading callback plugin awx_display of type stdout, v2.0 from /usr/local/lib/python3.9/site-packages/ansible_runner/display_callback/callback/awx_display.py
Skipping callback 'awx_display', as we already have a stdout callback.
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.

PLAYBOOK: show_version.yml *****************************************************
Positional arguments: playbooks/platform/show_version.yml
verbosity: 4
remote_user: svc_opstools
connection: smart
timeout: 10
ask_pass: True
become_method: sudo
tags: ('all',)
inventory: ('/runner/inventory/hosts',)
subset: test-host.test.net
extra_vars: ('@/runner/env/extravars',)
forks: 5
1 plays in playbooks/platform/show_version.yml

PLAY [OpsTools - Show version] *************************************************

TASK [Install python package] **************************************************
task path: /runner/project/playbooks/platform/show_version.yml:12
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: 1000
<localhost> EXEC /bin/sh -c 'echo ~1000 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /runner/.ansible/tmp `"&& mkdir "` echo /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098 `" && echo ansible-tmp-1697790179.6617095-22-179335887244098="` echo /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098 `" ) && sleep 0'
Using module file /usr/local/lib/python3.9/site-packages/ansible/modules/pip.py
<localhost> PUT /runner/.ansible/tmp/ansible-local-17syeyrc1a/tmp3omum6xs TO /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py
<localhost> EXEC /bin/sh -c 'chmod u+x /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/ /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python3 /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/ > /dev/null 2>&1 && sleep 0'
changed: [test-host.test.net -> localhost] => {
    "changed": true,
    "cmd": [
        "/usr/bin/python3",
        "-m",
        "pip.__main__",
        "install",
        "ansible-pylibssh"
    ],
    "invocation": {
        "module_args": {
            "chdir": null,
            "editable": false,
            "executable": null,
            "extra_args": null,
            "name": [
                "ansible-pylibssh"
            ],
            "requirements": null,
            "state": "present",
            "umask": null,
            "version": null,
            "virtualenv": null,
            "virtualenv_command": "virtualenv",
            "virtualenv_python": null,
            "virtualenv_site_packages": false
        }
    },
    "name": [
        "ansible-pylibssh"
    ],
    "requirements": null,
    "state": "present",
    "stderr": "",
    "stderr_lines": [],
    "stdout": "Defaulting to user installation because normal site-packages is not writeable\\nCollecting ansible-pylibssh\\n  Downloading ansible_pylibssh-1.1.0-cp39-cp39-manylinux_2_24_x86_64.whl (2.3 MB)\\n     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.9 MB/s eta 0:00:00\\nInstalling collected packages: ansible-pylibssh\\nSuccessfully installed ansible-pylibssh-1.1.0\\n",
    "stdout_lines": [
        "Defaulting to user installation because normal site-packages is not writeable",
        "Collecting ansible-pylibssh",
        "  Downloading ansible_pylibssh-1.1.0-cp39-cp39-manylinux_2_24_x86_64.whl (2.3 MB)",
        "     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.9 MB/s eta 0:00:00",
        "Installing collected packages: ansible-pylibssh",
        "Successfully installed ansible-pylibssh-1.1.0"
    ],
    "version": null,
    "virtualenv": null
}

TASK [Run show version on remote devices] **************************************
task path: /runner/project/playbooks/platform/show_version.yml:17
redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
Loading collection ansible.netcommon from /runner/requirements_collections/ansible_collections/ansible/netcommon
Loading collection ansible.utils from /runner/requirements_collections/ansible_collections/ansible/utils
redirecting (type: terminal) ansible.builtin.slxos to community.network.slxos
redirecting (type: cliconf) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> attempting to start connection
<test-host.test.net> using connection plugin ansible.netcommon.network_cli
Found ansible-connection at path /usr/local/bin/ansible-connection
<test-host.test.net> local domain socket does not exist, starting it
<test-host.test.net> control socket path is /runner/.ansible/pc/c680fdf726
<test-host.test.net> Loading collection ansible.builtin from 
<test-host.test.net> redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
<test-host.test.net> Loading collection ansible.netcommon from /runner/requirements_collections/ansible_collections/ansible/netcommon
<test-host.test.net> Loading collection ansible.utils from /runner/requirements_collections/ansible_collections/ansible/utils
<test-host.test.net> redirecting (type: terminal) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> Loading collection community.network from /runner/requirements_collections/ansible_collections/community/network
<test-host.test.net> redirecting (type: cliconf) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> local domain socket listeners started successfully
<test-host.test.net> loaded cliconf plugin ansible_collections.community.network.plugins.cliconf.slxos from path /runner/requirements_collections/ansible_collections/community/network/plugins/cliconf/slxos.py for network_os slxos
<test-host.test.net> ssh type is set to auto
<test-host.test.net> autodetecting ssh_type
<test-host.test.net> ssh type is now set to libssh
<test-host.test.net> Loading collection ansible.builtin from 
<test-host.test.net> local domain socket path is /runner/.ansible/pc/c680fdf726
<test-host.test.net> Using network group action slxos for slxos_command
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: found slxos_command  at /runner/requirements_collections/ansible_collections/community/network/plugins/modules/slxos_command.py
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: running slxos_command
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: complete
fatal: [test-host.test.net]: FAILED! => {
    "changed": false,
    "module_stderr": "ssh connection failed: ssh connect failed: Socket error: Connection reset by peer",
    "module_stdout": "",
    "msg": "MODULE FAILURE\\nSee stdout/stderr for the exact error"
}
...ignoring

I have checked the logs in detail but could not find anything that helps with further troubleshooting.

I have also captured the packets on my jump host, but they only show a successful TCP three-way handshake followed by the SSH key negotiation, and give me no hint.

FYI, all my other playbooks against Linux servers go through the same jump host, and they all run successfully.

Any hints or suggestions are welcome; I am really running out of ideas.


@jbericat @ptn

And FYI, when I tried this directly from the AWX container to my Extreme network switch, SSH succeeded:

 ssh -vvv -F ./ssh.cfg -o StrictHostKeyChecking=no -o 'User="svc_opstools"' -o ConnectTimeout=10 -o 'ProxyCommand=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@jump.test.net' -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no test-host.test.net

Hello @mapleos1123, I’m just reading the whole thread again to see if we’re missing something along the way. At first sight, what I notice is that you’re not using FQCNs in your tasks.

Let’s rule this out before troubleshooting further. Could you try it please? This will ensure you’re using the correct / latest collection version (it wouldn’t be the first time I’ve fixed an issue just by specifying the correct collection this way):

- name: OpsTools - Show version
  hosts: PE, P
  gather_facts: no
  connection: network_cli
  collections:
    - community.network
  tasks:
    - block:
        - name: Run show version on remote devices
          community.network.slxos_command:
            commands: show version
          when:
            - (inventory_hostname in groups['SLX'])
          changed_when: false
          ignore_errors: true
          no_log: false
          register: output_slx
    
        - name: Results [SLX]
          ansible.builtin.debug:
            msg: "{{ output_slx.stdout_lines[0] }}"
          when: output_slx.stdout_lines[0] is defined
    
        - name: show version [MLX]
          community.network.ironware_command:
            commands: show version
          when:
            - (inventory_hostname in groups['MLX'])
          changed_when: false
          ignore_errors: true
          no_log: true
          register: output_mlx
    
        - name: Results [MLX]
          ansible.builtin.debug:
            msg: "{{ output_mlx.stdout_lines[0] }}"
          when: output_mlx.stdout_lines[0] is defined

PS: Using FQCNs has been the way to go since Ansible 2.9, so I’d suggest installing ansible-lint so that you are notified of these ‘good practice’ tips in VS Code while you write your playbooks.

PS2: I was editing this post aaaaand deleted this by mistake → You could also check if you’re running the latest version of the collection:

ansible-galaxy collection install community.network --force
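When the project is synced by AWX, the same thing can be declared in the project itself. The following is only a minimal sketch, assuming a collections/requirements.yml file in the project root (the log above loads collections from /runner/requirements_collections, which suggests such a file is already in place). The list of collections is an assumption based on what the log shows being loaded.

```yaml
# collections/requirements.yml (assumed location in the project root)
collections:
  # Without a "version" key the newest available release is installed;
  # add a "version" constraint to an entry if you need to pin it.
  - name: community.network
  - name: ansible.netcommon
  - name: ansible.utils
```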

Cheers!

I just noticed this on the “community.network” collection repo;

I believe I’ve seen that you’re using ansible-core == 2.15, right? That version is not supported / has not been tested with the collection. Can you try it on 2.13, please?

EDIT: Updated info


@jbericat

Thanks, can you please share with me how I can change this “ansible-core == 2.15” setting?

hey @mapleos1123

In AWX 23.0.0 you can choose different EEs under Administration > Execution Environments, so you can easily switch between different ansible-core versions, Galaxy collections and even Python packages. If you don’t have one that provides ansible-core < 2.15, then you can follow the instructions @TheRealHaoLiu gave you several posts ago to customize yours:
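For illustration only, a custom EE definition for ansible-builder could look roughly like the sketch below. It assumes the version 3 definition schema; the base image, the exact ansible-core range and the collection list are my assumptions, not something taken from this thread, so adjust them to your environment.

```yaml
# execution-environment.yml: sketch for ansible-builder (version 3 schema)
version: 3

images:
  base_image:
    # Assumed base image; any image whose Python is supported by
    # ansible-core 2.13 should work
    name: quay.io/centos/centos:stream9

dependencies:
  ansible_core:
    # Stay on the 2.13 series suggested above
    package_pip: "ansible-core>=2.13,<2.14"
  ansible_runner:
    package_pip: ansible-runner
  galaxy:
    collections:
      - name: community.network
      - name: ansible.netcommon
  python:
    # Baked into the image, so the pip task in the playbook is no longer needed
    - ansible-pylibssh
```

Once built with ansible-builder build and pushed to a registry AWX can reach, the image can be added under Administration > Execution Environments and selected on the job template.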

You may also find this thread useful:


Hey,

ssh -vvv -F ./ssh.cfg -o StrictHostKeyChecking=no -o 'User="svc_opstools"' -o ConnectTimeout=10 -o 'ProxyCommand=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@jump.test.net' -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no test-host.test.net

Are you using the same ssh options when running your playbook from AWX? I suspect they differ in some way, so it’s worth comparing them.

Socket error: Connection reset by peer

This usually means the remote server dropped your connection. In this case, it might be the bastion or the remote node. Could you check the sshd logs on both machines?

I’m also wondering about timeout values; I commented on your config a few days ago. Have you looked into it?

ansible_persistent_command_timeout: 300 # This key doesn’t exist; use either the envvar ANSIBLE_PERSISTENT_COMMAND_TIMEOUT or the command_timeout key (under the [persistent_connection] section of ansible.cfg); see: Ansible Configuration Settings — Ansible Documentation
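As far as I know the same timeouts can also be set per group as inventory variables, which may be easier to manage from AWX than ansible.cfg. A minimal sketch, assuming a group_vars file for the SLX group used in the playbook (the file name and the connect timeout value are only examples):

```yaml
# group_vars/SLX.yml (hypothetical file name)
# Seconds to wait for a command to return on the persistent connection
ansible_command_timeout: 300
# Seconds to wait while establishing the persistent connection (example value)
ansible_connect_timeout: 30
```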


@ptn

Thanks a lot for your time. I am getting more and more confused now; it does seem that my playbook is using different SSH options than the SSH command I ran directly in the K8S container.

From the SSH logs of my jump host, the cause of the SSH issue seems to be that the playbook is trying to log in with an SSH account/password instead of the SSH public key file.

But I don’t know how I can change it.

I have in my ansible.cfg

[ssh_connection]

# ssh arguments to use
# Leaving off ControlPersist will result in poor performance, so use
# paramiko on older platforms rather than removing it, -C controls compression use
#ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s
#ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
ansible_connection = ssh
ansible_ssh_common_args = '-o ProxyCommand="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@test.test.net" -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'
# ssh_args = -F ./ssh.cfg -o ControlMaster=auto -o ControlPersist=30m

ssh_args = -F ./ssh.cfg

I have in my ssh.cfg

StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

Host *
  ProxyCommand ssh -W %h:%p noc@test.test.net
  User noc
  # point to the local authorized key
  IdentityFile ~/projects/.ssh/id_ed25519

Host test.test.net
  Hostname test.test.net
  User noc
  # point to the local authorized key
  IdentityFile ~/projects/.ssh/id_ed25519
#   ControlMaster auto
#   ControlPath ~/.ssh/%r@%h:%p
#   ControlPersist 5m

And in my playbook associated credential, I am using ‘Credential type’ as ‘Machine’ and I filled in the user name as ‘svc_opstools’ and its password.

So I think Ansible should use the SSH public key login for its SSH session towards my jump host (test.test.net in the above context), then use the SSH account/password login (with username svc_opstools) to further log into my SLX switch in this playbook

Any suggestions on how I should change it?

Thanks

@ptn

I finally made it work, but I am not sure exactly why it works, and especially why the same configuration worked on my AWX 9 before but not on my AWX 23.0.

What I changed is just adding my SSH private key into the Credential of the related playbook

Previously this credential was configured with ‘Credential type’ set to ‘Machine’, and I filled in the user name ‘svc_opstools’ and its password. Now I have just added my SSH private key in the ‘SSH Private Key’ section, see my attachment.

So it seems my .ssh configuration file is not used at all?


Hey,

Glad to see you fixed your issue 🙂.

First off, I don’t know a thing about AWX credentials management as I don’t use it. But I remember from your first post you were using paramiko instead of openssh, and paramiko doesn’t use openssh client config.
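Since network_cli goes through paramiko or libssh rather than the ssh binary, jump-host settings for network devices are usually given as inventory variables instead of an ssh config file. Note also that ansible_connection and ansible_ssh_common_args are inventory variable names, not ansible.cfg keys. A minimal sketch, assuming a group_vars file for the SLX group (the file name is hypothetical; the jump host and key path are copied from the ssh.cfg posted above and may differ inside the AWX container):

```yaml
# group_vars/SLX.yml (hypothetical file name)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.network.slxos
# Reach the switch through the jump host; the inner ssh authenticates
# to the jump host with the key, not with the switch password
ansible_ssh_common_args: >-
  -o ProxyCommand="ssh -i ~/projects/.ssh/id_ed25519
  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
  -W %h:%p -q noc@test.test.net"
```

The switch login itself (svc_opstools and its password) would still come from the Machine credential.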

Now I’m not sure if you finally installed the missing packages to use openssh instead, but I remember there were some misconfigurations in your Ansible config in general. I’m not motivated enough to go through this whole thread’s history right now, but it should contain all the info you need to understand what went wrong. If that doesn’t help, allow me a few days or weeks to get back to it as I’m pretty busy these days 😕.