Just to clarify: it’s the delegate_to: localhost keyword that tells Ansible to install the pip package on the controller instead of on the nodes.
Thanks a lot for your assistance
I think I have to keep 'import_modules' at its default value, which is True, in order to make a bit more progress.
I have also followed your suggestion and, for now, changed my playbook to install ansible-pylibssh like this:
- name: Install python package
  ansible.builtin.pip:
    name: ansible-pylibssh
  delegate_to: localhost
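Because the task is delegated, it is evaluated once for every host in the play even though it always acts on the controller. A minimal sketch of the same task limited to a single run is shown here; the `run_once` keyword is an addition that does not appear in the original post, and it assumes one install per job is enough:

```yaml
- name: Install python package on the controller
  ansible.builtin.pip:
    name: ansible-pylibssh
  delegate_to: localhost
  # Assumption: one install per job run is sufficient, so do not repeat
  # the task for every host in the play.
  run_once: true
```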
Now at least I can see the TCP/SSH packets going from my AWX to my jump host, but the playbook still does not run successfully. From the logs below, I can see that the correct SSH module (ansible-pylibssh) and the correct Ansible module (community.network.slxos) are used. The detailed logs are attached below:
ansible-playbook [core 2.15.5]
config file = /runner/project/ansible.cfg
configured module search path = ['/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
ansible collection location = /runner/requirements_collections:/runner/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible-playbook
python version = 3.9.17 (main, Aug 9 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
Using /runner/project/ansible.cfg as config file
SSH password:
setting up inventory plugins
Loading collection ansible.builtin from
host_list declined parsing /runner/inventory/hosts as it did not pass its verify_file() method
Parsed /runner/inventory/hosts inventory source with script plugin
Loading collection community.network from /runner/requirements_collections/ansible_collections/community/network
Loading callback plugin default of type stdout, v2.0 from /usr/local/lib/python3.9/site-packages/ansible/plugins/callback/default.py
Loading callback plugin awx_display of type stdout, v2.0 from /usr/local/lib/python3.9/site-packages/ansible_runner/display_callback/callback/awx_display.py
Skipping callback 'awx_display', as we already have a stdout callback.
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: show_version.yml *****************************************************
Positional arguments: playbooks/platform/show_version.yml
verbosity: 4
remote_user: svc_opstools
connection: smart
timeout: 10
ask_pass: True
become_method: sudo
tags: ('all',)
inventory: ('/runner/inventory/hosts',)
subset: test-host.test.net
extra_vars: ('@/runner/env/extravars',)
forks: 5
1 plays in playbooks/platform/show_version.yml
PLAY [OpsTools - Show version] *************************************************
TASK [Install python package] **************************************************
task path: /runner/project/playbooks/platform/show_version.yml:12
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: 1000
<localhost> EXEC /bin/sh -c 'echo ~1000 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /runner/.ansible/tmp `"&& mkdir "` echo /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098 `" && echo ansible-tmp-1697790179.6617095-22-179335887244098="` echo /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098 `" ) && sleep 0'
Using module file /usr/local/lib/python3.9/site-packages/ansible/modules/pip.py
<localhost> PUT /runner/.ansible/tmp/ansible-local-17syeyrc1a/tmp3omum6xs TO /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py
<localhost> EXEC /bin/sh -c 'chmod u+x /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/ /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python3 /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/AnsiballZ_pip.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /runner/.ansible/tmp/ansible-tmp-1697790179.6617095-22-179335887244098/ > /dev/null 2>&1 && sleep 0'
changed: [test-host.test.net -> localhost] => {
"changed": true,
"cmd": [
"/usr/bin/python3",
"-m",
"pip.__main__",
"install",
"ansible-pylibssh"
],
"invocation": {
"module_args": {
"chdir": null,
"editable": false,
"executable": null,
"extra_args": null,
"name": [
"ansible-pylibssh"
],
"requirements": null,
"state": "present",
"umask": null,
"version": null,
"virtualenv": null,
"virtualenv_command": "virtualenv",
"virtualenv_python": null,
"virtualenv_site_packages": false
}
},
"name": [
"ansible-pylibssh"
],
"requirements": null,
"state": "present",
"stderr": "",
"stderr_lines": [],
"stdout": "Defaulting to user installation because normal site-packages is not writeable\\nCollecting ansible-pylibssh\\n Downloading ansible_pylibssh-1.1.0-cp39-cp39-manylinux_2_24_x86_64.whl (2.3 MB)\\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.9 MB/s eta 0:00:00\\nInstalling collected packages: ansible-pylibssh\\nSuccessfully installed ansible-pylibssh-1.1.0\\n",
"stdout_lines": [
"Defaulting to user installation because normal site-packages is not writeable",
"Collecting ansible-pylibssh",
" Downloading ansible_pylibssh-1.1.0-cp39-cp39-manylinux_2_24_x86_64.whl (2.3 MB)",
" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 30.9 MB/s eta 0:00:00",
"Installing collected packages: ansible-pylibssh",
"Successfully installed ansible-pylibssh-1.1.0"
],
"version": null,
"virtualenv": null
}
TASK [Run show version on remote devices] **************************************
task path: /runner/project/playbooks/platform/show_version.yml:17
redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
Loading collection ansible.netcommon from /runner/requirements_collections/ansible_collections/ansible/netcommon
Loading collection ansible.utils from /runner/requirements_collections/ansible_collections/ansible/utils
redirecting (type: terminal) ansible.builtin.slxos to community.network.slxos
redirecting (type: cliconf) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> attempting to start connection
<test-host.test.net> using connection plugin ansible.netcommon.network_cli
Found ansible-connection at path /usr/local/bin/ansible-connection
<test-host.test.net> local domain socket does not exist, starting it
<test-host.test.net> control socket path is /runner/.ansible/pc/c680fdf726
<test-host.test.net> Loading collection ansible.builtin from
<test-host.test.net> redirecting (type: connection) ansible.builtin.network_cli to ansible.netcommon.network_cli
<test-host.test.net> Loading collection ansible.netcommon from /runner/requirements_collections/ansible_collections/ansible/netcommon
<test-host.test.net> Loading collection ansible.utils from /runner/requirements_collections/ansible_collections/ansible/utils
<test-host.test.net> redirecting (type: terminal) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> Loading collection community.network from /runner/requirements_collections/ansible_collections/community/network
<test-host.test.net> redirecting (type: cliconf) ansible.builtin.slxos to community.network.slxos
<test-host.test.net> local domain socket listeners started successfully
<test-host.test.net> loaded cliconf plugin ansible_collections.community.network.plugins.cliconf.slxos from path /runner/requirements_collections/ansible_collections/community/network/plugins/cliconf/slxos.py for network_os slxos
<test-host.test.net> ssh type is set to auto
<test-host.test.net> autodetecting ssh_type
<test-host.test.net> ssh type is now set to libssh
<test-host.test.net> Loading collection ansible.builtin from
<test-host.test.net> local domain socket path is /runner/.ansible/pc/c680fdf726
<test-host.test.net> Using network group action slxos for slxos_command
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: found slxos_command at /runner/requirements_collections/ansible_collections/community/network/plugins/modules/slxos_command.py
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: running slxos_command
<test-host.test.net> ANSIBLE_NETWORK_IMPORT_MODULES: complete
fatal: [test-host.test.net]: FAILED! => {
"changed": false,
"module_stderr": "ssh connection failed: ssh connect failed: Socket error: Connection reset by peer",
"module_stdout": "",
"msg": "MODULE FAILURE\\nSee stdout/stderr for the exact error"
}
...ignoring
I have checked the logs in detail but could not find anything useful for further troubleshooting.
I have also captured packets on my jump host, but they only show a successful TCP three-way handshake followed by the SSH key negotiation, and give me no hint.
FYI, all my other playbooks against Linux servers go through the same jump host, and they all run successfully.
Any hints or suggestions are welcome; I am really running out of ideas.
And FYI, when I tried this directly from the AWX container to my Extreme Networks switch, the SSH login succeeded:
ssh -vvv -F ./ssh.cfg -o StrictHostKeyChecking=no -o 'User="svc_opstools"' -o ConnectTimeout=10 -o 'ProxyCommand=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@jump.test.net' -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no test-host.test.net
Hello @mapleos1123, I’m just reading the whole thread again to see if we missed something along the way. At first sight, what I notice is that you’re not using FQCNs in your tasks:
Let’s rule this out before troubleshooting further. Could you try it, please? This will make sure you’re using the correct / latest collection version (it would not be the first time I have fixed an issue just by specifying the correct collection this way):
- name: OpsTools - Show version
  hosts: PE, P
  gather_facts: no
  connection: network_cli
  collections:
    - community.network
  tasks:
    - block:
        - name: Run show version on remote devices
          community.network.slxos_command:
            commands: show version
          when:
            - (inventory_hostname in groups['SLX'])
          changed_when: false
          ignore_errors: true
          no_log: false
          register: output_slx
        - name: Results [SLX]
          ansible.builtin.debug:
            msg: "{{ output_slx.stdout_lines[0] }}"
          when: output_slx.stdout_lines[0] is defined
        - name: show version [MLX]
          community.network.ironware_command:
            commands: show version
          when:
            - (inventory_hostname in groups['MLX'])
          changed_when: false
          ignore_errors: true
          no_log: true
          register: output_mlx
        - name: Results [MLX]
          ansible.builtin.debug:
            msg: "{{ output_mlx.stdout_lines[0] }}"
          when: output_mlx.stdout_lines[0] is defined
PS: Using FQCNs has been the way to go since Ansible 2.9, so I’d suggest installing ansible-lint so that you get notified of those ‘good practice’ tips while working in VS Code.
PS2: I was editing this post aaaaand deleted this by mistake → You could also check whether you’re running the latest version of the collection:
ansible-galaxy collection install community.network --force
Cheers!
I just noticed this on the “community.network” collection repo;
I believe I’ve seen that you’re using ansible-core == 2.15, right? That version is not supported / has not been tested for the collection. Can you try it on 2.13, please?
EDIT: Updated info
Thanks. Can you please share how I can change this “ansible-core == 2.15” setting?
hey @mapleos1123
In AWX 23.0.0 you can choose different EEs under Administration > Execution Environments, so you can easily switch among different ansible-core versions, Galaxy collections and even Python packages. If you don’t have one that provides ansible-core < 2.15, then you can follow the instructions @TheRealHaoLiu gave you several posts ago to customize yours:
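For readers without access to the linked instructions, a custom EE is typically described in an ansible-builder definition file. The following is only a sketch under stated assumptions, not the definition recommended in this thread: the base image name, the version pin and the package lists are illustrative choices.

```yaml
# execution-environment.yml: input file for ansible-builder (version 3 schema).
version: 3
images:
  base_image:
    # Assumption: any base image whose Python is supported by ansible-core 2.13.
    name: quay.io/centos/centos:stream9
dependencies:
  ansible_core:
    # Pin below 2.15, as suggested in the post above.
    package_pip: "ansible-core>=2.13,<2.14"
  ansible_runner:
    package_pip: ansible-runner
  galaxy:
    collections:
      - name: community.network
      - name: ansible.netcommon
  python:
    # Baked into the image, so the playbook no longer has to pip-install it.
    - ansible-pylibssh
  system:
    # Assumption: needed for ProxyCommand hops and password prompts.
    - openssh-clients
    - sshpass
```

The image would then be built with `ansible-builder build --tag <your-tag>` (the tag is a placeholder), pushed to a registry AWX can reach, and registered under Administration > Execution Environments.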
You may also find this thread useful:
Hey,
ssh -vvv -F ./ssh.cfg -o StrictHostKeyChecking=no -o 'User="svc_opstools"' -o ConnectTimeout=10 -o 'ProxyCommand=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@jump.test.net' -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no test-host.test.net
Are you using the same SSH options when running your playbook from AWX? I suspect they differ in some way, so it is worth comparing them.
Socket error: Connection reset by peer
This usually means the remote server dropped your connection. In this case, it might be the bastion or the remote node. Could you check the sshd logs on both machines?
I’m also wondering about timeout values; I commented on your config a few days ago. Have you looked into it?
ansible_persistent_command_timeout: 300 # This key doesn’t exist; use either the ANSIBLE_PERSISTENT_COMMAND_TIMEOUT environment variable or the command_timeout key (under the [persistent_connection] section of ansible.cfg); see the Ansible Configuration Settings page in the Ansible documentation
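If the timeout should live next to the inventory rather than in ansible.cfg, the persistent connection options can also be set as host or group variables. A minimal sketch, assuming a group_vars file for the SLX group used in the playbook (the file name is hypothetical and the values are only examples):

```yaml
# group_vars/SLX.yml (hypothetical file name)
# Variable form of the [persistent_connection] command_timeout setting:
# how long to wait for a command to return from the device, in seconds.
ansible_command_timeout: 300
# Variable form of connect_timeout: how long to wait while establishing
# the persistent connection, in seconds.
ansible_connect_timeout: 60
```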
Thanks a lot for your time. I am getting more and more confused now. It does seem that my playbook is using different SSH options than the SSH command I ran directly in the K8s containers.
From the SSH logs of my jump host, the cause of the SSH issue in this playbook seems to be that the playbook is trying to log in with an SSH account/password instead of the SSH public key file.
But I don’t know how to change that.
I have in my ansible.cfg
[ssh_connection]
# ssh arguments to use
# Leaving off ControlPersist will result in poor performance, so use
# paramiko on older platforms rather than removing it, -C controls compression use
#ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s
#ssh_args = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
ansible_connection = ssh
ansible_ssh_common_args = '-o ProxyCommand="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p -q noc@test.test.net" -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'
# ssh_args = -F ./ssh.cfg -o ControlMaster=auto -o ControlPersist=30m
ssh_args = -F ./ssh.cfg
I have in my ssh.cfg
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
Host *
    ProxyCommand ssh -W %h:%p noc@test.test.net
    User noc
    # point to the local authorized key
    IdentityFile ~/projects/.ssh/id_ed25519
Host test.test.net
    Hostname test.test.net
    User noc
    # point to the local authorized key
    IdentityFile ~/projects/.ssh/id_ed25519
    # ControlMaster auto
    # ControlPath ~/.ssh/%r@%h:%p
    # ControlPersist 5m
And in the credential associated with my playbook, I set ‘Credential type’ to ‘Machine’ and filled in the user name ‘svc_opstools’ and its password.
So I think Ansible should use SSH public key login for its SSH session towards my jump host (test.test.net in the above context), and then use SSH account/password login (with username svc_opstools) to log in to my SLX switch in this playbook.
Any suggestions on how I should change it?
Thanks
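One thing worth noting about the ansible.cfg above: `ansible_connection` and `ansible_ssh_common_args` are inventory variable names, not ansible.cfg keys, so placing them under [ssh_connection] most likely has no effect. A hedged sketch of the same intent expressed as group variables follows. The file name is hypothetical, the key path is copied from the ssh.cfg above and is assumed to exist inside the execution environment, and whether the ProxyCommand is honoured with the libssh transport in this setup is untested:

```yaml
# group_vars/SLX.yml (hypothetical file name)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.network.slxos

# Hop through the jump host with key-based login as noc. The key is passed
# explicitly with -i because ssh.cfg may not be read by the network transport.
# Assumption: this key path exists inside the execution environment container.
ansible_ssh_common_args: >-
  -o ProxyCommand="ssh -i ~/projects/.ssh/id_ed25519
  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
  -W %h:%p -q noc@test.test.net"
```

The switch login itself (svc_opstools and its password) would still come from the AWX Machine credential; only the jump host hop is described here.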
I finally made it work, but I am not sure exactly why it works, and especially why the same configuration worked on my AWX 9 before but not on my AWX 23.0.
All I changed was adding my SSH private key to the credential of the related playbook.
Previously this credential had ‘Credential type’ set to ‘Machine’ with the user name ‘svc_opstools’ and its password. Now I have just added my SSH private key in the ‘SSH Private Key’ section; see my attachment.
So it seems my SSH configuration file is not used at all?
Hey,
Glad to see you fixed your issue.
First off, I don’t know a thing about AWX credentials management as I don’t use it. But I remember from your first post that you were using paramiko instead of OpenSSH, and paramiko doesn’t use the OpenSSH client config.
Now I’m not sure whether you finally installed the missing packages to use OpenSSH instead, but I remember there was some misconfiguration in your Ansible config in general. I’m not motivated enough to go through all of this thread’s history right now, but it should contain all the info you need to understand what went wrong. If that doesn’t help, allow me a few days or weeks to get back to it, as I’m pretty busy these days.