Troubleshooting an Ansible Windows problem

Hi,
(sorry, this turned out rather long)

I had Windows updates running via an Ansible playbook.

I was using a domain account, become, runas, etc., and it had all been working pretty well for the last 9 months…

Then I upgraded Ansible on my CentOS 7.5 system to version 2.8, because I was going to use the same system to write a playbook that adds a user account to AWS instances…

Someone who uses this same system, with a playbook I wrote, to run Windows updates on their Windows VMs then told me his playbook had stopped working.

I looked at it, and it seemed there might be Kerberos or domain membership problems. I wasn't getting anywhere, so…

I built a new Ansible control server, joined it to the domain, set up pywinrm[kerberos], and more or less have a pristine system now with Ansible 2.8. But I have the same problem, so it seems the upgrade to 2.8 may be what caused this.

Basically, I run the playbook, it connects, and it gets to:

```
TASK [Install Updates] ****************************************************************************************************************
task path: /home/bi003do/Playbooks/WinUpdate/win-update-prod.yml:12
win_updates: running win_updates module
Using module file /usr/lib/python2.7/site-packages/ansible/modules/windows/win_updates.ps1
Pipelining is enabled.
ESTABLISH WINRM CONNECTION FOR USER: mydomainuser@MYDOMAIN.LOCAL on PORT 5986 TO covmgrid83
EXEC (via pipeline wrapper)
```

And that's as far as it gets… On the VM itself I can see processes running under the mydomainuser ID in Task Manager, so I know it is connecting correctly. I have googled this and found some info, but nothing that has helped yet… Unfortunately, the user who runs my playbook to update templates has a limited window each month to get this done before they start a long process that lasts several weeks, during which we can't run updates; and if we don't run them, the security team gets hot and bothered.
The window is quickly closing for this month and I am at a loss. How can I troubleshoot this further?

Also, I was able to run win_ping against this server, but only using a local account. When I tried it with a domain account, it actually crashed: it gets past the line where it was sticking above, and I get this:

```
EXEC (via pipeline wrapper)
The full traceback is:
Exception of type 'System.OutOfMemoryException' was thrown.
At line:13 char:1

+ $module = [Ansible.Basic.AnsibleModule]::Create($args, $spec)
+ CategoryInfo : OperationStopped: (:) [], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException

ScriptStackTrace:
at , : line 13

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Runtime.CompilerServices.RuntimeHelpers._CompileMethod(IRuntimeMethodInfo method)
at System.Reflection.Emit.DynamicMethod.CreateDelegate(Type delegateType, Object target)
at System.Linq.Expressions.Compiler.LambdaCompiler.Compile(LambdaExpression lambda, DebugInfoGenerator debugInfoGenerator)
at System.Linq.Expressions.Expression`1.Compile()
at System.Runtime.CompilerServices.CallSiteBinder.BindCore[T](CallSite`1 site, Object[] args)
at System.Dynamic.UpdateDelegates.UpdateAndExecute3[T0,T1,T2,TRet](CallSite site, T0 arg0, T1 arg1, T2 arg2)
at System.Management.Automation.Interpreter.DynamicInstruction`4.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
covmgrid83 | FAILED! => {
"changed": false,
"msg": "Unhandled exception while executing module: Exception of type 'System.OutOfMemoryException' was thrown."
}
```

Actually, this is now saying out of memory, which is different from before… Is that really a memory error?

Well, any advice on what I should be checking next would be appreciated.

Thanks
Bill

I increased the VM's memory and am now getting the original error on win_ping using the domain account (sorry about the first post, I just found the code button):

```
ansible -i winhosts win-update-prod -m win_ping -vvv
ansible 2.8.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/bi003do/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Apr 9 2019, 14:30:50) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
Using /etc/ansible/ansible.cfg as config file
host_list declined parsing /home/domainuser/Playbooks/WinUpdate/winhosts as it did not pass it's verify_file() method
script declined parsing /home/domainuser/Playbooks/WinUpdate/winhosts as it did not pass it's verify_file() method
auto declined parsing /home/domainuser/Playbooks/WinUpdate/winhosts as it did not pass it's verify_file() method
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by
default, this will change, but still be user configurable on deprecation. This feature will be removed in version 2.10.
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

Parsed /home/domainuser/Playbooks/WinUpdate/winhosts inventory source with ini plugin
META: ran handlers
Using module file /usr/lib/python2.7/site-packages/ansible/modules/windows/win_ping.ps1
Pipelining is enabled.
ESTABLISH WINRM CONNECTION FOR USER: domainuser@MYDOMAIN.LOCAL on PORT 5986 TO covmgrid83
EXEC (via pipeline wrapper)
The full traceback is:
Property 'ErrorRecord' cannot be found on this object. Make sure that it exists.
At line:61 char:5

+ Write-AnsibleError -Message "Unhandled exception while executing module" `
+ CategoryInfo : NotSpecified: (:) [], PropertyNotFoundException
+ FullyQualifiedErrorId : PropertyNotFoundStrict

ScriptStackTrace:
at , : line 61
at , : line 26
at , : line 137
at , : line 7

System.Management.Automation.PropertyNotFoundException: Property 'ErrorRecord' cannot be found on this object. Make sure that it exists.
at System.Management.Automation.ExceptionHandlingOps.CheckActionPreference(FunctionContext funcContext, Exception exception)
at System.Management.Automation.Interpreter.ActionCallInstruction`2.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
covmgrid83 | FAILED! => {
"changed": false,
"msg": "Failed to invoke PowerShell module: Property 'ErrorRecord' cannot be found on this object. Make sure that it exists."
}
```

I don't know what this is, but I have some suggestions for things you can try.

Check that the environment variables on the target host are OK. I'm wondering if maybe it can't find a PowerShell module.

If this is a Server 2008 R2 box, make sure it's had the WinRM memory hotfix applied (see https://docs.ansible.com/ansible/latest/user_guide/windows_setup.html#winrm-memory-hotfix), or better still, upgrade it to a later PowerShell/WMF version, as the unpatched bug is known to cause out-of-memory errors when you use WinRM.

Double-check the WinRM configuration and firewall on the remote host (although if those were wrong, I doubt you'd be able to connect at all).

Make sure the destination machines can actually access the network so that they can receive Windows updates. Though if that were the problem, I think you'd probably see nasty COM errors from the guts of Windows.

It's worth checking the event log after you have run the playbook.

Check that your domain user still has the privileges it needs. IIRC the user needs quite a few privileges in order to run Windows updates.

Check your domain user is still a local administrator on the box.

Also, on another tack, you can remove Ansible from the equation and try the example Python pywinrm script here: https://github.com/diyan/pywinrm/#run-a-process-on-a-remote-host to see if you can run commands.
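A minimal Python 3 sketch of that test, adapted to the Kerberos-over-HTTPS setup described at the top of the thread. Assumptions not confirmed in the thread: pywinrm is installed with Kerberos support, a Kerberos ticket is already cached (for example via kinit), the host has an HTTPS listener on port 5986, and the helper names `winrm_endpoint` and `run_remote` are hypothetical. Certificate validation is turned off only to keep the quick test simple.

```python
# Sketch: run one command over WinRM with pywinrm, bypassing Ansible entirely.
# Assumptions: pywinrm[kerberos] installed, Kerberos ticket cached (kinit),
# HTTPS listener on 5986. Helper names are hypothetical.

def winrm_endpoint(host, port=5986):
    """Build the WS-Management URL for an HTTPS WinRM listener."""
    return "https://%s:%d/wsman" % (host, port)


def run_remote(host, command, args=(), user="mydomainuser@MYDOMAIN.LOCAL"):
    """Run a command on the host; return (exit status, stdout, stderr)."""
    # Imported here so winrm_endpoint() works even without pywinrm installed.
    import winrm

    session = winrm.Session(
        winrm_endpoint(host),
        # Assumption: with Kerberos, the cached ticket is used, so no password.
        auth=(user, None),
        transport="kerberos",
        # Quick test only; do not ignore certificate errors in production.
        server_cert_validation="ignore",
    )
    result = session.run_cmd(command, list(args))
    return result.status_code, result.std_out, result.std_err
```

For example, `run_remote("covmgrid83", "ipconfig", ["/all"])` should come back with status 0 and the ipconfig output if WinRM and Kerberos are healthy. If this also hangs or fails, the problem is probably below Ansible rather than in the playbook.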

Sorry I don't have the solution, but I hope something from the above list helps you fix this.

Jon

Hi Jon, thanks for the input… This was all working. I believe all I did was upgrade from Ansible 2.6 to 2.8, running against the same Windows hosts with the same user. It had been working well for 8 months or so, and they ran the updates on about 60 VMs every month. Then I was working on another project, configuring users on AWS instances, and decided to upgrade to 2.8 (well, that's what yum did anyway when I ran yum update ansible). Is there some way I can back-level to 2.6 with yum? It doesn't seem like it, as yum list only lists Ansible 2.8. It would be nice to confirm that this is what broke it anyway.

I upgraded PowerShell on all the VMs previously. It does connect: I can see in Task Manager that there are 5 tasks, including one PowerShell process, running as the user that I am using in my Ansible play.

I'll continue working on it anyway, thanks.

Bill

I tend to install Ansible from pip, as pip makes it easy to revert to old versions.
I think you can also install Ansible via pip just for your user, so that's worth exploring.
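If you go the pip route (for example `pip install --user 'ansible==2.6.2'`), it's easy to end up running a different `ansible` than you expect, because the yum-installed /usr/bin/ansible is still on the PATH. Below is a small Python 3 sketch (the helper names are hypothetical) that pulls the version and executable location out of `ansible --version` output like the one posted earlier in the thread, so you can confirm which install is actually being used.

```python
import re
import subprocess


def parse_ansible_version(output):
    """Extract (version, executable location) from `ansible --version` output."""
    # Require a digit so "ansible python module location" is never matched.
    version = re.search(r"^ansible (\d\S*)", output, re.MULTILINE)
    location = re.search(r"^\s*executable location = (\S+)", output, re.MULTILINE)
    return (version.group(1) if version else None,
            location.group(1) if location else None)


def installed_ansible():
    """Run whichever `ansible` is first on PATH and parse its version output."""
    out = subprocess.run(["ansible", "--version"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    return parse_ansible_version(out)


# Excerpt of the output posted earlier in this thread.
SAMPLE = """ansible 2.8.0
  config file = /etc/ansible/ansible.cfg
  executable location = /usr/bin/ansible
"""
```

After a `--user` install, `installed_ansible()` should report the pinned version and a path under your home directory; if it still says 2.8.0 and /usr/bin/ansible, the yum copy is winning on PATH.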

I get that it used to work, but Windows updates and unknown changes made while you were working on other projects could have affected the Windows hosts.

Bear in mind too that when you run Windows updates you are at the mercy of whether Microsoft's Windows Update servers are busy, so it might just be worth killing and retrying it a few times.
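To avoid babysitting a run that stalls (like the hang at `EXEC (via pipeline wrapper)` above), the kill-and-retry idea can be scripted. A Python 3 sketch; `run_with_retries` and the timeout values are hypothetical, and it assumes re-running your update playbook is safe for your hosts. It only retries, it does not diagnose anything.

```python
import subprocess


def run_with_retries(cmd, attempts=3, timeout_s=3600):
    """Run cmd, killing it after timeout_s seconds, retrying up to `attempts` times.

    Returns True on the first zero exit status, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process if the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print("attempt %d timed out after %s s" % (attempt, timeout_s))
            continue
        if result.returncode == 0:
            return True
        print("attempt %d exited with status %d" % (attempt, result.returncode))
    return False
```

For example, `run_with_retries(["ansible-playbook", "-i", "winhosts", "win-update-prod.yml"], attempts=3, timeout_s=2 * 3600)` would give each attempt two hours before killing it.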

One other thought: do you have any custom modules that you are using? I've burned myself in the past by using the latest development versions of modules with a release Ansible version, then upgrading months later, forgetting that I was using a custom module, and wondering why the whizzy new functionality in the latest released module wasn't working.

That module has definitely had some updates since 2.6, so it's worth double-checking that you aren't picking up an old version.
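One quick way to check for a stale copy is to look for files named after the module in the directories Ansible searches. This Python 3 sketch uses the two directories from the "configured module search path" line in the `ansible --version` output above; a `library/` directory next to the playbook is also searched, so pass that in too. The function name is hypothetical.

```python
import os

# From the "configured module search path" in the ansible --version output above.
DEFAULT_MODULE_DIRS = (
    os.path.expanduser("~/.ansible/plugins/modules"),
    "/usr/share/ansible/plugins/modules",
)


def find_module_overrides(module_name, dirs=DEFAULT_MODULE_DIRS):
    """Return sorted paths of files named module_name.* under the given dirs."""
    hits = []
    for base in dirs:
        if not os.path.isdir(base):
            continue  # missing directories simply contribute nothing
        for root, _subdirs, files in os.walk(base):
            for name in files:
                stem, _ext = os.path.splitext(name)
                if stem == module_name:  # e.g. win_updates.ps1 or win_updates.py
                    hits.append(os.path.join(root, name))
    return sorted(hits)
```

For example, `find_module_overrides("win_updates", DEFAULT_MODULE_DIRS + ("/home/domainuser/Playbooks/WinUpdate/library",))` returning an empty list means none of those locations is shadowing the module shipped with Ansible.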

Hope this helps,

Jon

Hmm, at a glance it looks like there may be a problem in the error handler: it expects an ErrorRecord to always be present, but in this case there isn't one. If that's what's happening, it's masking whatever the real error is, because it's blowing up in the error handler itself. Can you file that output in a GitHub issue against https://github.com/ansible/ansible?
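The `PropertyNotFoundStrict` error id in the traceback suggests the wrapper runs with PowerShell strict mode on, where reading a property that doesn't exist throws instead of returning `$null`, and a plain .NET exception such as `System.OutOfMemoryException` (seen in the first traceback) has no `ErrorRecord` property. The Python analogue below is a hypothetical illustration, not Ansible's code: Python's `AttributeError` plays the role of strict mode, showing how a handler that assumes the attribute hides the original error, and how a defensive handler lets it surface.

```python
class RecordError(Exception):
    """An exception that carries an error_record, like PowerShell's RuntimeException."""

    def __init__(self, message, error_record):
        super().__init__(message)
        self.error_record = error_record


def fragile_handler(exc):
    # Raises AttributeError for exceptions without error_record,
    # so the caller sees that failure instead of the original message.
    return "%s (%s)" % (exc, exc.error_record)


def defensive_handler(exc):
    # Falls back to the exception type and message when no record exists.
    record = getattr(exc, "error_record", None)
    if record is None:
        return "%s: %s" % (type(exc).__name__, exc)
    return "%s (%s)" % (exc, record)
```

With `MemoryError("boom")`, `fragile_handler` fails with `AttributeError` and the "boom" is lost, while `defensive_handler` returns `"MemoryError: boom"`.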

Thanks,

-Matt

Thanks again, Jon. Yes, I've started looking at installing with pip, and hopefully I can revert to 2.6 today to test definitively whether this is the problem.

I use my own WSUS server and it's pretty beefy (I just increased the pool memory on it, as it did crash sometimes, and that all seems good now), and only about 100 systems on the network use this WSUS, but I will double-check that it's working…

I don't use any custom modules… I'm pretty basic, plain Jane at this point, as this was my first project in Ansible.

I'm finally back from my travels and my dinky little laptop screen, so I can start working on this in earnest on my workstation now and hopefully turn up the problem.

I appreciate your thoughts and advice; having another set of eyes helps keep me thinking in more directions!

Bill

I back-leveled to 2.6 and things started working, but I ran into a known bug, so I advanced to 2.6.2 and it appears stable now.

I need to get some updates on the Windows systems done pretty urgently. As soon as those long-running processes finish, I will try going back to 2.8 to see if the problems return, and if so, file an issue on GitHub per Matt's suggestion.

Thanks again for suggesting pip for installing; it makes it much easier for me to pinpoint problems.

Bill

Hello William,

Would you be able to share the playbook here?

Thanks,
Nandhakumar