Exception: host not found issue with v1.8.2

Hi,

Having problems with 1.8.2

I believe the problem is similar to

https://github.com/ansible/ansible/issues/9828

http://serverfault.com/questions/652746/ansible-exception-host-not-found-item

  - name: Create ansible environmental debug
    action: template src=dumpall.j2 dest={{ debug_dir }}/ansible-{{ playbook }}.{{ ansible_date_time.date }}.txt

TASK: [openbet/env-debug | Create ansible environmental debug] ****************
fatal: [ec2-test-boapp02] => {'msg': 'Exception: host not found: admin_login_key', 'failed': True}

The posts recommend patching lib/ansible/runner/__init__.py. I don't have the access rights on the box to patch the installed version and can only install from EPEL.

What’s the best way to fix this?

1.8.2 is the only version that’s available. Is there a workaround I can make or is there a release date for 1.8.3?

EPEL was on 1.8.1 which broke our build with variable precedence. Now this. I’m a tad stuck at the moment

I would really appreciate any help

Many thanks

James

Are you able to test whether that fixes your issue? Are you able to pull from epel-testing into your production environment? I could talk to the epel maintainer about pulling that patch into the epel package which may help you out. But there have been some other issues opened with similar symptoms that haven’t been fixed by that patch so I’d like to confirm that this would fix your issue first.

-Toshio

Hi,

Thanks for the quick response. I will have a chat with our ops guys first thing tomorrow, get them to patch lib/ansible/runner/__init__.py, and see if it fixes the problem.

Will let you know soon as I find out.

Thanks

Hi,

I have applied the patch and it's producing the same error. I will create an example to demonstrate the problem.

It isn't the same as the delegate_to issue, though it does seem to relate to template handling with groups / group_names in templates.

James

The issue actually seems to be with the hostvars.

I've knocked up a cut-down version that reproduces this:

https://github.com/jamesdmorgan/ansible1.8.1_template_issue

PLAY [localhost] **************************************************************

GATHERING FACTS ***************************************************************
ok: [localhost]

TASK: [env-debug | Create debug directory] ************************************
ok: [localhost] => (item=./debug_log)

TASK: [env-debug | Create ansible environmental debug] ************************
fatal: [localhost] => {'msg': 'Exception: host not found: ansible_connection', 'failed': True}
fatal: [localhost] => {'msg': 'Exception: host not found: ansible_connection', 'failed': True}

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
to retry, use: --limit @/home/jmorgan/site.retry

localhost : ok=2 changed=0 unreachable=1 failed=0

Issue is in 1.8.2, I misnamed the repo. Apologies

I'm not getting the host_not_found error in either 1.8.2 or devel but
I suspect that I need something specific in my inventory file to
reproduce this? Are you able to post what that should be?

-Toshio

Hi,

If you look at the git example I created

https://github.com/jamesdmorgan/ansible1.8.1_template_issue

.
├── debug_log
├── README.md
├── roles
│   └── env-debug
│       ├── defaults
│       │   └── main.yml
│       ├── tasks
│       │   └── main.yml
│       └── templates
│           └── dumpall.j2
└── site.yml

I just ran

ansible-playbook --connection=local site.yml

The issue seems to be around the template which contains

{{ hostvars | to_nice_json }}

If I remove this it works fine.

Hope that helps

James

OK, I'm confused now.

I’ve just spun up a docker container and isolated it and it doesn’t error on v1.8.2

I will have to do more debugging to see what is in the environment / inventory on our dev servers that could cause this.

James

I have done some digging and there seems to be a problem with to_nice_json.

if I use

-{{ hostvars | to_nice_json }}

it breaks

if I use

-{{ hostvars | to_json }}

It works correctly

def to_json(a, *args, **kw):
    ''' Convert the value to JSON '''
    return json.dumps(a, *args, **kw)

def to_nice_json(a, *args, **kw):
    '''Make verbose, human readable JSON'''
    return json.dumps(a, indent=4, sort_keys=True, *args, **kw)

It does look like it uses a different encoder under the hood if indent & sorting is enabled

https://sourcegraph.com/hg.python.org/cpython@default/.PipPackage/Python/.def/json/dumps/cls?_codeview=1

# cached encoder
if (not skipkeys and ensure_ascii and
    check_circular and allow_nan and
    cls is None and indent is None and separators is None and
    default is None and not sort_keys and not kw):
    return _default_encoder.encode(obj)
if cls is None:
    cls = JSONEncoder
return cls(
    skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    separators=separators, default=default, sort_keys=sort_keys,
    **kw).encode(obj)

Dev environments use Python 2.6.6
Docker has Python 2.7.3

That could well be the root of my issue, and the reason why you are unable to reproduce it and why I can't reproduce it on Docker.
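To compare the two code paths on a given interpreter, a small standalone check along these lines may help. It is only a sketch: the two filter functions are copied from the snippet above, and `sample` is a made-up stand-in for hostvars (the real hostvars object is not a plain dict, so a pass here does not rule out the bug).

```python
import json
import sys


def to_json(a, *args, **kw):
    """Compact JSON, as in the filter quoted above."""
    return json.dumps(a, *args, **kw)


def to_nice_json(a, *args, **kw):
    """Indented, key-sorted JSON, as in the filter quoted above."""
    return json.dumps(a, indent=4, sort_keys=True, *args, **kw)


# Hypothetical stand-in for hostvars, for illustration only.
sample = {"localhost": {"ansible_connection": "local", "groups": ["all"]}}

compact = to_json(sample)
pretty = to_nice_json(sample)

# Both strings should decode to the same data; only the layout differs.
same_data = json.loads(compact) == json.loads(pretty)

# With indent/sort_keys set, dumps() skips the cached default encoder and
# builds a fresh JSONEncoder, so an explicit encoder should give equal output.
explicit = json.JSONEncoder(indent=4, sort_keys=True).encode(sample)
same_as_explicit = pretty == explicit

print(sys.version_info[:3], same_data, same_as_explicit)
```

Running the same script on the Python 2.6.6 hosts and in the container shows whether the two interpreters disagree for plain data, before involving Ansible at all.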

Had a chance to try using your repo on CentOS6 (with python-2.6.6) and
I can confirm that ansible-1.8.2 has this problem but ansible devel
does not. Going to try to track down the commit that causes this and
whether it's easy to isolate the patch to fix it.

-Toshio

Mmmm.... one big correction -- It does not work with current devel (I
was testing with a *really* old checkout before). So there's another,
unaddressed bug here...

-Toshio

Commit where this problem starts:
29d41bb789383c3ff59269b28877ea0f270f5861. However, I don't think this
is a bad commit. I think it's just exposing a bug elsewhere.

Where that elsewhere is... it looks to me like the json library in
python2.6's stdlib has a bug that's causing this to fail. I haven't
found a bug report in python or simplejson's issue trackers that
points out where they may have fixed this specifically. I have found a
commit where upstream python synced changes from simplejson-2.0.9
(python2.6's json looks to have been based on simplejson 1.9).
Delving into simplejson's code, I found that this commit fixed this
issue: https://github.com/simplejson/simplejson/commit/2afcca8635bb0adce45bcff7c069f320c82eb9f8#diff-436b24490ceb0883d559805a466865c5

That appears to have been part of an optimization effort rather than a bugfix...

Where does that leave us.... To work around your bug on EPEL6, I
think I can get the EPEL package to include a small patch to prefer
simplejson over json in filter_plugin/core.py. Not sure if I can
also have ansible in epel6 start Requiring python-simplejson but even
if I can't you can simply install it on your RHEL6 hosts. Between
those two things, that should fix this particular issue for you.

For upstream ansible code, the answer isn't quite so simple. I
haven't tracked down precisely what it is about your test that's
breaking (other than it's to_nice_json) and we probably don't want to
force everyone to install simplejson. So I'm not sure what precisely
we want to do. Perhaps, something like this in to_nice_json():

if sys.version_info < (2, 7):
    try:
        import simplejson as json
    except ImportError:
        # No simplejson available: fall back to compact output
        return to_json(a, *args, **kw)
        # Or raise an informative error?

I'll talk to a few other people about what they'd like the fallback
code to do in this case.

-Toshio
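For anyone wanting to try the fallback idea outside of Ansible, here is a minimal standalone sketch of it. It is not the patch that went into Ansible or the EPEL package; the helper name `load_pretty_encoder` is made up for illustration, and the choice to fall back to compact output (rather than raise an error) is just one of the options mentioned above.

```python
import json
import sys


def load_pretty_encoder():
    """Return a module to use for indented output, or None.

    Hypothetical helper: Python 2.7+ uses the stdlib json module; older
    interpreters use simplejson if it is installed, and otherwise None.
    """
    if sys.version_info >= (2, 7):
        return json
    try:
        import simplejson
    except ImportError:
        return None
    return simplejson


def to_nice_json(a, *args, **kw):
    """Indented, key-sorted JSON, with a compact fallback on old Pythons."""
    encoder = load_pretty_encoder()
    if encoder is None:
        # Old Python and no simplejson: compact output instead of failing.
        return json.dumps(a, *args, **kw)
    return encoder.dumps(a, indent=4, sort_keys=True, *args, **kw)


result = to_nice_json({"b": 1, "a": 2})
print(result)
```

On Python 2.7 or later this behaves exactly like the existing filter; the simplejson branch only matters on 2.6 hosts such as RHEL6/CentOS6.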

Okay, an update of ansible for epel6 has been submitted:
https://admin.fedoraproject.org/updates/ansible-1.8.2-3.el6 If you're
interested in testing it you should be able to download directly from
Fedora's buildsystem (there's a link in the update). The Fedora/EPEL
workflow pushes builds to the updates-testing repository first and,
after a few weeks, into the main updates repo.

Hope that helps!
-Toshio

Hi,

Thanks so much for looking into this. Much appreciated.

I’ll get 1.8.2-3.el6 tested.

Cheers!

James