ansible ec2_facts returns false data (if there is NAT on the system level; This is ok if You use AWS router interface gateway)

Jakub_Muszynski · July 14, 2015, 12:16pm

THE PROBLEM:
I’ve just realised why my playbook sometimes fills the template with false data.

This happens when the instance is in my VPC subnet (with an internet gateway) while a NAT route is configured in the routing table at the system level. Requests to the internet then go through the NAT instance, and the AWS response is masked.
The NAT instance’s facts are then returned, NOT the facts of the current instance.

THE DEBUGGING:

If you look into the code, ec2_facts sends a bunch of requests to

http://169.254.169.254/latest/meta-data

For example:

curl http://169.254.169.254/latest/meta-data/local-ipv4
172.16.0.200

while the real data is

eth0: ***

inet 172.16.0.110/24 brd 172.16.0.255 scope global eth0

THE INSTANCE CONFIGURATION:

$ ip r

default via 172.16.0.200 dev eth0

172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.110

172.16.0.0/16 via 172.16.0.1 dev eth0

$ ip a

eth0: ***

inet 172.16.0.110/24 brd 172.16.0.255 scope global eth0

If you keep remote files, you can check it yourself:

export ANSIBLE_KEEP_REMOTE_FILES=1

and then

python /home/ubuntu/.ansible/tmp/ansible-tmp-1436872330.49-72199016469620/ec2_facts

will return as one of the facts:

"ansible_ec2_local_ipv4": "172.16.0.200",

(or run a curl)

curl http://169.254.169.254/latest/meta-data/local-ipv4
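
The same comparison can be scripted. Below is a minimal Python sketch, not part of ec2_facts: it fetches local-ipv4 from the metadata URL shown above and compares it with the source address the kernel would actually use. The function names and the probe address 172.16.0.1 are assumptions made for illustration only.

```python
import socket
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def metadata_local_ipv4(url=METADATA_URL, timeout=2.0):
    """Return the local-ipv4 value reported by the metadata service."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def outbound_ipv4(probe_addr="172.16.0.1", probe_port=53):
    """Return the source address the kernel would use to reach probe_addr.

    probe_addr is a hypothetical in-VPC address. Connecting a UDP socket
    sends no packets; it only makes the kernel pick a route and a source
    address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_addr, probe_port))
        return sock.getsockname()[0]
    finally:
        sock.close()


def facts_mismatch(reported, actual):
    """Return True when the metadata answer does not match this instance."""
    return reported.strip() != actual.strip()
```

On the instance described here, facts_mismatch(metadata_local_ipv4(), outbound_ipv4()) should come out True, since the metadata answer is 172.16.0.200 while eth0 holds 172.16.0.110.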

THE CURRENT WORKAROUND:

1. Do NOT use (in roles or tasks)

      - action: ec2_facts

   DRAWBACKS:
   - Some variables will not be available (ansible_ec2_* will be unavailable).
   - You will only have the ec2_* facts from your LOCAL inventory cache (ec2.py, if I’m correct now).
   - If you add “gather_facts: True” to the playbook, you can also use the ansible_* facts gathered by the setup module, so instead of ansible_ec2_local_ipv4 you can use ansible_eth0['ipv4']['address']. BUT this can cause problems when you have a role that expects some variable (example: ansible_hostname) while system fact gathering is disabled in the playbook (“gather_facts: False”), so you will have to be careful, OR when you would like to access some AWS variable independently of your LOCAL cache.

2. Configure your VPC routing tables so that they point to the NAT instance interface rather than to an IP address:

      0.0.0.0/0 eni-xxx / i-xxx

   instead of:

      0.0.0.0/0 igw-zzzzz + system routing tables

   Then you do not have to override the routing table at the system level; you rely on the AWS router.

   DRAWBACKS:
   - When your NAT instance shuts down, you will have to change the routing table in the VPC so that it points to another physical interface,
     vs.
   - if you keep the system routing table, you launch a new NAT instance with the “old IP address” attached.

QUESTIONS / CONCLUSION:
- Be aware of the ec2_facts limitation.
- If possible, rely on the Amazon routing table.
- How do you prevent a SPOF in your VPC subnets?
- What is your best practice for configuring VPC subnets (private and public) so that they have outbound internet access (for github, apt) and are still safe, without the SPOF that the NAT instance is?

igorc · July 15, 2015, 12:21am

I’m using Ansible with AWS VPCs, most of which have public and private subnets, and I have never had the problem you are seeing. This is definitely a misconfiguration on your side and has nothing to do with Ansible. ec2_facts is doing the right thing: there is no other way of collecting this data except querying the meta-data repository, which is what the AWS CLI tools do anyway, meaning you will get wrong data using the AWS CLI as well. Don’t forget you are in the cloud, and your networking is configured at the hypervisor/SDN level and NOT at the instance level. You can create as many network interfaces as you want at the instance level and set IPs on them, but none of them will work, since you have bypassed the SDN and there is no record of them in the meta-data repository. This finally means that collecting facts locally on the instance means nothing if those values don’t match what is in the meta-data repository.

Now that we have that cleared up, let’s move to your problem, which looks to me to be the AWS routing tables, or more specifically the lack of them. For an instance to be in a private subnet, it needs a routing table separate from the VPC’s default one (which has the IGW created for you when the VPC was created), one that has the NAT instance as its IGW (internet gateway). That is all you need; you don’t have to set any routing tables at the system level, the SDN will route the traffic for you.

Hope this makes sense. Since you haven’t provided any info about your subnets, routing tables, ACLs etc., this is more of a guess at what’s going on, so please correct my assumptions if needed.

Thanks,
Igor

igorc · July 15, 2015, 12:52am

I have to correct myself: you do provide the subnet information. So in answer to your questions/conclusions, the way I do it is:

Use a private routing table for the private subnets, pointing to the NAT as IGW
Use 2 x NAT instances and a NAT takeover script that modifies the private subnets’ routing table and points the IGW to itself in case the other NAT instance has failed
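
For illustration only, the core of such a takeover script could look roughly like the Python sketch below. It is not the script referred to above. It assumes a boto3 EC2 client is passed in as ec2_client (replace_route is a boto3 EC2 client call), and the function names, the TCP health check on port 22, and all IDs are hypothetical.

```python
import socket


def peer_is_alive(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the peer NAT instance succeeds.

    The port and the choice of a TCP check are assumptions for this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def take_over_route(ec2_client, route_table_id, my_instance_id):
    """Point the private route table's default route at this NAT instance."""
    ec2_client.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        InstanceId=my_instance_id,
    )


def failover_once(ec2_client, route_table_id, my_instance_id, peer_alive):
    """Take over the default route when the peer is down.

    Returns True when a takeover was performed, False otherwise.
    """
    if peer_alive:
        return False
    take_over_route(ec2_client, route_table_id, my_instance_id)
    return True
```

Each NAT instance would run something like failover_once(boto3.client("ec2"), route_table_id, own_instance_id, peer_is_alive(peer_ip)) periodically, with the route table ID, instance ID and peer address filled in for its own setup.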

Jakub_Muszynski · July 15, 2015, 11:14am

Thanks Igor.

You are right, it is not an Ansible “bug” but a configuration feature, though a “bad one”, since it silently provides false data. I had to dig into the source code to track it down.
There could be a warning in ec2_facts that detects the default route, but it would take some work.
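
A rough idea of what such a check could look like, as a Python sketch rather than actual ec2_facts code: parse the output of `ip r` and warn when requests to 169.254.169.254 would leave through a gateway other than the VPC router. The function names are made up, the VPC router address has to be supplied by the caller (172.16.0.1 is assumed for my subnet), and the route matching is a simplified longest-prefix lookup.

```python
import ipaddress

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def metadata_next_hop(ip_route_output, target=METADATA_IP):
    """Return the gateway used to reach target.

    Returns None when the best route is on-link or when no route matches.
    Picks the most specific matching route (longest prefix), which is a
    simplification of what the kernel does.
    """
    best_len, best_gw = -1, None
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        dest = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            network = ipaddress.ip_network(dest, strict=False)
        except ValueError:
            continue  # skip lines this sketch does not understand
        if target in network and network.prefixlen > best_len:
            gateway = fields[fields.index("via") + 1] if "via" in fields else None
            best_len, best_gw = network.prefixlen, gateway
    return best_gw


def metadata_route_warning(ip_route_output, vpc_router):
    """Return a warning string if metadata traffic bypasses the VPC router."""
    gateway = metadata_next_hop(ip_route_output)
    if gateway is None or gateway == vpc_router:
        return None
    return (
        "requests to %s are routed via %s, not the VPC router %s; "
        "ec2 facts may describe another instance" % (METADATA_IP, gateway, vpc_router)
    )
```

With the `ip r` output from my first post and 172.16.0.1 as the router, metadata_next_hop returns 172.16.0.200 and a warning is produced.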
