ansible-openstack: VMs unable to see the internet

Howdy.

My two nodes, Controller (10.0.9.170) and Compute, can see the internet. My VMs can see each other and the Controller. However, I can't ping 8.8.8.8 from my VMs.

From root@2.2.2.4: ping 2.2.2.2, ping 1.1.1.3, and ping 1.1.1.7 all work.

[root@2-2-2-4 ~]# tracepath 8.8.8.8
1: 2.2.2.3 (2.2.2.3) 22.476ms
2: no reply
3: no reply

[root@Linux-OpenStack-Admin ~]# nova list --all-tenants

Supporting OpenStack is a bit beyond the scope of the Ansible mailing list as there are a lot of variables in setting up OpenStack to get it to do what you want.

Was this something with the Ansible OpenStack playbooks in our GitHub repo? That is more of a starter setup, and we expect most people to heavily modify it.

Did you configure your upstream routers with a static route so packets for 1.1.1.0/24 are sent via Linux-OpenStack-Admin?
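For reference, on a Linux-based upstream router that would look something like the line below; this is illustrative only (10.0.9.170 is the controller address from your first mail, and real router gear will have its own syntax):

ip route add 1.1.1.0/24 via 10.0.9.170   # send traffic for the floating-ip range to the controller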

Michael: yes, this is regarding the ansible-redhat-openstack playbook.

Darragh: I've tried a few things. I was hoping to be able to configure my Linux-OpenStack-Admin node with a POSTROUTING rule instead of changing the upstream router, which requires bugging the tech team.

We did try setting the upstream router to forward the equivalent of 1.1.1.0/24 in our setup.

iptables -t nat -A POSTROUTING --source 10.0.11.74/28 --output-interface eth0 -j MASQUERADE

Our corporate gateway is 10.0.10.1:

ip route add 10.0.11.74/28 via 10.0.10.1 dev eth0
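To see whether the MASQUERADE rule is actually matching anything, the per-rule packet counters are handy (this is stock iptables, nothing playbook-specific):

iptables -t nat -L POSTROUTING -v -n   # the pkts/bytes columns show whether the rule ever fires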

We also tried having the quantum gateway forward to eth0 on Linux-OpenStack-Admin, like so:

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i qg-d39ac7f4-3f -o eth0 -j ACCEPT   # qg-d39ac7f4-3f holds 10.0.11.66
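One prerequisite for all of the FORWARD/MASQUERADE attempts above (I'm assuming the playbooks already set it, but it is quick to verify): kernel IP forwarding must be on, or none of these rules ever see a packet.

sysctl net.ipv4.ip_forward        # should report 1
sysctl -w net.ipv4.ip_forward=1   # enable it for the running kernel if it reports 0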

After a fresh install, the compute node can ping 8.8.8.8, but the admin node can't.
The problem is the default route assigned via the quantum gateway qg-*** below:

[root@patch-test-admin ~]# route
Kernel IP routing table
Destination   Gateway          Genmask          Flags Metric Ref  Use Iface
1.1.1.0       *                255.255.255.0    U     0      0    0   qg-7586114d-c1
4.4.4.0       *                255.255.255.0    U     0      0    0   qr-4869ec23-73
10.0.8.0      *                255.255.252.0    U     0      0    0   eth0
link-local    *                255.255.0.0      U     1002   0    0   eth1
link-local    *                255.255.0.0      U     1003   0    0   eth0
default       1.1.1.1          0.0.0.0          UG    0      0    0   qg-7586114d-c1
default       vpn.21technolog  0.0.0.0          UG    0      0    0   eth0

The ip route get output looks correct, but pinging out doesn't work:

root@patch-test-admin:~$ ip route get 8.8.8.8
8.8.8.8 via 10.0.10.1 dev eth0 src 10.0.9.176
cache mtu 1500 advmss 1460 hoplimit 64

If I delete the default 1.1.1.1 route and then do a service network restart, I can ping out.

root@patch-test-admin:~$ ip route del default via 1.1.1.1 dev qg-7586114d-c1
root@patch-test-admin:~$ route
Kernel IP routing table
Destination   Gateway          Genmask          Flags Metric Ref  Use Iface
10.0.11.64    *                255.255.255.240  U     0      0    0   qg-d39ac7f4-3f
10.0.8.0      *                255.255.252.0    U     0      0    0   eth0
link-local    *                255.255.0.0      U     1002   0    0   eth1
link-local    *                255.255.0.0      U     1003   0    0   eth0
default       vpn.21technolog  0.0.0.0          UG    0      0    0   eth0

root@patch-test-admin:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6680ms

Flushing the route cache is also not sufficient:

root@patch-test-admin:~$ ip route flush cache
(still no ping)
root@patch-test-admin:~$ service network restart
root@patch-test-admin:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=51 time=13.6 ms

However, I think the quantum gateway is essential for the VM instances on the compute node to see out.

Just to be sure I’ve got it straight, here are the group_vars/all again with a few modifications:

# This interface device should not have an IP assigned
quantum_external_interfaces: eth1

# This interface device should have an IP assigned
iface: eth0

external_subnet_cidr: 10.0.11.74/28  # our vpn hands out ips on 10.0.8.0/255.255.255.252; chose it to be a subset this time

Any more troubleshooting tips would be appreciated.

kesten

Hi Kesten,

If I am understanding this correctly, you have specified the external subnet for the VMs' floating IPs as “10.0.11.74/28”, which gives you 16 addresses (10.0.11.65 - 10.0.11.78 usable), and as per your mail your VPN hands out 10.0.8.0/30, which gives you just 2 usable IPs.

Could you please check those subnets again and make sure the floating IPs are a subset of the range the VPN hands out?
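A quick way to double-check which network a given address/mask actually lands in (assuming the stock RHEL ipcalc from initscripts is installed):

ipcalc -np 10.0.11.74 255.255.255.240   # prints NETWORK=10.0.11.64 and PREFIX=28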

My guess is that the issue is a missing reverse route on your upstream router, which should point to the quantum gateway. I would suggest working with your network admins to get a subnet for your VMs and making sure the upstream router has a reverse route pointing to the quantum gateway IP, 1.1.1.1.

Regards,
Benno

Sorry, I mistyped:

external_subnet_cidr = 10.0.11.74/28
company VPN external IPs = 10.0.8.0/255.255.252.0, i.e. 10.0.8.0/22 if I have my CIDR right; our range of external IPs on the corp VPN is 10.0.8.0 to 10.0.11.255.

So yes, the external_subnet_cidr is a subset of the company VPN's range on this spin-up of our experimental cloud. On the first try, I had it as 1.1.1.0/24, as per the stock group_vars/all.

“My guess is that the issue is a missing reverse route on your upstream router, which should point to the quantum gateway. I would suggest working with your network admins to get a subnet for your VMs and making sure the upstream router has a reverse route pointing to the quantum gateway IP, 1.1.1.1.”

The original group_vars/all had external_subnet_cidr=1.1.1.0/24, and the ansible script assigns 1.1.1.1 to the qg-**** device on the controller node. In our case, the quantum gateway IP is 10.0.11.66 (the first available IP on the subnet).
The Controller and Compute nodes are at 10.0.9.170 and .172 respectively, so they are not in the quantum subnet.
My sysadmin thought that since the quantum gateway IP 10.0.11.66 is in the corp VPN network, we should be able to ping it from our laptops without setting up a special reverse route on the gateway.
Indeed, I can ping 10.0.11.66 from my laptop.

Does the ansible script support choosing the subnet both within and outside the corporate VLAN (for us, 10.0.8.0/22)? If so, what config changes / iptables differences are required between the two scenarios?

A question about promiscuous mode. This link refers to a flat-DHCP topology, so it may not apply to our VLAN topology?
"eth2 is configured to use promiscuous mode! This is extremely important. It is configured in the same way on the compute nodes. Promiscuous mode allows the interface to receive packets not targeted to this interface’s MAC address. Packets for VMs will be traveling through eth2, but their target MAC will be that of the VMs, not of eth2, so to let them in, we must use promiscuous mode.

Connecting to the vms on compute node from the ansible controller (my mac, outside the cloud)"
For us, the NIC referred to as eth2 (the VM-facing interface) is eth1, which is not in promiscuous mode. Trying it, I was still unable to ping out from Linux-OpenStack-Admin.
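For anyone reproducing this, the toggle itself is just the standard iproute2 invocation, nothing playbook-specific:

ip link set eth1 promisc on   # enable promiscuous mode on the VM-facing NIC
ip link show eth1             # PROMISC should now appear in the flags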

I’m currently scratching my head over a disagreement between ping and tcpdump from this experiment:

[root@Linux-OpenStack-Admin ~]# tcpdump | grep ICMP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
*** after the ping below, I get a stream of these ***
12:28:32.064219 IP Linux-OpenStack-Admin > google-public-dns-a.google.com: ICMP echo request, id 11389, seq 1, length 64
12:28:32.078474 IP google-public-dns-a.google.com > Linux-OpenStack-Admin: ICMP echo reply, id 11389, seq 1, length 64

(in another terminal)
[root@Linux-OpenStack-Admin ~]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
13 packets transmitted, 0 received, 100% packet loss, time 12023ms

It appears the packets are being sent and the replies arrive back at Linux-OpenStack-Admin, but are then dropped locally?
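If I understand tcpdump right, it taps the interface before netfilter and reverse-path filtering get a chance to drop anything, so "visible in tcpdump but lost to ping" would still be consistent with a local drop. Two things I can think to check (just guesses, not a confirmed diagnosis):

sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.eth0.rp_filter   # 1 = strict reverse-path filtering
iptables -L INPUT -v -n   # look for a rule whose counters climb while the replies disappear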

Hi Kesten,

One of these days, can you hop onto IRC (irc.freenode.net, channel #ansible)? My handle is benno; I will try to help out there, which might be a much faster way to get a resolution.

Regards,
Benno

Thanks Benno. I'm on freenode now as kbroughton. Ping me when you get a chance, please.

kesten

I think I might have come across something.

http://www.mirantis.com/blog/openstack-networking-vlanmanager/

Next, we need to tell the switch to pass tagged traffic over its ports. This is done by putting a given switch port into “trunk” mode (as opposed to “access” mode, which is the default). In simple words, trunk allows a switch to pass VLAN-tagged frames; more information on vlan trunks can be found in this article. At this time, configuring the switch is the duty of the system administrator. Openstack will not do this automatically. Not all switches support vlan trunking. It’s something you need to look out for prior to procuring the switch you’ll use.

My sysadmin is looking into it now, but likely the switch port is in access mode, not trunk mode, at the moment.

k

Hi Kesten,

If you have used the ansible playbooks to deploy OpenStack, you won't need trunking on the switch ports, as we are using GRE tunnels to pass L2 traffic over the L3 (data) network.
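If you want to see the tunnels for yourself (assuming the playbooks set up Open vSwitch, which is how quantum's GRE mode is implemented), the tunnel bridge on each node should list gre ports:

ovs-vsctl show   # look for br-tun with ports of type gre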

Regards,
Benno

hi benno,

I'll be on IRC from 6pm CST tonight, and tomorrow starting at 9am. Hope to catch you.
kesten