Hi Lee, sorry for taking so long to respond.
To be honest, our setup has grown quite organically in some cases, but we’re trying to get some guard-rails around things (e.g. consistent groups defined in the common tools we use, controlling access to credentials, inventory maintenance, etc.).
> We currently have no config mgmt for our server environment and for the project
> I’m on (+ any other project) we will use Ansible to manage the configuration changes.
IMHO, that’s a good start and mirrors how we got started. Our OS deployment was in bad shape: the legacy scripts that worked up through RHEL-6 were failing on RHEL-7. Plus our environment had grown so complex and diverse that the build scripts we used to automate configuration (DNS, NTP, LDAP, sudo, Satellite/patching, etc.) weren’t keeping up; they had originally been written around a much more manual build process.
When we started the push to RHEL-7 so we could get off vulnerable, unsupported RHEL-5 and older, we decided that was the time to adopt Ansible. We didn’t have Tower (and AWX wasn’t open-sourced yet), so our VMware build process called out to a script on a central server that kicked off the Ansible playbook. Our RHEL-7 templates are built by Packer (https://www.packer.io/) and include a build SSH key. The central server uses that key over SSH to connect to the new VM and run the Ansible playbooks.
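On the central server side that mostly amounts to pointing Ansible at that build key for the freshly cloned hosts; a minimal sketch (the user name, group name, and key path here are placeholders, not our real values):

    # group_vars/new_builds.yml - connection settings for freshly built VMs
    ansible_user: builduser                                  # placeholder account baked into the Packer template
    ansible_ssh_private_key_file: /opt/build/keys/build_key  # placeholder path to the build SSH key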
The playbook is made up of a number of Ansible roles that each perform a specific task (DNS, NTP, hostname, Satellite registration, monitoring, and so on).
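Roughly speaking, the top-level playbook looks something like this (the role names are illustrative, apart from setup_monitoring which I mention below):

    # site.yml - build playbook the central server runs against a new VM
    - name: Configure a freshly built RHEL-7 server
      hosts: all
      become: true
      roles:
        - setup_ntp         # time has to be right early on (see the Satellite story below)
        - setup_dns
        - setup_hostname
        - setup_satellite   # register to Satellite for patching
        - setup_monitoring  # owned by the Zabbix team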
The nice thing is that each role should actually be owned by the team responsible for that service. For instance, we use Zabbix for monitoring, and the Zabbix team has kept up the setup_monitoring role themselves: building the configuration files, pointing to the install files, and so on.
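Because a role is just a self-contained directory of tasks and templates, they can maintain their piece without touching the rest of the playbook. A stripped-down sketch of what a tasks file like theirs might contain (the package, template, and service names here are generic, not their actual files):

    # roles/setup_monitoring/tasks/main.yml - illustrative only
    - name: Install the Zabbix agent
      yum:
        name: zabbix-agent
        state: present

    - name: Drop in the agent configuration
      template:
        src: zabbix_agentd.conf.j2
        dest: /etc/zabbix/zabbix_agentd.conf

    - name: Make sure the agent is running and enabled
      service:
        name: zabbix-agent
        state: started
        enabled: true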
Breaking the project into roles also helped a lot with working around pain points. For instance, in an early version the roles ran DNS, then hostname, then Satellite, and we got intermittent failures. After working through the logs, we saw that the time on the new VM was wrong and the SSL certificate Satellite presented wasn’t valid against the local clock. Rearranging the roles and getting NTP set up earlier resolved that quickly. It did mean we had to include NTP in the base system, but that was a minor increase in size.
Converting all this into a playbook that is reusable from Tower/AWX is my next step, too. With the exception of the monitoring team, changes come in as email or some other non-Git-controlled request that I have to incorporate and then help the requesters test. That has pushed me to start learning the basics of Git branching so I can keep the master branch stable while we work on fixes and features.
Inventories will be a challenge, too. In our older CLI environment, individual users kept their own text inventory files; the most common convention was to name them after the trouble ticket they were working on (to build new servers, fix issues, etc.). Given the number of people with access, it was safer to let each person keep their inventory files in their home directory. (Putting them all in one location would lead to someone accidentally running a playbook against “inventory.TKT98765” when it should have been “inventory.TKT98775”.)
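For anyone not familiar with the pattern, those per-ticket files were just tiny static inventories along these lines (the group and host names here are made up):

    # inventory.TKT98765 - hosts for this ticket's build/fix work
    [new_builds]
    appserver01.example.com
    appserver02.example.com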
In the Tower environment, the bigger project has been given direction from our corporate designers/architects to break our development departments into six organizations, which we’ve mapped into Git and Tower (using LDAP). Managing the users is easy, but I’m a bit concerned that the size of those organizations might lead to teams inadvertently choosing the wrong inventory and making changes to systems they shouldn’t. Part of it will be a learning and education exercise, but I’d rather get the guard rails set up to reduce that chance. Right now most teams are building their own inventory files, and that will probably continue to work; within their organizations they have enough communication that naming collisions shouldn’t cause a problem. If you come up with a slick way to share inventory with a wider group while still maintaining access controls on the servers, I’m all ears.
At a future date I’d love to have Tower query our CMDB or VMware infrastructure and get the inventory back dynamically. That’s a long way off, but it’s the goal I’m aiming at. It would give me (and more importantly my users and their managers) the ability to adjust the inventory based on business need. (“Select all servers which handle CustomerX and ensure this patch is applied.” Or “Configure the ssh service on all systems in the DMZ to only permit logins from the internal management subnet.”)
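As a small illustration of where I want to end up, assuming the dynamic inventory hands back a group per customer (the group and package names below are placeholders):

    # patch_customerx.yml - run against a group provided by the dynamic inventory
    - name: Make sure CustomerX servers have the fixed package
      hosts: customerx
      become: true
      tasks:
        - name: Apply the patched package
          yum:
            name: examplepkg
            state: latest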