Semi-Complicated Ansible OSS Project

I want to make it as easy to set up a full realistically-sized and realistically-configured distributed database cluster (MySQL, HBase, Cassandra, et al) as it is to set up a single database today. That goal is probably a year or more away, but the initial steps have been taken.

Now you can set up a fully functional MySQL master/slave cluster with MHA high-availability failover using Ansible on Ubuntu 12.04. The Ansible playbooks (and supporting scripts) are on GitHub at https://github.com/time-palominodb/PalominoClusterTool (clone via "git@github.com:time-palominodb/PalominoClusterTool.git").

The project is in its infancy, and it’s probable there are some mistakes in how the project is organized. If anyone feels like giving me advice off-list (or on-list if you think the Ansible community could benefit), it will be well-received: the way things are done now will be mirrored by future developers, and bad habits are best broken early.

If you want to look at the Ansible steps I use to duplicate what the mysql_secure_installation script does, they’re available here:

https://github.com/fourkitchens/server-playbooks/blob/master/common-tasks/mysql-secure.yml

There’s a bit of cleverness going on. MySQL starts with a default user of ‘root’ with no password. We set the password for this user, then write a .my.cnf with the new credentials so that subsequent tasks run correctly.

That allows the playbook to be run multiple times idempotently, which it doesn’t look like you can do in your script (you are calling a bash script?)
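For anyone following along, the trick looks roughly like this, in the key=value module style of the day. The variable and file names are illustrative, not taken from the repo:

```yaml
---
- hosts: dbservers
  user: root
  vars:
    mysql_root_password: supersecret   # illustrative; keep real secrets out of playbooks
  tasks:
    # First run: sets the password on the fresh, passwordless root account.
    # Later runs: mysql_user reads root's .my.cnf and sees nothing to change.
    - name: set the MySQL root password
      mysql_user: name=root password=$mysql_root_password

    # Written immediately afterwards, so every subsequent mysql_* task
    # (including re-runs of the task above) can authenticate.
    - name: write root's .my.cnf with the new credentials
      template: src=templates/my.cnf.j2 dest=/root/.my.cnf mode=0600
```

where the template is just:

```
[client]
user=root
password={{ mysql_root_password }}
```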

  • Mark

A few things:

* where possible, use database modules like Mark said -- executing
scripts is not Ansible's preferred way of doing things -- patches to
them welcome
* consider using group_vars and host_vars for inventory versus vars_files
* keep your playbooks in a playbooks/ directory versus in the same
directory as your variable files, where it's difficult to tell what's
what
* use 'notify' to restart services only when config files change (you
have a TODO about this somewhere)
* use the get_url module versus shell executing curl
* re "if ever with_items is a legal parameter in Ansible Play" (vs a
task) ... FYI, it won't be.
* style preference -- include some whitespace before starting your
tasks section and between tasks
* host an apt repo and use the apt module versus shell executing dpkg
(especially with no creates=)
* not sure what the shell script is for, you could probably get away
with not using it and passing in a location to a variable file using
--extra-vars, perhaps
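For the get_url point above, the swap is mechanical. The URL and destination here are placeholders:

```yaml
# Before: shell: curl -o /tmp/mha-manager.tar.gz http://example.com/mha-manager.tar.gz
# After: a real module, so the task participates in changed/unchanged reporting
- name: fetch the MHA tarball
  get_url: url=http://example.com/mha-manager.tar.gz dest=/tmp/mha-manager.tar.gz
```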

(1) Mark, thanks! This is one of those cases where being a 10-year MySQL DBA hurt rather than helped. I knew precisely the command I’d run and did it, without even considering there was a module to do this properly. Fixed.

(2) Michael, replies in-line:

  • where possible, use database modules like Mark said – executing
    scripts is not Ansible’s preferred way of doing things – patches to
    them welcome

As a MySQL DBA, I need to:

  1. Set up users with permissions on arbitrary databases/tables for various hosts. Ansible=DONE.
  2. Create databases. Ansible=DONE.
  3. Create tables in a database with a set of columns, keys and basic permissions. Ansible=NOT DONE(?).
  4. Slave a machine to a master at an arbitrary binlog name/position using a named user/password. Ansible=NOT DONE(?).
    I think given the Ansible Way, this would mean a new “mysql_table” module and “mysql_replication” module?
  • consider using group_vars and host_vars for inventory versus vars_files

Apologies, I don’t know what you mean here. I don’t put inventory into vars_files, but rather into /etc/ansible/hosts. Do you mean that I’m defining variables in vars_files that could just as well be defined alongside the inventory?

  • keep your playbooks in a playbooks/ directory versus in the same
    directory as your variable files, where it’s difficult to tell what’s
    what

Good point. I’ll make this change.

  • use ‘notify’ to restart services only when config files change (you
    have a TODO about this somewhere)

The real problem is that I’m doing a fairly complex setup of the my.cnf and then potentially modifying it outside Ansible’s realm of control. I’m not sure if I’m simply doing it wrong, or if I really am doing something beyond Ansible’s basic functionality. If I can bring the my.cnf postprocessing back under Ansible’s control, then it will know whether the file has changed or not.

Or is there some mechanism whereby Ansible detects a file has been changed by external scripts? (I can search mailing list for detail)

  • use the get_url module versus shell executing curl

I understand now why people are afraid to OSS their stuff. Very public “duh” moments. :)

  • re “if ever with_items is a legal parameter in Ansible Play” (vs a
    task) … FYI, it won’t be.

Thanks, I am removing the comment cruft.

  • style preference – include some whitespace before starting your
    tasks section and between tasks

I think I do this already? Between tasks anyway. Will carefully consider before-tasks.

  • host an apt repo and use the apt module versus shell executing dpkg
    (especially with no creates=)

I’m unsure if I can foist this overhead onto everyone who wants to try out a distributed database. A core project goal is to make it as easy to install a distributed database as it is today to install a single database; if everyone who wanted to try out MySQL had to administer an apt repo to do it, they wouldn’t do it.

If you’re speaking at a fairly meta level, I do agree that this project will need package repos since software hosted out there on the internets can sometimes go 404 without much notice. It’ll take me a few months, so this antipattern needs to remain for now.

If anyone knows how to quickly and easily set up package repos, let me know. I did set up an apt repo initially, but it stopped working after a couple of weeks, so it appears to require time that I can’t invest right now.

  • not sure what the shell script is for, you could probably get away
    with not using it and passing in a location to a variable file using
    --extra-vars, perhaps

Which shell script?

And now, my own thoughts.

No one seemed to express horror at my wrapping multiple playbooks in a shell script. I had thought that would be a common critique: that I should use playbooks-of-playbooks rather than shell-scripts-calling-playbooks.

I need to generate an SSH keypair and install it everywhere. I do it with supporting scripts. This seems like a task that would be required often, and might be a candidate for a module? Ansible supports modifying the authorized_keys file, but not placing the private key, nor generating the keypair to start with. Wondering if the extra functionality is not desired.

I wanted to reference a YAML-defined variable:

    mysql:
      master_user: user1

as {{ mysql['master_user'] }} within a template and as $mysql['master_user'] within a playbook, but it didn’t work. I changed the variable to just “mysql_master_user: user1” and then I could use the same variable in templates and playbooks. Is this a Really Hard Problem I’ve stumbled onto?

As a MySQL DBA, I need to:

  1. Set up users with permissions on arbitrary databases/tables for various hosts. Ansible=DONE.
  2. Create databases. Ansible=DONE.
  3. Create tables in a database with a set of columns, keys and basic permissions. Ansible=NOT DONE(?).
  4. Slave a machine to a master at an arbitrary binlog name/position using a named user/password. Ansible=NOT DONE(?).
    I think given the Ansible Way, this would mean a new “mysql_table” module and “mysql_replication” module?

Maybe. I think this could be really interesting if we had some amazing and complete module set to manage MySQL.

That being said, my scripts/applications have always “owned” their own table setup via migration and setup scripts and so on.

I’m not familiar with MySQL replication setup – but maybe that is as simple as templating a config file and starting a service? Might not need a module then. I don’t know.
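(For reference, and not something the thread settles: pointing a slave at a master is typically a runtime statement against the server rather than only a config file, which is exactly the step a mysql_replication module would wrap. Done via the shell module it looks roughly like this, with every value a placeholder, and note it is not idempotent as written — that would be the module’s job:)

```yaml
- name: point the slave at the master and start replication
  shell: mysql -e "CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=73; START SLAVE;"
```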

  • consider using group_vars and host_vars for inventory versus vars_files

Apologies, I don’t know what you mean here. I don’t put inventory into vars_files, but rather into /etc/ansible/hosts. Do you mean that I’m defining variables in vars_files that could just as well be defined alongside the inventory?

If you have a play that targets a group, and you are doing

vars_files:

  • groupname.yml

Then you can, in the same location as you have /etc/ansible/hosts

Have a YAML file named:

/etc/ansible/group_vars/groupname.yml

which saves you from having to do the vars_files trick(s).

Totally usable in 0.7, but not in 0.6 unless you /also/ have a vars_files section (minor bug)
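Concretely, with the group and variable names here purely illustrative:

```
# /etc/ansible/hosts
[dbservers]
db1.example.com
db2.example.com
```

```yaml
# /etc/ansible/group_vars/dbservers.yml -- picked up automatically for
# any play targeting the dbservers group, no vars_files section needed
mysql_master_user: user1
mysql_master_password: secret
```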

  • use ‘notify’ to restart services only when config files change (you
    have a TODO about this somewhere)

The real problem is that I’m doing a fairly complex setup of the my.cnf and then potentially modifying it outside Ansible’s realm of control. I’m not sure if I’m simply doing it wrong, or if I really am doing something beyond Ansible’s basic functionality. If I can bring the my.cnf postprocessing back under Ansible’s control, then it will know whether the file has changed or not.

Ansible’s no different from any other tool in this regard. If you are editing out of band and the user can also edit some parts of the file, the “assemble” module may be a good choice.
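A sketch of the assemble approach for the my.cnf case — the fragment directory and handler name are illustrative:

```yaml
tasks:
  # Ansible owns some fragments, the out-of-band postprocessing owns others;
  # assemble concatenates the directory into the real file and reports
  # changed only when the assembled result differs.
  - name: build my.cnf from fragments
    assemble: src=/etc/mysql/my.cnf.d dest=/etc/mysql/my.cnf
    notify: restart mysql

handlers:
  - name: restart mysql
    service: name=mysql state=restarted
```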

There is also a line-editing module which is not yet in core – it needs to be ported over to the new module framework and made properly idempotent – though I’ll accept it for inclusion if/when anyone updates it. That being said, I strongly believe in templates and in avoiding line editing wherever possible.

Or is there some mechanism whereby Ansible detects a file has been changed by external scripts? (I can search mailing list for detail)

Yes and no – if a template or copy operation would have changed the file at the end of the module, the notify kicks in. But there’s not a “has the file, for which I am not specifying what it should be, changed since the last time I ran this”.
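In playbook form the supported pattern is the following, with paths and the handler name illustrative:

```yaml
tasks:
  # 'changed' (and thus the notify) fires only when the rendered template
  # differs from what is already on disk.
  - name: install my.cnf
    template: src=templates/my.cnf.j2 dest=/etc/mysql/my.cnf
    notify: restart mysql

handlers:
  # Runs at most once per play, and only if some task above notified it.
  - name: restart mysql
    service: name=mysql state=restarted
```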

  • use the get_url module versus shell executing curl

I understand now why people are afraid to OSS their stuff. Very public “duh” moments. :)

Nah, hardly… very hard to drink from the firehose :)


  • host an apt repo and use the apt module versus shell executing dpkg
    (especially with no creates=)

I’m unsure if I can foist this overhead onto everyone who wants to try out a distributed database. A core project goal is to make it as easy to install a distributed database as it is today to install a single database; if everyone who wanted to try out MySQL had to administer an apt repo to do it, they wouldn’t do it.

Just an idea – I was thinking you could host it, i.e. if I have some ISV software, I can host my own yum repo and configure people to use it. I was trying to avoid the dpkg command being executed every time you ran the playbook.

Alternately,

    dpkg whatever creates=/some/file/that/the/package/will/lay/down

would also work.
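As a full task that might look like the following — the package and marker path are placeholders:

```yaml
# creates= short-circuits the task once the marker file exists, so the
# dpkg command no longer runs on every playbook invocation
- name: install the locally-built MHA package
  command: dpkg -i /tmp/mha-manager.deb creates=/usr/bin/masterha_manager
```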

  • not sure what the shell script is for, you could probably get away
    with not using it and passing in a location to a variable file using
    --extra-vars, perhaps

Which shell script?

Not looking ATM, but there was something in a “SaneAndMinimalSystem” directory somewhere that could have been expressed in modules. Maybe I misunderstood.

And now, my own thoughts.

No one seemed to express horror at my wrapping multiple playbooks in a shell script. I had thought that would be a common critique: that I should use playbooks-of-playbooks rather than shell-scripts-calling-playbooks.

I was holding back because I didn’t understand the reason for the script, and kinda suck at reading shell scripts.

I need to generate an SSH keypair and install it everywhere. I do it with supporting scripts. This seems like a task that would be required often, and might be a candidate for a module? Ansible supports modifying the authorized_keys file, but not placing the private key, nor generating the keypair to start with. Wondering if the extra functionality is not desired.

There’s already the authorized_key module, your SSH key generation could happen in a play that only targets 127.0.0.1 and uses the local connection mode, only generating the key if it is not already there, and then you transfer that in subsequent plays?
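A sketch of that two-play shape. The file names and the $pubkey variable are illustrative; $pubkey would be passed in with --extra-vars, which in practice means two invocations (generate, then distribute) — presumably what the wrapper script was doing anyway:

```yaml
---
# Play 1: generate the keypair once, on the control machine itself.
- hosts: 127.0.0.1
  connection: local
  tasks:
    - name: generate a cluster keypair if one does not already exist
      command: ssh-keygen -t rsa -N "" -f /tmp/cluster_key creates=/tmp/cluster_key

# Play 2: distribute both halves to the cluster hosts.
- hosts: dbservers
  tasks:
    - name: place the private key for the failover user
      copy: src=/tmp/cluster_key dest=/root/.ssh/id_rsa mode=0600

    - name: authorize the matching public key
      authorized_key: user=root key="$pubkey"
```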

The shell script is not bad though – (somewhat OT) many people have asked “how do I run make on remote systems” (thinking like someone might do with Fabric). I would generally say that while you can, you don’t… you build it in advance, in Hudson/Jenkins/whatever, and then use Ansible to transfer that content. That is kind of because I just like packages – but more so the general theme that there’s nothing wrong with doing some local steps before you run Ansible.

I wanted to reference a YAML-defined variable:

    mysql:
      master_user: user1

as {{ mysql['master_user'] }} within a template and as $mysql['master_user'] within a playbook, but it didn’t work. I changed the variable to just “mysql_master_user: user1” and then I could use the same variable in templates and playbooks. Is this a Really Hard Problem I’ve stumbled onto?

Nah, just something subtle. The shorthand variables don’t really work like Python.

${mysql.master_user} should work fine in the playbook. Presently Jinja2 isn’t allowed in Playbooks (but is in templates), and the shorthand AND Jinja2 are allowed in templates, though I’m thinking about working to turn that back on such that you can use Jinja2 everywhere. In any event, the $shorthand doesn’t use square brackets, it uses “.” for both array and dictionary indexing. The idea being, where possible, I want to avoid things looking like code… (where of course only_if is one place where that doesn’t hold, but I want it to be the only one).
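To make the working forms concrete, given the nested variable from above:

```yaml
# in a vars file or group_vars
mysql:
  master_user: user1
```

```yaml
# in a playbook task: dotted shorthand, no square brackets
- name: show the configured master user
  shell: echo master user is ${mysql.master_user}
```

and in a template either the shorthand or Jinja2’s {{ mysql.master_user }} works.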

* Time Less <timelessness at gmail.com> [2012/08/29 09:39]:

As a MySQL DBA, I need to:

   1. Set up users with permissions on arbitrary databases/tables for various hosts. Ansible=DONE.
   2. Create databases. Ansible=DONE.
   3. Create tables in a database with a set of columns, keys and basic permissions. Ansible=NOT DONE(?).
   4. Slave a machine to a master at an arbitrary binlog name/position using a named user/password. Ansible=NOT DONE(?).

I think given the Ansible Way, this would mean a new "mysql_table" module and "mysql_replication" module?

This is a tangent and not directly related to ansible, but could be
construed as a cautionary tale.

About a year ago, I spent a few weeks automating MySQL stuff in Puppet, and what I found was that the encapsulations of all the command variations ended up being harder to understand than simply using mysql commands and SQL scripts, opaque to new hires (especially ones who already knew MySQL), and, finally, not used often enough to justify the time I spent developing the modules and the learning curve required to use them. But, ultimately, the biggest problem was that it didn't make what was happening any more documented than it was before, when everything was done by hand; i.e., documentation still happened out of band and on purpose, by allocating time specifically to do it. (And before anyone asks, I deal with a goodly amount of data, including several dozen databases over the 25G mark, with new databases coming up about every week and many ALTERs during development. Our case is not really one of "They just didn't do enough mysql to justify it.")

The only part I found worthwhile to automate was setting up
replication slaves, because there's a little bit of orchestration
involved (getting the log position on the master and so on) and
because it's less well-understood and easy to get wrong.

I don't mean to discourage anyone from automating things if it seems
to help, but I definitely hit a point of diminishing returns, and
eventually the work became more of a mastering-the-tool exercise
than a production-usable tool.

  I don’t mean to discourage anyone from automating things if it seems
  to help, but I definitely hit a point of diminishing returns, and
  eventually the work became more of a mastering-the-tool exercise
  than a production-usable tool.

This is definitely a theme I want to help people avoid – and was exactly the point for creating this :)

I’d always leave migration and table setup, especially, up to a package or script.

Though creating databases and users seems fairly innocuous (especially as you may want a nice way to batch-remove users),
modules for adding tables would be very niche and round-peg-square-hole, and not something that I’d encourage.