Postgres cluster+database bootstraping

Hi,

Disclaimer:
This is going to be a very long one - I apologies in advance, but I really need help and my use-case is fairly complex.

I have been fighting this for a few days now, causing me many headaches.
I’ve done an extensive research, just to make sure I am not missing anything, or raising something that has been solved already:

  • Going through the documentation extensively
  • Going through many related examples
  • Going through all the search-results for the word “postgres” in this user-group

My use-case is quite complex, but I don’t think it’s too complex for Ansible to handle (I am using the latest stable version 1.2.2).

I want to have a Master/Standby streaming-replication story between two nodes.
The master should be write-only, and should be storing the data on our SAN via nfs.
The standby should be read-only, and should run on a ramdisk (our database is small enough to fit in memory).

A third node would act as the query-filter/cache using PgPool-II, possibly with a PgPoolAdmin web-app on-top.

In addition, I want to have a cron job that backs-up the database hourly to an sql dump, on another directory on the same nfs mount.

In the bootstrap process, after installing the packages and dependencies and dealing with postgres-configurations, each node should:

  1. Mount it’s respective file-system (nfs for the master, tmpfs for the standby).
  2. Wipe-clean the existing data directory, and re-create the entire cluster from scratch on the mounted directory.
  3. Handle any user/privs for accessing the cluster/databse
  4. Create the database
  5. Load-up the latest-saved sql-backup into the newly created database.

After that, the PgPool node would be configured to take over the replication/filtering/caching/fail-over, etc.

I have created 5 roles:

  1. A base CentOS role for general node preparation (CentOS 6.3 x64)

  2. A PostgreSQL role for installation of PostgreSQL itself, as well as other dependence (PostgreSQL 9.1 x64)

  3. A PostgreSQL_Master role, for the master-node configuration

  4. A PostgreSQL_Standby role, for the standby-node configuration

  5. A PgPool role for the PgPool node - installation and configuration (PgPool-II 3.2 x64)

I tried to use the latest features/syntax, and planning to upload it to a github, to serve as an example for other people to use.

My headaches are as follows:

  1. The first thing that bothered me, was the variables-syntax - it turns out there is no one syntax that works everywhere, especially for nested/name-spaced variables.
    For example, I have defined a nested-value in a group_vars file, as such:

mounts:
nfs:
path: /mnt/db
source: :
tmpfs:
path: /mnt/ram
source: none
options: size=3g
user:
name: postgres
password:
home: /home/{{ user.name }}
pwfile: {{ user.home }}/.pgpass
roles: CREATEDB,CREATEROLE,SUPERUSER
database:
name:
path: /pgsql/9.1/data
backup: /mnt/db/backups/{{ database.name }}.sql
locale: he_IL.UTF-8
encoding: UTF8
template: template0
loadbalancer: $hostvars[‘dbserver-master’].ansible_default_ipv4.address

* The notation is just for hiding our business stuff. Imagine there is actual details there

Then, in a mount action of a task in the postgres role, trying to access the nfs-path value like this, doesn’t work:

  • name: Mount on nfs
    mount: name=$mounts.nfs.path
    src=$mounts.nfs.source
    fstype=nfs
    state=mounted

But using this notation does work:

  • name: Mount on nfs
    mount: name={{ mounts.nfs.path }}
    src={{ mounts.nfs.source }}
    fstype=nfs
    state=mounted

However, conversely, within my site.yaml file, trying to use the database-user value for the user-login like this, doesn’t work:

  • hosts: dbserver-master
    user: {{ user.name }}
    sudo: True
    roles:
  • PostgreSQL_Master

But using this notation, does work:

  • hosts: dbserver-master
    user: $user.name
    sudo: True
    roles:
  • PostgreSQL_Master
  1. My second, and perhaps biggest headache is actually trying to create the cluster and database.

After installing Postgres on, say, a master-node, there is a “postgres” system-user as well as a “postgres” postgres-user in the service (or is it the yet-uninitialized cluster? or is that a postgres-role? not sure…).

Now, since I need to create a cluster, I need to run a “service postgresql-9.1 initdb” command.
After that, I need to manage postgres-user/role-privalages via “postgresql_user” and create a database via “postgresql_db”.
Lastly, I need to run a “psql” shell to inject the latest database sql-backup into the newly-created database on the newly-created cluster.
But using which system-user?
And which postgres-user/role?
And how do I do this without a password-prompt?
I oscillated back and forth from using the postgres user and root user - each has it’s own solutions and problems.

Let’s say I am using the postgres user in my site.yml file when the role is executed (as shown above).
To avoid a password prompt, I am using a “.pgpass” file that I am generating via a template like this:

  • name: Create password file for postgres
    template: src=.pgpass.j2 dest={{ user.pwfile }} mode=600

< in .pgpass.j2>:

::*:{{ user.name }}:{{ user.password }}

Problem is, “/home/postgres” doesn’t initially exist - the user does exist, but the home-directory doesn’t.
Also, I need to reset both the system-user-postgres’s password, as well as the postgres-user-password to whatever I would inject into the “.pgpass” template, for consistency-sake (it’s already too confusing so I might as well keep is consistent…).
However, trying to create the home-directory using the “user” module like this, didn’t work:

  • name: Setting home directory for system-user
    user: name={{ user.name }}
    password: {{ user.password }}
    home={{ user.pwfile }}
    generate_ssh_key=yes

I assume that because the user does exist, it bypasses the setting of the home directory and password
(* I have seen that there is an “update_password” parameter, new in 1.3, but it’s not out yet - at least not officially-stable as a yum package which I prefer to use - in any case, it would not solve my problem). I’ve tried to put this user-creation task “before” the postgres-installation task, but that caused other problems…

Then, I create the cluster like this:

  • name: Create database cluster
    command: service postgresql-9.1 initdb -D {{ mount.nfs.path }}/{{ database.path}} -E {{ database.encoding }} -U {{ user.name }} --pwfile={{ user.pwfile }} --locale={{ database.locale }}
    creates={{ mounts.nfs.path }}/{{ database.path }}/postgresql.conf

Which “appears” to work (no error message), but upon looking in the folder, it’s empty…

Then, this seems to work (again, no errors):

  • name: Configure postgres-user
    sudo_user: postgres
    sudo: True
    postgresql_user: user={{ user.name }}
    password={{ user.password }}
    role_attr_flags={{ user.roles }}

Which I would assume would mean that the postgres-user named “postgres” would now work in conjunction with the system-user postgres (which is the one being used to apply this role in site.yml)

But then trying to create the database like this, doesn’t work:

  • name: Create the database
    postgresql_db: db={{ user.name }}
    encoding={{ database.encoding }}
    lc_collate={{ database.locale }}
    lc_ctype={{ database.locale }}
    template={{ database.template }}
    login_user={{ user.name }}
    login_password={{ user.password }}

It errors with:

msg: unable to connect to database: FATAL: Peer authentication failed for user “postgres”

Then after adding this to this task:

sudo_user: postgres
sudo: True

It does work… Why? I am already logged in with this user dammit! (grrr…)

This whole postgres-user/system-user/play-book-user/task-user confusion is utterly bewildering to me…

BTW, this (in a task) does NOT work:

sudo_user: {{ user.name }}
sudo: True

WHY?

Also, this (in a group_vars file) works:

user:
name: postgres
home: /home/{{ user.name }}

BUT, this doesn’t work (syntax error):

user:
name: postgres
home: /home/{{ user.name }}
pwfile: {{ user.home }}/.pgpass

WHILE this DOES work:

user:
name: postgres
home: /home/{{ user.name }}
pwfile: /home/{{ user.name }}/.pgpass

WHY?
Does the resolving-recursion go only one-level deep?

THEN, by “work” I mean, “doesn’t throw a syntax error”, but when it gets to using the “home” variable, it errors:

recursive loop detected in template string: /home/{{user.name}}

WTF???
But wait, there’s more…
This is silly, but I would have expected it to work, but… Nope… (syntax error):

username: postgres
userhome: /home/{{ username }}
user:
name: {{ username }}
password: $1$SomeSalt$CMlg4vdoDyTnsk1tVSZTT.
home: {{ userhome }}
pwfile: {{ userhome }}/.pgpass

Which

Finally, when I get to injecting the sql via psql, like this:

  • name: Restore database from backup
    shell: psql {{ database.name }} < {{ database.backup }}

It complains that it can’t execute it - doesn’t have permission - which is odd as according to the flags on the file, any user should be able to execute it…

But then “sometimes” it does execute, and complains that:

stderr: psql: FATAL: role “root” does not exist

Which is double-odd, since I am not even logged-in as root for this playbook, as again - I clearly wrote that It should log-in as system-user “postgres” in the site.yml file:

  • hosts: dbserver-master
    user: $database.user
    sudo: True
    roles:
  • PostgreSQL_Master

Here, again, after adding this to this task:

sudo_user: postgres
sudo: True

It does work…

So, to summarize, at the end my playbook DO work, but I don’t know why… I don’t understand what’s going on…

It’s all guess-work until I find the combination that works, and that doesn’t fly with me.

The combinatorial-equation grows exponentially with any parameter added, and it’s just unprofessional for me to work this way.
Also, as I said, the database-cluster task finishes with no errors, but the cluster has not been created… At least not where I told it to.