Hi,
Disclaimer:
This is going to be a very long one - I apologies in advance, but I really need help and my use-case is fairly complex.
I have been fighting this for a few days now, causing me many headaches.
I’ve done an extensive research, just to make sure I am not missing anything, or raising something that has been solved already:
- Going through the documentation extensively
- Going through many related examples
- Going through all the search-results for the word “postgres” in this user-group
My use-case is quite complex, but I don’t think it’s too complex for Ansible to handle (I am using the latest stable version 1.2.2).
I want to have a Master/Standby streaming-replication story between two nodes.
The master should be write-only, and should be storing the data on our SAN via nfs.
The standby should be read-only, and should run on a ramdisk (our database is small enough to fit in memory).
A third node would act as the query-filter/cache using PgPool-II, possibly with a PgPoolAdmin web-app on-top.
In addition, I want to have a cron job that backs-up the database hourly to an sql dump, on another directory on the same nfs mount.
In the bootstrap process, after installing the packages and dependencies and dealing with postgres-configurations, each node should:
- Mount it’s respective file-system (nfs for the master, tmpfs for the standby).
- Wipe-clean the existing data directory, and re-create the entire cluster from scratch on the mounted directory.
- Handle any user/privs for accessing the cluster/databse
- Create the database
- Load-up the latest-saved sql-backup into the newly created database.
After that, the PgPool node would be configured to take over the replication/filtering/caching/fail-over, etc.
I have created 5 roles:
-
A base CentOS role for general node preparation (CentOS 6.3 x64)
-
A PostgreSQL role for installation of PostgreSQL itself, as well as other dependence (PostgreSQL 9.1 x64)
-
A PostgreSQL_Master role, for the master-node configuration
-
A PostgreSQL_Standby role, for the standby-node configuration
-
A PgPool role for the PgPool node - installation and configuration (PgPool-II 3.2 x64)
I tried to use the latest features/syntax, and planning to upload it to a github, to serve as an example for other people to use.
My headaches are as follows:
- The first thing that bothered me, was the variables-syntax - it turns out there is no one syntax that works everywhere, especially for nested/name-spaced variables.
For example, I have defined a nested-value in a group_vars file, as such:
mounts:
nfs:
path: /mnt/db
source: :
tmpfs:
path: /mnt/ram
source: none
options: size=3g
user:
name: postgres
password:
home: /home/{{ user.name }}
pwfile: {{ user.home }}/.pgpass
roles: CREATEDB,CREATEROLE,SUPERUSER
database:
name:
path: /pgsql/9.1/data
backup: /mnt/db/backups/{{ database.name }}.sql
locale: he_IL.UTF-8
encoding: UTF8
template: template0
loadbalancer: $hostvars[‘dbserver-master’].ansible_default_ipv4.address
* The notation is just for hiding our business stuff. Imagine there is actual details there
Then, in a mount action of a task in the postgres role, trying to access the nfs-path value like this, doesn’t work:
- name: Mount on nfs
mount: name=$mounts.nfs.path
src=$mounts.nfs.source
fstype=nfs
state=mounted
But using this notation does work:
- name: Mount on nfs
mount: name={{ mounts.nfs.path }}
src={{ mounts.nfs.source }}
fstype=nfs
state=mounted
However, conversely, within my site.yaml file, trying to use the database-user value for the user-login like this, doesn’t work:
- hosts: dbserver-master
user: {{ user.name }}
sudo: True
roles:- PostgreSQL_Master
But using this notation, does work:
- hosts: dbserver-master
user: $user.name
sudo: True
roles:- PostgreSQL_Master
- My second, and perhaps biggest headache is actually trying to create the cluster and database.
After installing Postgres on, say, a master-node, there is a “postgres” system-user as well as a “postgres” postgres-user in the service (or is it the yet-uninitialized cluster? or is that a postgres-role? not sure…).
Now, since I need to create a cluster, I need to run a “service postgresql-9.1 initdb” command.
After that, I need to manage postgres-user/role-privalages via “postgresql_user” and create a database via “postgresql_db”.
Lastly, I need to run a “psql” shell to inject the latest database sql-backup into the newly-created database on the newly-created cluster.
But using which system-user?
And which postgres-user/role?
And how do I do this without a password-prompt?
I oscillated back and forth from using the postgres user and root user - each has it’s own solutions and problems.
Let’s say I am using the postgres user in my site.yml file when the role is executed (as shown above).
To avoid a password prompt, I am using a “.pgpass” file that I am generating via a template like this:
- name: Create password file for postgres
template: src=.pgpass.j2 dest={{ user.pwfile }} mode=600
< in .pgpass.j2>:
::*:{{ user.name }}:{{ user.password }}
Problem is, “/home/postgres” doesn’t initially exist - the user does exist, but the home-directory doesn’t.
Also, I need to reset both the system-user-postgres’s password, as well as the postgres-user-password to whatever I would inject into the “.pgpass” template, for consistency-sake (it’s already too confusing so I might as well keep is consistent…).
However, trying to create the home-directory using the “user” module like this, didn’t work:
- name: Setting home directory for system-user
user: name={{ user.name }}
password: {{ user.password }}
home={{ user.pwfile }}
generate_ssh_key=yes
I assume that because the user does exist, it bypasses the setting of the home directory and password
(* I have seen that there is an “update_password” parameter, new in 1.3, but it’s not out yet - at least not officially-stable as a yum package which I prefer to use - in any case, it would not solve my problem). I’ve tried to put this user-creation task “before” the postgres-installation task, but that caused other problems…
Then, I create the cluster like this:
- name: Create database cluster
command: service postgresql-9.1 initdb -D {{ mount.nfs.path }}/{{ database.path}} -E {{ database.encoding }} -U {{ user.name }} --pwfile={{ user.pwfile }} --locale={{ database.locale }}
creates={{ mounts.nfs.path }}/{{ database.path }}/postgresql.conf
Which “appears” to work (no error message), but upon looking in the folder, it’s empty…
Then, this seems to work (again, no errors):
- name: Configure postgres-user
sudo_user: postgres
sudo: True
postgresql_user: user={{ user.name }}
password={{ user.password }}
role_attr_flags={{ user.roles }}
Which I would assume would mean that the postgres-user named “postgres” would now work in conjunction with the system-user postgres (which is the one being used to apply this role in site.yml)
But then trying to create the database like this, doesn’t work:
- name: Create the database
postgresql_db: db={{ user.name }}
encoding={{ database.encoding }}
lc_collate={{ database.locale }}
lc_ctype={{ database.locale }}
template={{ database.template }}
login_user={{ user.name }}
login_password={{ user.password }}
It errors with:
msg: unable to connect to database: FATAL: Peer authentication failed for user “postgres”
Then after adding this to this task:
sudo_user: postgres
sudo: True
It does work… Why? I am already logged in with this user dammit! (grrr…)
This whole postgres-user/system-user/play-book-user/task-user confusion is utterly bewildering to me…
BTW, this (in a task) does NOT work:
sudo_user: {{ user.name }}
sudo: True
WHY?
Also, this (in a group_vars file) works:
user:
name: postgres
home: /home/{{ user.name }}
BUT, this doesn’t work (syntax error):
user:
name: postgres
home: /home/{{ user.name }}
pwfile: {{ user.home }}/.pgpass
WHILE this DOES work:
user:
name: postgres
home: /home/{{ user.name }}
pwfile: /home/{{ user.name }}/.pgpass
WHY?
Does the resolving-recursion go only one-level deep?
THEN, by “work” I mean, “doesn’t throw a syntax error”, but when it gets to using the “home” variable, it errors:
recursive loop detected in template string: /home/{{user.name}}
WTF???
But wait, there’s more…
This is silly, but I would have expected it to work, but… Nope… (syntax error):
username: postgres
userhome: /home/{{ username }}
user:
name: {{ username }}
password: $1$SomeSalt$CMlg4vdoDyTnsk1tVSZTT.
home: {{ userhome }}
pwfile: {{ userhome }}/.pgpass
Which
Finally, when I get to injecting the sql via psql, like this:
- name: Restore database from backup
shell: psql {{ database.name }} < {{ database.backup }}
It complains that it can’t execute it - doesn’t have permission - which is odd as according to the flags on the file, any user should be able to execute it…
But then “sometimes” it does execute, and complains that:
stderr: psql: FATAL: role “root” does not exist
Which is double-odd, since I am not even logged-in as root for this playbook, as again - I clearly wrote that It should log-in as system-user “postgres” in the site.yml file:
- hosts: dbserver-master
user: $database.user
sudo: True
roles:- PostgreSQL_Master
Here, again, after adding this to this task:
sudo_user: postgres
sudo: True
It does work…
So, to summarize, at the end my playbook DO work, but I don’t know why… I don’t understand what’s going on…
It’s all guess-work until I find the combination that works, and that doesn’t fly with me.
The combinatorial-equation grows exponentially with any parameter added, and it’s just unprofessional for me to work this way.
Also, as I said, the database-cluster task finishes with no errors, but the cluster has not been created… At least not where I told it to.