Ansible idempotency

Chandra_Mukherjee · September 12, 2024, 12:00pm

I have a use case to execute DDL scripts on a GCP cloud SQL instance using Ansible. Below is the source code of the playbook:

name: Check postgreSQL version
hosts: localhost
gather_facts: no
become: yes

tasks:
- name: Install python3
  apt:
  name: python3
- name: Install pip3
  apt:
  name: python3-pip
- name: Install libpq-dev
  apt:
  name: libpq-dev
- name: Install psycopg2
  apt:
  name: python3-psycopg2
- name: List SQL files to be executed
  find:
  paths: DDL/
  file_type: file
  recurse: yes
  patterns: “*.sql”
  register: files_matched
- name: Connect to SQL instance
  ignore_errors: true
  loop: “{{ files_matched.files|flatten(levels=1) }}”
  loop_control:
  label: “{{ item.path }}”
  community.postgresql.postgresql_script:
  db: “{{ db_name }}”
  login_host: “{{ ansible_host }}”
  login_password: “{{ db_password }}”
  login_user: “{{ db_user }}”
  port: “5432”
  path: “{{ item.path }}”
  encoding: UTF-8

The code is working fine but i want to execute the playbook for only those files which were modified in the git commit. How do i achieve this?

Denney-tech · September 12, 2024, 2:52pm

Please try to format your posts with code blocks to help with readability of your code. Use triple backticks: ``` to start and end a code block. You can also override formatting by explicitly specifying the language on the same line as the first triple backtick.

First, I want to mention that you can simplify your playbook a little bit by consolidating the apt installs to one task:

- name: Check postgreSQL version
  hosts: localhost
  gather_facts: no
  become: yes
  tasks:
    - name: Install required packages
      apt:
        name:
          - python3
          - python3-pip
          - libpq-dev
          - python3-psycopg2

Now, you want to execute this playbook anytime that there’s a new git commit? You need to figure out how to trigger the ansible-playbook with git webhooks. If you’re using AWX, it would be straightforward to simply call a Job Template via git webhook.

Or do you mean that the *.sql scripts are in git, and you only want to run scripts that were recently modified? If that’s the case, then you might be able to use the age parameter in find: to run only scripts under a certain age (less than the frequency you run the playbook in a cronjob). An important mention here is that the age_stamp parameter defaults to mtime (modified time), but also has options for creation time and access time.

Continuing your play:

    - name: List SQL files to be executed
      find:
        paths: DDL/
        file_type: file
        recurse: yes
        patterns: “*.sql”
        age: "-1d"
      register: files_matched

    - name: Connect to SQL instance
      ignore_errors: true
      loop: “{{ files_matched.files|flatten(levels=1) }}”
      loop_control:
        label: “{{ item.path }}”
      community.postgresql.postgresql_script:
        db: “{{ db_name }}”
        login_host: “{{ ansible_host }}”
        login_password: “{{ db_password }}”
        login_user: “{{ db_user }}”
        port: “5432”
        path: “{{ item.path }}”
        encoding: UTF-8
      when: files_matched.files | length > 0

I don’t know how old you want to consider your files, so I went with “less than 1 day old” (negative values are less than that age, positive values are greater than). I also added a when condition to make sure we only try to run the scripts if we find any valid matches.

Chandra_Mukherjee · September 12, 2024, 3:29pm

Hi Danny,
I want to execute the playbook for only those .sql files which are modified in the git commit, not based on file age.

Denney-tech · September 12, 2024, 4:08pm

I understand that, but I’m not sure how to approach explicitly getting files modified in git. What if multiple commits have been made between syncs? Do you only care about what was modified in the most recent commit? Or do you care about all modified files in all commits since the last sync? How do we compile that information sensibly through git? Is it possible? Probably, but that’s beyond my technical expertise.

That said, if a file is modified by git, then it will be modified on disk. So if we go by modified file time, this would mean the same scripts could be run multiple times within a certain period, but it would catch only recently git modified files (regardless of commit). Anything older would be skipped, so you would have some degree of compromise between running absolutely all scripts everytime, and only git modified files.

Denney-tech · September 12, 2024, 4:15pm

And you can shorten the interval too. Can you guarantee that the playbook will run within a few minutes of each git sync? Then narrow the age window to within a few minutes. You’ll need to have some tolerance so that recently modified files are picked up in a large enough window of time to be run, but simultaneously within a narrow enough window that modified files in previous runs aren’t erroneously picked up again.

Chandra_Mukherjee · September 12, 2024, 6:47pm

I will add this playbook in the github actions so it will be immediately executed once the changes are pushed to the branch, however defining the age window will re-execute few of the files if commits are done in a short period of time. I want to ensure the playbook runs only for modified .sql files. Any way to do that?

Denney-tech · September 12, 2024, 7:06pm

If it runs every commit via github actions… then maybe you can do something like this:

    - name: List SQL files to be executed
      shell:
        cmd: git diff-tree HEAD~1 HEAD -r --name-only | grep DDL | grep \.sql$
      register: files_matched
      failed_when: false

    - name: Connect to SQL instance
      ignore_errors: true
      loop: “{{ files_matched.stdout_lines }}”
      community.postgresql.postgresql_script:
        db: “{{ db_name }}”
        login_host: “{{ ansible_host }}”
        login_password: “{{ db_password }}”
        login_user: “{{ db_user }}”
        port: “5432”
        path: “{{ item }}”
        encoding: UTF-8
      when: files_matched.stdout_lines | length > 0

The idea is to get only the changed filenames between the two most recent commits, and filter them just like before. You may need to specify the working directory or otherwise adjust the shell command a bit. I don’t have enough experience with GitHub Actions to know if there’s any gotchas to worry about or if there’s any “magic” that could be leveraged from the GitHub runners to make this better.

system · October 12, 2024, 7:07pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Execute DDL scripts on GCP Cloud SQL instance via Ansible Get Help collections , gcp	5	23	September 10, 2024
Ansible python interpreter Get Help galaxy-ng , ubuntu , git , ansible , python	3	146	September 18, 2024
Gcloud auth login inside playbook Get Help galaxy-ng , playbook , ubuntu , gcp , ansible	0	35	September 14, 2024
Python interpreter problem on Ubuntu 16.04 Get Help ubuntu	8	1154	March 27, 2024
How do we run playbook for only host is added more Get Help	10	374	December 20, 2023

Ansible idempotency

Related topics