Ansible idempotency

I have a use case to execute DDL scripts on a GCP cloud SQL instance using Ansible. Below is the source code of the playbook:

  • name: Check postgreSQL version
    hosts: localhost
    gather_facts: no
    become: yes

    tasks:

    • name: Install python3
      apt:
      name: python3

    • name: Install pip3
      apt:
      name: python3-pip

    • name: Install libpq-dev
      apt:
      name: libpq-dev

    • name: Install psycopg2
      apt:
      name: python3-psycopg2

    • name: List SQL files to be executed
      find:
      paths: DDL/
      file_type: file
      recurse: yes
      patterns: “*.sql”
      register: files_matched

    • name: Connect to SQL instance
      ignore_errors: true
      loop: “{{ files_matched.files|flatten(levels=1) }}”
      loop_control:
      label: “{{ item.path }}”
      community.postgresql.postgresql_script:
      db: “{{ db_name }}”
      login_host: “{{ ansible_host }}”
      login_password: “{{ db_password }}”
      login_user: “{{ db_user }}”
      port: “5432”
      path: “{{ item.path }}”
      encoding: UTF-8

The code is working fine but i want to execute the playbook for only those files which were modified in the git commit. How do i achieve this?

Please try to format your posts with code blocks to help with readability of your code. Use triple backticks: ``` to start and end a code block. You can also override formatting by explicitly specifying the language on the same line as the first triple backtick.


First, I want to mention that you can simplify your playbook a little bit by consolidating the apt installs to one task:

- name: Check postgreSQL version
  hosts: localhost
  gather_facts: no
  become: yes
  tasks:
    - name: Install required packages
      apt:
        name:
          - python3
          - python3-pip
          - libpq-dev
          - python3-psycopg2

Now, you want to execute this playbook anytime that there’s a new git commit? You need to figure out how to trigger the ansible-playbook with git webhooks. If you’re using AWX, it would be straightforward to simply call a Job Template via git webhook.

Or do you mean that the *.sql scripts are in git, and you only want to run scripts that were recently modified? If that’s the case, then you might be able to use the age parameter in find: to run only scripts under a certain age (less than the frequency you run the playbook in a cronjob). An important mention here is that the age_stamp parameter defaults to mtime (modified time), but also has options for creation time and access time.

Continuing your play:

    - name: List SQL files to be executed
      find:
        paths: DDL/
        file_type: file
        recurse: yes
        patterns: “*.sql”
        age: "-1d"
      register: files_matched

    - name: Connect to SQL instance
      ignore_errors: true
      loop: “{{ files_matched.files|flatten(levels=1) }}”
      loop_control:
        label: “{{ item.path }}”
      community.postgresql.postgresql_script:
        db: “{{ db_name }}”
        login_host: “{{ ansible_host }}”
        login_password: “{{ db_password }}”
        login_user: “{{ db_user }}”
        port: “5432”
        path: “{{ item.path }}”
        encoding: UTF-8
      when: files_matched.files | length > 0

I don’t know how old you want to consider your files, so I went with “less than 1 day old” (negative values are less than that age, positive values are greater than). I also added a when condition to make sure we only try to run the scripts if we find any valid matches.

Hi Danny,
I want to execute the playbook for only those .sql files which are modified in the git commit, not based on file age.

I understand that, but I’m not sure how to approach explicitly getting files modified in git. What if multiple commits have been made between syncs? Do you only care about what was modified in the most recent commit? Or do you care about all modified files in all commits since the last sync? How do we compile that information sensibly through git? Is it possible? Probably, but that’s beyond my technical expertise.

That said, if a file is modified by git, then it will be modified on disk. So if we go by modified file time, this would mean the same scripts could be run multiple times within a certain period, but it would catch only recently git modified files (regardless of commit). Anything older would be skipped, so you would have some degree of compromise between running absolutely all scripts everytime, and only git modified files.

And you can shorten the interval too. Can you guarantee that the playbook will run within a few minutes of each git sync? Then narrow the age window to within a few minutes. You’ll need to have some tolerance so that recently modified files are picked up in a large enough window of time to be run, but simultaneously within a narrow enough window that modified files in previous runs aren’t erroneously picked up again.

I will add this playbook in the github actions so it will be immediately executed once the changes are pushed to the branch, however defining the age window will re-execute few of the files if commits are done in a short period of time. I want to ensure the playbook runs only for modified .sql files. Any way to do that?

If it runs every commit via github actions… then maybe you can do something like this:

    - name: List SQL files to be executed
      shell:
        cmd: git diff-tree HEAD~1 HEAD -r --name-only | grep DDL | grep \.sql$
      register: files_matched
      failed_when: false

    - name: Connect to SQL instance
      ignore_errors: true
      loop: “{{ files_matched.stdout_lines }}”
      community.postgresql.postgresql_script:
        db: “{{ db_name }}”
        login_host: “{{ ansible_host }}”
        login_password: “{{ db_password }}”
        login_user: “{{ db_user }}”
        port: “5432”
        path: “{{ item }}”
        encoding: UTF-8
      when: files_matched.stdout_lines | length > 0

The idea is to get only the changed filenames between the two most recent commits, and filter them just like before. You may need to specify the working directory or otherwise adjust the shell command a bit. I don’t have enough experience with GitHub Actions to know if there’s any gotchas to worry about or if there’s any “magic” that could be leveraged from the GitHub runners to make this better.