Hi all,
I'm working on a guestfs [1] connection plugin and am looking for design
advice.
libguestfs provides a set of command line tools that can be used to
operate on virtual machine disk images and modify their contents.
For every task, the connection plugin:
1. Starts guestfish in --remote mode on a remote host over ssh and adds
   a disk (passed as a parameter to the guestfs connection).
2. Runs the supermin appliance [2][3]. It typically takes two to four
   seconds to spin up the appliance VM.
3. Mounts the root fs partition (the partition number is passed as a
   parameter to the guestfs connection).
4. Performs the task. Some implementation details:
   - put_file/fetch_file is implemented using the copy-in/copy-out [4][5]
     guestfish commands
   - there's an intermediate copy to/from the remote host over ssh (to
     enable remote guestfs operation)
   - exec_command is implemented using the "command" [6] guestfish command
5. Stops the supermin appliance and the guestfish instance.
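To make the per-task lifecycle above concrete, here is a minimal sketch
that only builds the guestfish command lines for one task; it does not
run anything and is not the plugin's actual code. The function names,
the /dev/sda device name, and the way the server PID is passed around
are assumptions made for illustration.

```python
import shlex


def build_guestfish_session(disk_path, root_partnum, server_pid, task_command):
    """Return the guestfish invocations for one task, in order.

    Assumptions (hypothetical, not taken from the plugin): the disk
    shows up as /dev/sda inside the appliance, and server_pid is the
    PID printed by "guestfish --listen".
    """
    remote = ["guestfish", f"--remote={server_pid}"]
    root_device = f"/dev/sda{root_partnum}"
    return [
        ["guestfish", "--listen"],               # step 1: start the server
        remote + ["add-drive", disk_path],       # step 1: add the disk
        remote + ["run"],                        # step 2: boot the appliance
        remote + ["mount", root_device, "/"],    # step 3: mount the root fs
        remote + list(task_command),             # step 4: perform the task
        remote + ["exit"],                       # step 5: stop guestfish
    ]


def over_ssh(host, argv):
    """Wrap a command so that it runs on the remote host over ssh."""
    return ["ssh", host, shlex.join(argv)]


commands = build_guestfish_session(
    "/home/user/test.qcow2", 1, 4242, ["copy-in", "/tmp/payload", "/tmp"]
)
ssh_commands = [over_ssh("remote-hypervisor", argv) for argv in commands]
```

Only the "run" step pays the two to four seconds of appliance start-up;
the other commands are cheap, which is why repeating the whole sequence
for every task hurts.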
Here's an example of how it looks in a playbook:
- name: Add disk image to inventory
  add_host:
    name: "{{ vm_disk_path }}"
    ansible_host: "{{ ansible_host }}"
    ansible_connection: guestfs
    ansible_guestfs_disk_path: "{{ vm_disk_path }}"
    ansible_guestfs_root_partnum: "{{ root_partnum }}"
  changed_when: false
- name: Test guestfs
  ping:
  delegate_to: "{{ vm_disk_path }}"
The ping task is performed using the execution environment from
within the disk image on the remote host:
TASK [Add disk image to inventory] ******************************************
ok: [remote-hypervisor]
TASK [Test guestfs] *********************************************************
ok: [remote-hypervisor -> /home/user/test.qcow2]
Likewise, a role can be delegated to the guestfs disk image.
The problem is that _connect() spins up the supermin VM on every task and
stops it afterwards. So it takes at least two seconds just to perform
_connect(), which is very slow for plays with a lot of tasks and
roles.
The question is: how can this be optimized to avoid the costly _connect()
caused by the appliance start?
I can think of the following approaches:
1. Introduce a separate module that starts or stops the guestfs appliance
   and remove that action from the connection plugin.
   Pros: similar to the lxd, docker, and virt connections, which have
   separate tasks for starting/stopping the containers/VMs
   Cons: extra tasks need to be added to every play to start/stop guestfish
2. Add a separate meta task that closes the connection, plus a connection
   flag that prevents guestfish from being stopped after the first task.
   The meta task 'close_connection' can be added either as a separate
   module or as an extension to the builtin meta module.
   Cons:
   - it looks flaky: guestfish might unintentionally be left running
     somewhere in the middle of the play in case of an error. Extra
     care (i.e. blocks) might be needed to always close the guestfs
     connection.
3. Extend the persistent connection framework [7]. There might be a new
   mode that keeps the connection open for a sequence of tasks running on
   the same connection, without an explicit timeout. This mode would look
   like this:
     task 1 on a guestfs connection - implicit _connect
     task 2 on the same guestfs connection - no _connect
     ...
     task n on the same guestfs connection - no _connect
     task z on any other connection or the end of play - implicit close()
       of the guestfs connection
   Pros: reliable, tidy - no need for extra tasks/blocks
   Cons:
   - need to modify ansible core - task_executor, etc.
   - not sure if ansible is able to persist connections across roles
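The bookkeeping behind approach 3 can be sketched in a few lines. This
is only a toy model of the idea, not ansible code: ConnectionTracker,
before_task and end_of_play are hypothetical names, and the connect and
close callbacks stand in for the plugin's _connect() and close().

```python
class ConnectionTracker:
    """Keep one connection open while consecutive tasks reuse it."""

    def __init__(self, connect, close):
        self._connect = connect
        self._close = close
        self._active_key = None  # connection currently held open, if any

    def before_task(self, key):
        """Call before every task with the key of its connection."""
        if key == self._active_key:
            return  # same connection as the previous task: no _connect
        self.end_of_play()  # implicit close() of the previous connection
        self._connect(key)
        self._active_key = key

    def end_of_play(self):
        """Close whatever is still open (also used on connection change)."""
        if self._active_key is not None:
            self._close(self._active_key)
            self._active_key = None


events = []
tracker = ConnectionTracker(
    connect=lambda key: events.append(("connect", key)),
    close=lambda key: events.append(("close", key)),
)
guestfs = "guestfs:/home/user/test.qcow2"
for key in [guestfs, guestfs, guestfs, "ssh:remote-hypervisor"]:
    tracker.before_task(key)
tracker.end_of_play()
```

In this model three guestfs tasks in a row cost a single connect, and the
guestfs connection is closed as soon as a task on another connection
shows up. What it does not answer is where such a tracker would live in
ansible core, or how it would behave across roles and forks.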
Looking forward to feedback on which of the approaches is the most
solid/sane.
1. https://libguestfs.org
2. https://libguestfs.org/guestfs-internals.1.html#architecture
3. https://libguestfs.org/supermin.1.html
4. https://libguestfs.org/guestfish.1.html#copy-in
5. https://libguestfs.org/guestfish.1.html#copy-out
6. https://libguestfs.org/guestfish.1.html#command
7. https://www.ansible.com/deep-dive-with-network-connection-plugins
Thanks,
Roman