ssh performance - follow up

Ok great, didn’t try yet :slight_smile:

– Michael

Hello,

looking for a solution on the logic for deciding what calls should be optimized or not, I have a few questions / suggestions :

(warning: this is a bit long)

A/ action & modules composability

From what I understand, :
(1) - an action_plugin is allowed to invoke remote modules via runner._execute_module
(2) - an action_plugin is allowed to call another action_plugin via handler.run (only one case seen in script: which calls ‘raw’)
(3) - the connection_plugin accelerate calls _execute_module to launch the accelerate server

  • do you consider that case (2) and (3) are sufficiently degenerate as to be very rare ?
  • do you consider that case (2) is “legal” or should handler.run only be called by the runner and script: modified ?

B/ creation and need for a temporary path

The creation of a tmp path is decided in the runner just before handler.run (both normal & until case)
It is not created if handler.NEEDS_TMPPATH is set to False by the developer of the action plugin.
a tmp path is also created by accelerate I did not look into this.

except in the “accelerate” case, the tmp path is necessary :
(a) - if an action_plugin know that it will need to send tmp information to the remote host
(b) - if a module needs to work in an ansible tmp dir (copy:?)
(c) - when ansible wants to send the code of a module for execution

currently, NEEDS_PATH defaults to yes without distinction between (a) (b) and (c)
the ssh_alt branch show that (c) does not depend on the action_plugin but on the capabilities of the connection_plugin.

so we could write :

if getattr(handler, ‘NEEDS_TMPPATH’, True) or not conn.has_pipelining :
tmp = self._make_tmp_path(conn)

and expect that in the long run, NEEDS_PATH express the real applicative need for a tmp path not taking into account the transfer of the module.

then we could work on a module by module basis to set the real applicative value of NEEDS_TMPPATH

the only problem I have with this is that all operations on the tmp dir (chmod, removal…) should be guarded by the same test.
This is a problem since runner._execute_module does not have access to the original handler where NEEDS_PATH is stored.

The removal of the tmp path is also done in _execute_module so this is basically the same problem: how can we share this property of the action_plugin up down to the execution of _execute_module.

I could suggest we modify the prototype of action_plugin.run to add (has_remote_tmp = False)
so when calling handler.run, the runner could pass in this variable its final decision to create or not the tmp path.

further calls to handler.run or _execute_module should forward this value along with the current persist_files.

if action modules have the right to call other action modules, we also need the “persist_files” in ActionModule.run (cf raw call from script)

so my suggestion :

A] add a new function to ActionModule: def needs_remote_path(conn) where an ActionModule can express its applicative need for a tmp path.

I transform NEEDS_PATH into a function because depending on the capabilities of the connection, the ActionModule may or may not need a tmp path. This is for example true for the script: action.

B] change the guard around _make_tmp_path :

has_remote_tmp = False

if handler.needs_remote_path(conn) or not conn.has_pipelining :
tmp = self._make_tmp_path(conn)

has_remote_tmp = True

B] change prototypes of action_plugin.run and runner._execute_module

ActionModule:
def run(self, conn, tmp, module_name, module_args, inject, complex_args=None, **kwargs):

becomes
def run(self, conn, tmp, module_name, module_args, inject, complex_args=None, has_remote_tmp=False, keep_remote_tmp=True, **kwargs):

runner :

def _execute_module(self, conn, tmp, module_name, args,
async_jid=None, async_module=None, async_limit=None, inject=None, persist_files=False, complex_args=None)

becomes

def _execute_module(self, conn, tmp, module_name, args,
async_jid=None, async_module=None, async_limit=None, inject=None, has_remote_tmp=False, persist_files=False, complex_args=None)

and change the calls to these functions so that has_remote_tmp flows along the chain of calls.

I hope I am not flooding you with these considerations but I don’t want to start changing prototypes of these things without your approval !

Tell me when you have tried the branch !
Jerome

Yeah we want to be careful about changing prototypes of plugins as users may have written their own that are not in core.

I believe you can assume you can optimize if:

(A) the module is not an action plugin itself
(B) the module is a new style python module

Thus it seems like this is a hint runner provides when calling the connection.

Even still, it should probably utilize kwargs.get(‘persist_files’, None) to make only the new plugin have to change.

Sort of make sense?

Ok i understand about not modifying prototypes. I forgot about 3rd party plugins.

Can i rely on the fact that tmp==‘’ in the call flow is equivalent to the fact that no remote tmp path was created for the handler ?

Regarding NEEDS_TMPPATH versus a new handler.needs_remote_tmp(conn) are you ok for the new function if i call It only when It exists on the handler and default to something like your algo otherwise ?

It would seem cleaner to me and i don’t think that the runner gives me the module_type before the handler was ever called(?) - the decision to create the tmp dir is just before the call to handler.run. I might have the fact that the action called is “normal” to decide that It is a pure module without actions.

Do you confirm !=new modules always need a tmp dir because their arguments are sent in a tmp file ?

I’ll wait for your answer and prepare a patch. Do you prefer the branch squashed or additional commits ?

Jerome