Progress bars

Hi,

I just wanted to bring up progress bars again and monitoring of long running tasks.

There is something approaching 50 +1’s to the bug report / request https://github.com/ansible/ansible/issues/3887

It would be really nice to have progress bars at various levels. When running a playbook without debugging it, it would be very nice to have the option of a progress bar instead of screenfuls of output, much like git or mercurial do (with the details just sent to some log file). E.g. for a video of the kind of progress indicator I am talking about, see: https://github.com/noamraph/tqdm

On the other extreme it would be nice to have progress indicators / feedback mechanisms for some of the really long running tasks to figure out if they have hung or not. I don’t know any of the details of the internals of ansible or how to make this work, but from the bug report this is obviously a very highly desired feature which many people want…

Other threads about this: https://groups.google.com/forum/#!msg/ansible-devel/cvbyUMspqE0/DMLaYR8DfUoJ

Thanks,
Jason

Thanks, I’ve locked the topic to save folks from trying to comment on it, as we don’t see those comments.

This isn’t really possible in any sort of clean way and we’ve explained why already on several occasions.

See docs on “async” for how to report if things are still running for long running tasks, as well as how to set timeouts.

Thanks, I’ve locked the topic to save folks from trying to comment on it, as we don’t see those comments.

Well, at least it was a place where people could feel they could leave some sort of feedback and feel they could say “Yes… we really really want this feature…” I guess we just bring this up periodically in the forums then?

This isn’t really possible in any sort of clean way and we’ve explained why already on several occasions.

Well surely at the top level having a progress bar for the whole playbook is possible? There is nothing technical stopping that. Right?

And surely for async tasks (which I use), instead of just printing something like this down the screen:

TASK: [bitnami_rubystack | Install rubystack] *********************************

<job 409801362883.2948> polling, 990s remaining

<job 409801362883.2948> polling, 980s remaining

<job 409801362883.2948> polling, 970s remaining

<job 409801362883.2948> polling, 960s remaining

<job 409801362883.2948> polling, 950s remaining

<job 409801362883.2948> polling, 940s remaining

<job 409801362883.2948> polling, 930s remaining

<job 409801362883.2948> polling, 920s remaining

It could have a better output where there is just a single line which is counting down?

Also, with the async process, can’t the last line of the current task’s output be captured and displayed, and just written over each time? Sort of like periodically doing the async connect, grabbing the output of "tail --lines 1 progresslog", and displaying that?
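A minimal sketch of the single-line countdown idea, assuming a terminal that honours carriage return and the ANSI erase-line sequence. The names render_countdown and show_countdown are made up for illustration and are not part of ansible; no real job is polled here.

```python
import sys


def render_countdown(job_id, remaining):
    # "\r" moves the cursor back to column 0 and "\033[K" erases to the
    # end of the line, so a shorter message leaves no stale characters.
    return "\r\033[K<job %s> polling, %ds remaining" % (job_id, remaining)


def show_countdown(job_id, total, step, out=sys.stdout):
    """Redraw one status line per poll instead of printing a new line."""
    for remaining in range(total, -1, -step):
        out.write(render_countdown(job_id, remaining))
        out.flush()
        # A real poller would wait `step` seconds and check the job here
        # (assumption: that check is outside the scope of this sketch).
    out.write("\n")


if __name__ == "__main__":
    show_countdown("409801362883.2948", 990, 10)
```

When the output is redirected to a file the escape sequences end up in the log, so a real implementation would probably need to fall back to plain lines when stdout is not a terminal.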

I don’t know the details but there must be something that can be done! and it is very very clear that users want something like this. I wanted it from practically the moment I started using ansible and it is likely my top issue with ansible.

(In any case thanks for ansible, I really like most of it!)

Cheers,
Jas

Something like "Play (7 of 12) Task (4 of 15)". There's no way to have any kind of time estimate, but a simple X of Y count seems useful.

No, bringing it up periodically can’t really make it any more possible.

Please read my last comment I made in the ticket.

The most important point is point #1 – nearly all of the modules involved have absolutely no API or notion of what percentage complete they are, but the other points are equally important.

For completeness in my thread, here’s my full reply to that ticket:

Hi everyone,

I’ve closed this ticket as I think many people understand what is going on here and why we have acted on it.

These reasons have been enumerated previously and include:

1) Ansible contains 235+ modules and the APIs used in basically all of these modules don’t have any capacity for reporting status in the middle of an async operation

2) To do this properly for N-node management it implies setting up a server and a crypto layer that eliminates much of the architectural elegance of ansible

3) If you just need to know if a module is still running, async is available - http://docs.ansible.com/playbooks_async.html

4) The CLI has no good way to output standard out changes when running in parallel against several hundred hosts

5) The performance implications of doing this over an additional SSH channel or other mechanism are decidedly non-trivial

It’s not that we are denying anyone this out of spite, it’s actually not a good fit for the system.

We strongly encourage usage of async to show long running tasks are alive. Consider having custom modules, if any, log their actions, to diagnose problems should they occur.

Ansible does in fact report in when each host comes back in a large set of hosts, so that progress is definitely available.

Thanks for your understanding.

So that comments are easy to read I am also removing the various (and numerous) +1s on this file. I appreciate the feedback, but voting cannot make it so.

Hi Adam,

The idea of numbering the plays and tasks is different from the indeterminate status of the tasks.

It’s also possible to number what host number is coming back pretty easily, theoretically.

Play 1/45: ASDF

Task 5/10: JKL

Host [35%]: asdf.example.com => result

etc

Those kinds of things are more available to the system and possible.
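As a rough illustration of the counters above, here is a sketch that formats the three lines from counts the runner would already know. The function format_progress and its parameters are hypothetical, not an existing ansible callback.

```python
def format_progress(play, plays, play_name,
                    task, tasks, task_name,
                    hosts_done, hosts_total, host, result):
    """Return the play, task and host counter lines for one host result."""
    # Integer percentage of hosts that have reported back for this task.
    percent = 100 * hosts_done // hosts_total
    return [
        "Play %d/%d: %s" % (play, plays, play_name),
        "Task %d/%d: %s" % (task, tasks, task_name),
        "Host [%d%%]: %s => %s" % (percent, host, result),
    ]


if __name__ == "__main__":
    for line in format_progress(1, 45, "ASDF", 5, 10, "JKL",
                                7, 20, "asdf.example.com", "result"):
        print(line)
```

Nothing here needs a time estimate or any intermediate status from a module; it only counts things that have already finished.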

No, bringing it up periodically can’t really make it any more possible.

Well I work professionally as a developer (like I am sure the vast majority of the people on this list) and I can say the weight of user pressure definitely changes our priorities at times.

Please read my last comment I made in the ticket.

I did. Thanks for taking the time to respond about this. And sorry if my responses seem persistent. I feel passionately about this point, and unfortunately right now the response I am perceiving is “yep lots of people want this, but we aren’t / can’t do anything about it… Please let this go…” (But you have framed it in a very nice way :) ) So naturally I am probing around the edges to see whether there is anything that can mitigate this to at least some degree…

As for removing the +1’s, it sort of feels to me like I / we are losing our say in the community. It really wasn’t that hard to scroll past a long list of +1’s, and in fact I was very very happy to see all the other +1’s. It made me as a user realise, “nope it’s not just me, hey everyone else really really wants this feature as well…” I.e. the ansible community really wants this, I should bring this up again…

Of course it is your prerogative, and there are development schedules and all sorts of things to consider, and maybe this clashes with your commercial offering (i.e. from the press release: “Tower’s real-time output of Ansible Playbooks includes dynamic progress bars that show you the status of jobs and hosts, along with the overall status of the Playbook run…”). (Hey… you guys have to make money somehow :) etc…) There are all sorts of considerations I guess which I / we are not aware of but still… it feels… well… I’ll just say I liked seeing the plus ones…

Anyway, onto the actual points…

The most important point is point #1 – nearly all of the modules involved have absolutely no API or notion of what percentage complete they are, but the other points are equally important.

So just a query here: on the remote machine the python program that is running, is it true that the running process fundamentally can’t get the output of the command that is running until the end? For simplicity let’s take apt-get. As it runs it normally spits out information. Does the running program on the host not have access to this information as it is being spit out? Can’t it for instance periodically store the current last line of this output in a file somewhere? Then can’t an async connection just query this last line and write that out to the screen, i.e…

instead of the async task producing:

<job 409801362883.2948> polling, 990s remaining
<job 409801362883.2948> polling, 980s remaining
<job 409801362883.2948> polling, 970s remaining
<job 409801362883.2948> polling, 960s remaining
<job 409801362883.2948> polling, 950s remaining

There could be an option so it prints:

<job 409801362883.2948> polling, 980s remaining, current: (Reading database … 84711 files and directories currently installed.)
<job 409801362883.2948> polling, 970s remaining, current: Selecting previously unselected package libmono-2.0-dev.
<job 409801362883.2948> polling, 960s remaining, current: Selecting previously unselected package libmono-system-xml4.0-cil.
<job 409801362883.2948> polling, 950s remaining, current: N: Ignoring file ‘opera.list.save’ in directory ‘/etc/apt/sources.list.d/’ as it has an invalid filename extension
<job 409801362883.2948> polling, 940s remaining, current: Reading state information… Done

etc. Is this not possible?

(Even better would be to only have a single line of output where the last line is removed as soon as a new line is produced… so you see a single line constantly being updated at the refresh frequency of the async poll…)
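To make the suggestion concrete, here is a minimal sketch of a wrapper that keeps the latest output line of a command in a status file, plus the read a poller would do. This is not how ansible's async wrapper works; run_and_record, read_status and the status file are all hypothetical, and it only helps for commands that actually write lines to stdout or stderr.

```python
import subprocess


def run_and_record(cmd, status_path):
    """Run cmd and overwrite status_path with each new non-empty output line."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        line = line.rstrip("\n")
        if line:
            # Only the most recent line is kept; this is not a log.
            with open(status_path, "w") as status:
                status.write(line + "\n")
    return proc.wait()


def read_status(status_path):
    """What a poller would fetch: the latest line, or '' if none yet."""
    try:
        with open(status_path) as status:
            return status.read().strip()
    except FileNotFoundError:
        return ""
```

A poller would call read_status at each poll interval and append the result to the "polling, Ns remaining" line. Modules that call a library API instead of running a command produce no lines for this to capture.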

So I don’t think each module actually needs to know what percentage complete they are!

In general it is not even possible to know how long an arbitrary task is going to take (some tasks like copying a file, transferring something, or an apt task (which does have progress bars) would of course naturally have this information available, but at a first cut it wouldn’t need to be used…)

Of course adding it into the API so that future modules would be able to do this would be a great step as well…

2) I don’t understand why an async task, which would connect and basically just be a “scp host /path/to/current/last/line/of/process/output”, has to have a server & crypto layer, etc. There is already an async process that periodically sees if the task is finished.

3) I am suggesting this is basically just progress bars for async connections. I.e. the users writing the scripts decide when to do an async operation, and they decide if the async process also grabs the last line of output at the same time…

4) Well there would have to be some summarisation made of course, but something like the following would print multiple progress lines:

#!/usr/bin/env python
import random
import sys
import time

taskCount = 10
taskStrings = ["task " + str(i) + ":" for i in range(taskCount)]

for i in range(20):
    for ts in range(taskCount):
        # Randomly advance each task by zero or one dot.
        taskStrings[ts] += "." * random.randint(0, 1)
        print(taskStrings[ts])
    sys.stdout.write("\033[F" * taskCount)  # Cursor up taskCount lines
    sys.stdout.flush()
    time.sleep(0.3)

# Blank out the progress lines, then move the cursor back to the top.
print(((" " * 30) + "\n") * taskCount)
sys.stdout.write("\033[F" * (taskCount + 1))  # Cursor up taskCount + 1 lines

5) Aren’t these exactly the performance characteristics of async? How does it differ?

Again, sorry for the persistence. This is a feature that I think would make ansible much better…

Thanks, for your consideration.
Jason

Of course it is your prerogative, and there are development schedules and
all sorts of things to consider, and maybe this clashes with your
commercial offering (i.e. from the press release: "Tower's real-time
output of Ansible Playbooks includes dynamic progress bars that show you
the status of jobs and hosts, along with the overall status of the Playbook
run...").

Please don't accuse us of holding back from ansible, this has never been
the case.

Tower's feature is the playbook output streams in via websockets *just like
it does from /usr/bin/ansible-playbook* so you don't have to hit "F5" to
reload the page.

That's it. CLI equivalence.

In some ways it provides some cleaner ways to dive through the final
output, because a web interface is superior for some of these kinds of
searches, but it is using clean-stock-ansible underneath, without
modifications.

The progress bar you get is the number of hosts that have checked in so
far, not the intermediate status of individual hosts. Which in reply to Adam
above, we've indicated we'd be totally happy to see in the CLI output too.

Yes, it's formatted differently. Do modules return different information
in Tower or does Tower have proprietary modules? Absolutely not.

It doesn't have anything to do with intermediate status from modules.

instead of the async task producing:

<job 409801362883.2948> polling, 990s remaining
<job 409801362883.2948> polling, 980s remaining
<job 409801362883.2948> polling, 970s remaining
<job 409801362883.2948> polling, 960s remaining
<job 409801362883.2948> polling, 950s remaining

There could be an option so it prints:

<job 409801362883.2948> polling, 980s remaining, current: (Reading
database ... 84711 files and directories currently installed.)
<job 409801362883.2948> polling, 970s remaining, current: Selecting
previously unselected package libmono-2.0-dev.
<job 409801362883.2948> polling, 960s remaining, current: Selecting
previously unselected package libmono-system-xml4.0-cil.
<job 409801362883.2948> polling, 950s remaining, current: N: Ignoring file
'opera.list.save' in directory '/etc/apt/sources.list.d/' as it has an
invalid filename extension
<job 409801362883.2948> polling, 940s remaining, current: Reading state
information... Done

etc. Is this not possible?

Doesn't really matter, it's also not what you want. You think it is, but
it's not.

Imagine you have 500 hosts. Do you want just the last line at every poll
attempt? What if you want the log runs? What if you wanted the last 10
lines? What's the right behavior in every single module to do this? Do
you want to interleave output from all 500 hosts?

The last line of output at the moment you poll is a poor solution, and is
subject to some relatively bad sampling problems, that will almost always
miss the information you want. Including every line since last poll could
stream several megabytes per host.
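The sampling concern can be illustrated with a small simulation. This is only a sketch: sample_last_line is a hypothetical helper and the timings are invented for the example, not measurements of any real module.

```python
def sample_last_line(emit_times, poll_interval, duration):
    """Return the indices of the lines a last-line poller would display.

    emit_times is a sorted list of times (in seconds) at which the task
    printed a line; the poller looks at the newest line every
    poll_interval seconds until duration is reached.
    """
    seen = []
    t = poll_interval
    while t <= duration:
        latest = None
        for index, emitted in enumerate(emit_times):
            if emitted <= t:
                latest = index  # newest line printed at or before this poll
            else:
                break
        # Record the line only if it differs from the one shown last time.
        if latest is not None and (not seen or seen[-1] != latest):
            seen.append(latest)
        t += poll_interval
    return seen


if __name__ == "__main__":
    steady = list(range(1, 101))   # one line per second for 100 seconds
    print(sample_last_line(steady, 10, 100))
    bursty = [1, 1, 1, 2, 2, 50]   # a burst of five lines, then one more
    print(sample_last_line(bursty, 10, 60))
```

With one line per second and a 10 second poll, only every tenth line is ever shown. In the bursty case only lines 4 and 5 are displayed, so if the interesting message was line 2 it is never seen.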

Also then consider all of the modules that are API driven and don't have
any intermediate status to provide. The apt module is *also* an example of
this, as are, for example, the EC2 modules.

So, while trivially you could say "shell output since last run", that's a
hack for just a small subset of the modules, which does nothing for package
installs, cloud provisioning, fdisk, or many other things that would
benefit much more from status than simple shell commands.

The apt example, in particular, leverages python-apt in many cases, so it's
very important to remember many of the APIs we are using are not written to
provide asynchronous status.

It's also important to realize that modules don't know when they are
running in async mode - sure, it might be possible to augment the async
system, but ultimately, it's a wrapper process that waits for a JSON
response.

So what you ask isn't trivial by any means - to present, to retrofit the
modules, to retrofit async, etc.

The shell module appears to be the easy case, but it exposes its own
manner of complexity.

Progress bars are 100% right out.

Again, sorry for the persistence. This is a feature that I think would
make ansible much better...

Many things would make ansible much better.

Of course it is your prerogative, and there are development schedules and all sorts of things to consider, and maybe this clashes with your commercial offering (i.e. from the press release: “Tower’s real-time output of Ansible Playbooks includes dynamic progress bars that show you the status of jobs and hosts, along with the overall status of the Playbook run…”).

Please don’t accuse us of holding back from ansible, this has never been the case.

Ok.

Tower’s feature is the playbook output streams in via websockets just like it does from /usr/bin/ansible-playbook so you don’t have to hit “F5” to reload the page.

That’s it. CLI equivalence.

In some ways it provides some cleaner ways to dive through the final output, because a web interface is superior for some of these kinds of searches, but it is using clean-stock-ansible underneath, without modifications.

The progress bar you get is the number of hosts that have checked in so far, not the intermediate status of individual hosts. Which in reply to Adam above, we’ve indicated we’d be totally happy to see in the CLI output too.

Yes, it’s formatted differently. Do modules return different information in Tower or does Tower have proprietary modules? Absolutely not.

It doesn’t have anything to do with intermediate status from modules.

Right in my original email this was a progress bar at the top level. I was thinking a playbook has something like 50 plays in it, I would like to see a progress bar of 1 through 50 multiplied by the number of hosts…

instead of the async task producing:

<job 409801362883.2948> polling, 990s remaining
<job 409801362883.2948> polling, 980s remaining
<job 409801362883.2948> polling, 970s remaining
<job 409801362883.2948> polling, 960s remaining
<job 409801362883.2948> polling, 950s remaining

There could be an option so it prints:

<job 409801362883.2948> polling, 980s remaining, current: (Reading database … 84711 files and directories currently installed.)
<job 409801362883.2948> polling, 970s remaining, current: Selecting previously unselected package libmono-2.0-dev.
<job 409801362883.2948> polling, 960s remaining, current: Selecting previously unselected package libmono-system-xml4.0-cil.
<job 409801362883.2948> polling, 950s remaining, current: N: Ignoring file ‘opera.list.save’ in directory ‘/etc/apt/sources.list.d/’ as it has an invalid filename extension
<job 409801362883.2948> polling, 940s remaining, current: Reading state information… Done

etc. Is this not possible?

Doesn’t really matter, it’s also not what you want. You think it is, but it’s not.

No. It really is what I want. I want it so that it prints a single line and then it overwrites that same line at the next poll. I would then likely set the poll frequency fairly low so I could see if the process is hung. E.g. if you look at some virus checkers, they print the file they are checking. These files fly by so fast you can’t even really read them. But if it ever stops then you know the thing is hung up a bit. That is the principle I am talking about here…

Imagine you have 500 hosts. Do you want just the last line at every poll attempt? What if you want the log runs? What if you wanted the last 10 lines? What’s the right behavior in every single module to do this? Do you want to interleave output from all 500 hosts?

You could make it a parameter if you are really interested. But a single line which is constantly overwritten would give the user enough feedback…

The last line of output at the moment you poll is a poor solution, and is subject to some relatively bad sampling problems, that will almost always miss the information you want.

It is not meant to be a log. It is meant to tell you if the process is still going…


So what you ask isn’t trivial by any means - to present, to retrofit the modules, to retrofit async, etc.

The shell module appears to be the easy case, but it exposes its own manner of complexity.

Progress bars are 100% right out.

Well… that is unfortunate, it is a feature many people clearly want… Ohh well… Thanks for answering the emails.

Cheers,
Jason

Right in my original email this was a progress bar at the top level. I was
thinking a playbook has something like 50 plays in it, I would like to see
a progress bar of 1 through 50 multiplied by the number of hosts...

Right, some lines from the latest command would not quite be a progress
bar, so this is a different thing.

We've mentioned why we don't have percentage completion info.

instead of the async task producing:

<job 409801362883.2948> polling, 990s remaining
<job 409801362883.2948> polling, 980s remaining
<job 409801362883.2948> polling, 970s remaining
<job 409801362883.2948> polling, 960s remaining
<job 409801362883.2948> polling, 950s remaining

There could be an option so it prints:

<job 409801362883.2948> polling, 980s remaining, current: (Reading
database ... 84711 files and directories currently installed.)
<job 409801362883.2948> polling, 970s remaining, current: Selecting
previously unselected package libmono-2.0-dev.
<job 409801362883.2948> polling, 960s remaining, current: Selecting
previously unselected package libmono-system-xml4.0-cil.
<job 409801362883.2948> polling, 950s remaining, current: N: Ignoring
file 'opera.list.save' in directory '/etc/apt/sources.list.d/' as it has an
invalid filename extension
<job 409801362883.2948> polling, 940s remaining, current: Reading state
information... Done

etc. Is this not possible?

Doesn't really matter, it's also not what you want. You think it is, but
it's not.

No. It really is what I want. I want it so that it prints a single line
and then it overwrites that *same* line at the next poll. I would then
likely set the poll frequency fairly low so I could see if the process is
hung. E.g. if you look at some virus checkers, they print the file they
are checking. These files fly by so fast you can't even really read them.
But if it ever stops then you know the thing is hung up a bit. That is the
principle I am talking about here...

In general, avoiding "hung" commands is a matter of making sure things
don't go interactive.

I understand the general nature of that particular idea - which could
*perhaps* be solved in a few other ways for command/shell modules, noting
that it got *some* output since the last poll, but perhaps not sharing what
that output is.

If it just output "no new output" that would be a bit easier.
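A rough sketch of that simpler variant: report only whether a command's output has grown since the previous poll, without sharing the output itself. It assumes the command's output is being written to a file on the remote host; OutputWatcher and that log file are hypothetical, not part of the existing async system.

```python
import os


class OutputWatcher:
    """Report whether a log file has grown since the previous poll."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.last_size = 0

    def poll(self):
        try:
            size = os.path.getsize(self.log_path)
        except OSError:
            size = 0  # the file does not exist yet, so nothing was written
        grew = size > self.last_size
        self.last_size = size
        return "new output" if grew else "no new output"
```

Each poll then returns a few bytes per host regardless of how much the command printed, which sidesteps both the sampling problem and the volume problem, at the cost of saying nothing about what the command is doing.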

Imagine you have 500 hosts. Do you want just the last line at every poll
attempt? What if you want the log runs? What if you wanted the last 10
lines? What's the right behavior in every single module to do this? Do
you want to interleave output from all 500 hosts?

You could make it a parameter if you are really interested. But a single
line which is constantly overwritten would give the user enough feedback...

The last line of output at the moment you poll is a poor solution, and is
subject to some relatively bad sampling problems, that will almost always
miss the information you want.

It is not meant to be a log. It is meant to tell you if the process is
still going...

...
So what you ask isn't trivial by any means - to present, to retrofit the
modules, to retrofit async, etc.

The shell module appears to be the easy case, but it exposes its own
manner of complexity.

Progress bars are 100% right out.

Well... that is unfortunate, it is a feature many people clearly want...
Ohh well... Thanks for answering the emails.

Different people want it in very different aspects, which is part of the
problem and confusion :)