"Why not sqlite?"

So, got asked on twitter why ansible-commander isn't using sqlite.

Several reasons.

(A) At some point, ansible callbacks when using commander are going
to be writing a large amount of data in parallel to the database --
including possible activity remotely sent in via machines running
ansible-pull for those running at more massive scales. I do not
want the database to block on insertions. This is somewhat minor,
but everything adds up. The data will include the latest facts per system,
the status of the last playbook per system, etc. Callbacks will be
expanded over time to inject a fair amount of useful data into ansible
commander.
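
To make the blocking concern in (A) concrete, here is a minimal Python
sketch, not taken from ansible-commander. It assumes a hypothetical
"facts" table in a throwaway SQLite file and uses only the standard
sqlite3 module. SQLite allows one writer per database file at a time,
so while one connection holds a write transaction, a second writer
either waits or fails:

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Return True if a second connection cannot start a write
    transaction while the first one holds the write lock."""
    # Throwaway database file; the "facts" table is purely illustrative.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: we issue BEGIN ourselves.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE facts (host TEXT, data TEXT)")
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO facts VALUES ('web1', '{}')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer, same file
        except sqlite3.OperationalError:
            return True  # SQLite reported that the database is locked
        second.execute("ROLLBACK")
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(second_writer_is_blocked())
```

With the module's default timeout the second writer would wait for a
while rather than fail right away; either way, parallel callbacks end
up queued behind each other.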

(B) At some point in the future we (or a particular user) may decide
we need a "real" database. At that point, I do not want to bother
with cross-database migration scripts. I'd rather /start/ with the
real database now and save users the hassle of migration later.

(C) Maintaining schemas for multiple databases and testing them adds
development cost that I don't want to deal with at this point in time.
That may come later if there is sufficient demand, but right now,
it's a distraction.

(D) Seeing as we have a separate database all set up via
ansible-playbook, I don't really see it imposing a lot of additional
setup steps. It is largely automated and you're only going to have
to set it up once. It should take 30 minutes tops, and can easily be
done in less than 10.

(E) PostgreSQL is likely to be more acceptable in larger corporate
environments than sqlite. While sqlite is an AWESOME library, it is
generally not something you see behind larger web applications, and
this could discourage users.

(F) The big one -- Access control. It is important that multiple
users be able to share access to ansible, but /not/ have free rein on
the entire database. For instance, it may be possible to give
someone access to manage what their SSH keys can manage, but *NOT*
give them access to edit inventory. Since the inventory plugin must
be run by those users, they must at least have read access on the
database but not write access. Many users would like to control
access for users via things like LDAP, AD, etc -- and would not want
to have to enforce this via filesystem ACLs and permissions. This is
the primary reason we are doing this.

--Michael

(edit/clarification: (F) assumes an inventory plugin that does not
speak REST, which is probably going to be the default for efficiency
reasons -- also there may be other scripts/tools that want to use the
acom.data library in similar ways)

With that said, is it going to be possible for people to easily plug in their own databases?

I ask only because we are rolling out elasticsearch as the primary store for any data that does not require transactions.

The benefits we get from this are easy scalability (think twitter scale), real-time search, extremely fast search, and dynamic slicing and dicing on millions of records for real-time analytics.

Obviously elasticsearch does not have any authentication layer, so to solve any of the issues you have laid out, a service would need to be developed that handles authentication and proxies requests to the elasticsearch cluster.

No.

Picking one specific database (any one) is going to cause issues and pushback in a lot of corporate environments. That's why I raised the issue of needing one at all.

Database selections are as "religious" an item as things come in IT and are dictated from the top down. MySQL is the most broadly accepted open source preference I've seen, if corporate acceptance is a concern.

<tim/>

Yep, I'm damned whatever I pick -- EVERYONE has an opinion. Seeing I
am doing 80% of the work, and have a clear view of what I might want
to use in the future, I picked.

I take PostgreSQL a lot more seriously than MySQL, and MySQL is in use
at a lot of impressive places. They are both fine choices.

Cobbler picked not having a database early on for adoption reasons --
it still did ok, but it was the wrong choice in the long term because
the code wasn't optimized to use it even though the backing store was
pluggable.

Let's stop thinking about it and start getting stuff done, eh? :)

Maybe using a SQL interface compliant with the DB-API 2.0 specification
described by PEP 249¹ could make the DB back office usable with different
backends, but I agree with Michael, this cannot be a priority.
Actually there are some PostgreSQL-specific/non-standard SQL features in
the code. I cleaned them up in my repo, but integration can be postponed.
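
As a rough illustration of the DB-API 2.0 idea (a sketch only, not code
from acom): the function below touches nothing but the PEP 249 calls
cursor(), execute(), fetchone() and close(). The "playbook_runs" table
and the function names are hypothetical, and the standard sqlite3 module
stands in as the driver so the example runs anywhere.

```python
import sqlite3  # stand-in driver; other PEP 249 modules expose the same calls


def latest_status(conn, host):
    """Return the most recent playbook status recorded for a host, or None."""
    cur = conn.cursor()
    # Placeholder syntax is NOT portable across drivers: sqlite3 uses "?"
    # (qmark) while psycopg2 uses "%s". Check the driver's `paramstyle`.
    cur.execute(
        "SELECT status FROM playbook_runs "
        "WHERE host = ? ORDER BY id DESC LIMIT 1",
        (host,),
    )
    row = cur.fetchone()
    cur.close()
    return row[0] if row else None


def demo():
    """Build a tiny in-memory database and query it through the DB-API."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE playbook_runs "
        "(id INTEGER PRIMARY KEY, host TEXT, status TEXT)"
    )
    cur.executemany(
        "INSERT INTO playbook_runs (host, status) VALUES (?, ?)",
        [("web1", "failed"), ("web1", "ok"), ("db1", "ok")],
    )
    conn.commit()
    result = latest_status(conn, "web1")
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The catch is visible in the comment: the call interface is shared, but
placeholder style and SQL dialect are not, so DB-API compliance alone
does not give portability.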

Let's stop thinking about it and start getting stuff done, eh? :)

YEEESSS!! :)

Ciao
       Marco

¹ http://www.python.org/dev/peps/pep-0249/

This is a barrel, not a can, of worms. There are good reasons for every DB choice: filesystem, RDBMS, NoSQL, in-memory bloom filters, etc…

Keep in mind that Michael chose PostgreSQL after looking at his requirement list of ‘must haves’ and finding the well-known alternatives lacking.

This choice works well with the intended scope of the project. I don’t think it merits discussion unless there is a glaring feature that the ansible reporting/GUI needs that is not available.

That said, if you contribute code that uses the DB, keep portability in mind and try to avoid ‘lock in’ features (which EVERY DB system has), unless they are a really big asset to the app (e.g. the Postgres GIS module for a geomap app).

In the case people still want to keep exploring this, I’ll throw another choice into the mix:
http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page

just my 2c, feel free to disregard my opinion.

Not supporting an ORM worries me. There are a number of solutions out there that will be used within each deployment, and if they all took this stance, every devops group out there would be in trouble.

There are so many options now that every team, like yourselves, is saying "we like this one". Another project we are working with set their sights on mongo…

This mindset of skipping ORM or pluggability is making it so that we now have to be experts in 10 different database systems, or hire different teams to manage them?

No, it just comes down to picking the product that doesn't force us to maintain yet another database platform.

I am not saying build the entire ORM now; I am just saying write it in a way that will allow others to easily create adaptors for the database that makes sense for their infrastructure.
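
To illustrate what "easily create adaptors" might look like, here is a small Python sketch. It is an assumption about one possible shape, not how ansible-commander is written: `HostStore`, `MemoryStore` and `record_run` are hypothetical names, and the in-memory store only stands in for a real PostgreSQL or elasticsearch adaptor.

```python
from abc import ABC, abstractmethod


class HostStore(ABC):
    """Hypothetical storage interface that application code depends on."""

    @abstractmethod
    def save_facts(self, host, facts):
        """Persist the latest facts for a host."""

    @abstractmethod
    def load_facts(self, host):
        """Return the latest facts for a host, or an empty dict."""


class MemoryStore(HostStore):
    """Toy adaptor; a real one would wrap a database driver instead."""

    def __init__(self):
        self._facts = {}

    def save_facts(self, host, facts):
        self._facts[host] = dict(facts)  # copy so callers cannot mutate it

    def load_facts(self, host):
        return dict(self._facts.get(host, {}))


def record_run(store, host, facts):
    """Application code sees only the HostStore interface."""
    store.save_facts(host, facts)
    return store.load_facts(host)


if __name__ == "__main__":
    print(record_run(MemoryStore(), "web1", {"os": "linux"}))
```

The cost Michael describes still applies: every extra adaptor has to be tested and kept in step with the schema while it is still changing.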

PostgreSQL is not rocket science and I said we could revisit this later.

Not now. Not this year.

--Michael

I’m not against using an ORM, but it has to be for the right reasons. Just to clear up a few common misconceptions about ORMs:

  • I wouldn’t build an ORM; there are quite a few out there, and some may even fit your needs. Building an ORM is not trivial, otherwise someone would have done it right years ago.

  • Using an ORM does not guarantee you won’t get locked into a DB, that has to be a separate conscious goal, which the ORM can be helpful with.

  • ORMs are good for getting off the ground fast, but they tend to fail at large scale, and normally you have to write a lot of direct access code (SQL) to bypass the ORM and get decent performance. This leads to a loss of any maintainability you might have gained. Not that I see any scale issues with ansible that would lead to this, except if google and/or amazon start using it tomorrow.

  • ORM != ODM. If you go NoSQL, you’ll have to either move to a specific interface for that solution or find an ODM (document vs. relational) layer (e.g. for mongo).

Also, to be clear, this is only for ansible-commander, not ansible,
and if you don't like it, just don't use it.

--Michael

Is ansible commander going to provide its own API, or is it simply a management interface that will talk to the ansible API?

We might end up having to roll our own, borrowing ideas from commander along the way.

I can't disclose too much of what we are up to, but scale is important to us, which is why we are focused so much on which technologies are right for the job.

I have a lot of experience with web apps at say the 200k simultaneous users level. PG will scale for any possible use in that ballpark for what we’ll do with it.

I am not saying data abstraction won’t happen but I want to actually build the app first :).

I suspect at your level ansible-pull is better OR you are not configuring constantly, and building some of your own glue will happen in ways you can’t foresee until you start doing it. Until then, avoid premature optimization.

– Michael

True, which is why we are making heavy use of elasticsearch for data that does not require transactions… From the ground up we are recording everything into elasticsearch: logs, errors, latency, application stats, server stats, etc., which we can then easily slice and dice in real time and historically to help identify areas that need to be improved.

This will allow us to quickly identify the unforeseen gotchas in real time by setting up a dashboard to compare different metrics via time-based graphs.

Some might ask why not use graphite. Well, it was great based on the technology at that time, but elasticsearch does it better.

So it's not necessarily that we are over-optimizing, but more so making sure we are ready to identify and easily adapt to changes without getting too many different technologies in the mix, which could become a management nightmare.

Ok, so...

Acom isn't going to be injecting nearly the kinds of data that
graphite would be. There's nothing saying you couldn't also monitor
on the side -- I hope you do.

Here's my suggestion. Let this thread drop off for a bit, because
there are 300+ people on this mailing list who are probably a bit
tired of it. Let Ansible commander evolve and figure out what it is
going to be, and then, after Christmas, if you find it useful still,
give me a patch to make the storage backends pluggable if you still
think you need it. I can promise you it's not going to be rocket
science to do, but, like I said, it's a total distraction to maintain
it while things are still evolving.

I honestly can't judge the reality at which your business will need to
scale, or the scaling problems you'll actually have once these things
go live in your environment -- I suspect ansible won't be the
bottleneck by a long shot. I do suspect you'll be using ansible-pull
vs push if you need to address several thousand machines near
simultaneously, though you could obviously use regular ansible to
configure that. Whether acom evolves to provide some tools around
pull mode I can't say at this time. I'd like to do these things, but
it's easy to get distracted.

For now, I'm much more interested in solving the problems of people
with average levels of servers that want a management GUI -- and we
can see how things grow from there.

--Michael