Changing the S3 module behaviour

Hi Magnus,

Yeah, I think this is the best route to take and we pretty much reached consensus in the github issue ticket.

I’ve left some basic lines of code (commented out) at the end of the module which do the fetch by copying the contents of the key into a file. I’m happy to get that uncommented and working with the parameters, and then maybe someone wants to look at refactoring the module to make it nice and un-scripty? :)

Kind Regards,

Lester

Actually, we were almost there. The main issue I have is having object AND src specified, since that doesn’t make sense (the object is your source file/key). So I think it should be something like:

- name: GET object from S3
  s3: bucket=mybucket object=dir/obj.txt dest=/tmp/obj.txt state=present

- name: PUT object into S3
  s3: bucket=mybucket object=dir/obj.txt state=present

- name: Delete object from S3
  s3: bucket=mybucket object=dir/obj.txt state=absent

- name: Get URL of object for downloading
  s3: bucket=mybucket object=dir/obj.txt expiry=300

Seems my message was lost. New try:

What about delete functionality? The current code will delete the whole bucket when using state=absent. That is not clear in the proposed API and is a bit harsh in my opinion. After all, you only defined one object and the whole bucket goes down the drain.

Maybe something like this:

- name: Delete bucket
  s3: bucket=mybucket state=absent

- name: Create bucket
  s3: bucket=mynewbucket state=present
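
Roughly, in boto terms, the distinction I have in mind looks like the sketch below (just an illustration; the helper name and the empty-the-bucket step are my own assumptions, not code from the module):

import boto

def remove(conn, bucket_name, key_name=None):
    # With an object given, remove only that key;
    # with no object, empty and remove the whole bucket.
    bucket = conn.get_bucket(bucket_name)
    if key_name is not None:
        bucket.delete_key(key_name)        # s3: bucket=... object=... state=absent
    else:
        for key in bucket.list():          # S3 only deletes empty buckets
            key.delete()
        conn.delete_bucket(bucket_name)    # s3: bucket=... state=absent

# usage: remove(boto.connect_s3(), 'mybucket', 'dir/obj.txt')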

I’ll do some coding; my work will be in a branch named s3-new-api in my ansible fork on my GitHub account (magnusart/ansible). I’ll open a pull request when the code is ready.

/Magnus

Yes, good point. In which case, I think the object and src are the wrong parameter names to have. I’d then vote for src with some kind of naming override, which would be object_name or suchlike. In pseudo code:

if src and object_name:
   key = bucket.new_key(object_name)
   key.set_contents_from_filename(src)
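
Fleshed out slightly with boto, just to show the intent (a sketch only; the bucket and file names are placeholders and the credential handling is skipped here):

import boto

conn = boto.connect_s3()            # credential handling omitted for brevity
bucket = conn.get_bucket('mybucket')

src = 'files/file.txt'              # local file to upload (the src param)
object_name = 'dir/obj.txt'         # explicit key name on S3

if src and object_name:
    key = bucket.new_key(object_name)
    key.set_contents_from_filename(src)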

The only thing I’m trying to think of is making sure the parameter choices are as obvious as possible. On a practical note, we then end up with something like the below, with object always being relative to S3 itself.

- name: GET object from S3
  s3: bucket=mybucket object=${ansible_hostname}-stuff dest=/tmp/obj.txt overwrite=yes

- name: PUT object into S3
  s3: bucket=mybucket object=${ansible_hostname}-stuff src=files/file.txt overwrite=no

- name: Delete object from S3
  s3: bucket=mybucket object=whatever state=absent

- name: Get URL of object for downloading
  s3: bucket=mybucket object=whatever expiry=300

Kind Regards,

Lester

I think this version of the API makes sense.

As I interpret it: the object param is mandatory and never means anything other than the actual object on S3. The dest and src params, when present, determine which direction the data will flow.
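
To illustrate that interpretation, a rough dispatch could look like the sketch below (purely illustrative and not what will be in the branch; it assumes the key exists and leaves out all error handling):

import boto

def run(conn, bucket_name, obj, src=None, dest=None,
        state='present', expiry=None):
    bucket = conn.get_bucket(bucket_name)
    if state == 'absent':                         # delete the object
        bucket.delete_key(obj)
    elif src is not None:                         # PUT: local file -> S3
        bucket.new_key(obj).set_contents_from_filename(src)
    elif dest is not None:                        # GET: S3 -> local file
        bucket.get_key(obj).get_contents_to_filename(dest)
    elif expiry is not None:                      # signed download URL
        return bucket.get_key(obj).generate_url(expires_in=int(expiry))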

/Magnus

Hi again

OK, here we go: https://github.com/magnusart/ansible/tree/s3-new-api

That branch implements the new API discussed here. I did some quite heavy refactoring to avoid ending up in if-else hell. Was it for the better in the end? I dunno… you be the judge; I’m tired right now, so at this point I’m blind to flaws.

Anyhow, I tried to test as thoroughly as I could. As far as I can see I have covered the whole API we outlined, including overwrite/checksum and delete/create bucket.

The code makes sure the different scenarios only trigger on correct combinations of parameters; you’ll see what I mean if you take a look at the source. The parameters can come in many different invalid combinations but at the same time have overlapping valid combinations, so I could not take advantage of required_together.
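
To give an idea of what I mean, the checks are along these lines (illustrative only; the exact rules and messages differ in the branch):

def validate_params(module):
    p = module.params
    # src implies PUT and dest implies GET, so both together make no sense
    if p.get('src') and p.get('dest'):
        module.fail_json(msg="src and dest are mutually exclusive")
    # an expiry is only used when generating a download URL for an object
    if p.get('expiry') and not p.get('object'):
        module.fail_json(msg="expiry requires an object")
    # deleting a bucket or object should not be combined with src/dest
    if p.get('state') == 'absent' and (p.get('src') or p.get('dest')):
        module.fail_json(msg="state=absent takes neither src nor dest")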

Please have a look if you have a few minutes to spare and test it out. If it feels all right I’ll submit a pull request.

/Magnus

Tested out basic GET and PUT and it works fine. :)

One question though: what is the preferred way to pass the access-key and secret-key variables to the module? I have a vars file included in my playbook where the ec2_access_key and ec2_secret_key variables are defined, but they are not picked up by the module. Instead I’ve had to specify them explicitly on the module line, which gets awfully long.

s3: … ec2_access_key=${ec2_access_key} ec2_secret_key=${ec2_secret_key}

It would be nice to be able to read in the variables from a vars-file.

Nice, thanks for testing.

I have not touched any of the authentication functionality AFAIK; I did some experimenting before refactoring. At that time I only got it working when I passed the params in explicitly to my playbook, just like you described. Then again, maybe I affected something with my change.

The credentials code is probably reused from the ec2 and ec2_vol modules. From what I gather, boto also supports other methods of authentication, for example the ability to pick up credentials from an IAM role via your EC2 instance metadata, so you don’t have to pass any credentials at all. Support for that would be very nice.

Since I would bet there will be other upcoming modules that interface with AWS, all of which will use boto to do the heavy lifting, it is probably a good idea to make authentication changes consistently across all of these modules. Maybe lwade has some thoughts? I know that I need route53 support next.

/Magnus

Thanks Magnus, I’ll look to give this a test this week too.

Regarding authentication, I tend to use local_action for what I need to do, and so I pick up the credentials from my environment. If it’s a remote action, I pass them to the module as you do, Karl.

With the credentials stuff we could implement any of the following:

  1. pick up creds from OS env (done)
  2. get creds from explicit declaration (done)
  3. look for a boto config file
  4. use IAM roles inside an instance

(4) would be a good option, although other cloud platforms besides AWS don’t support IAM roles. I’m not sure (3) is really of much value?

Regarding (4), boto automatically looks for credentials in the metadata service. So if the S3 module was being run as a remote action on the instance, this would be quite feasible.

Hi

Comments below.

> Thanks Magnus, I’ll look to give this a test this week too.

Cool! If it tests OK, tell me and I’ll produce a pull request.

> Regarding (4), boto automatically looks for credentials in the metadata service. So if the S3 module was being run as a remote action on the instance, this would be quite feasible.

I also think boto picking up credentials from IAM metadata is valuable, since it is nice to manage credentials separately from playbooks when possible.

However, from what I could see in the boto docs, this means the boto connect method should be called without arguments. Right now the module isn’t doing that, so boto will never(?) pick up those credentials; it is always passing the ec2_access_key and ec2_secret_key arguments.
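
Something like the sketch below is what I have in mind (an assumption on my part about how the module could be wired up, not what the branch currently does):

import boto

def connect(ec2_access_key=None, ec2_secret_key=None):
    if ec2_access_key and ec2_secret_key:
        # explicit credentials passed in as module parameters
        return boto.connect_s3(aws_access_key_id=ec2_access_key,
                               aws_secret_access_key=ec2_secret_key)
    # no arguments: boto falls back to its own lookup
    # (AWS_* environment variables, boto config file, IAM instance metadata)
    return boto.connect_s3()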

I’m not planning on doing more on the s3 module until it has been more tested and reviewed. I can give extended credentials a go afterwards, though. I want to give you guys here on the ML the possibility to have a look before I do more changes.

I would also like to give a route53 module a go in the future, if somebody doesn’t beat me to it, as I need it myself.

/Magnus

Hey Magnus,

Tested and it works just fine. I would say that, after looking at the module, I found it hard to read (and potentially debug); I think this is because of all of the functions you are declaring. There doesn’t seem to be much need for them due to the lack of re-use. I’m not a true Python person myself, but I’m not entirely convinced it’s “better” this way, yet.

In regards to the IAM metadata stuff, you are correct but we shouldn’t need to change too much to get this going.

Regarding other modules, definitely! I’m just preparing a post to the list about AWS-related modules; I’ll get this fired out shortly.

Kind Regards,

Lester

I would also like to see fewer one-liner functions.

Typically in testing you will want to break things up for debugging, and running all of those function calls together can complicate inserting breakpoints or debug output.

Hmmm… I thought I answered this before, but apparently it did not get sent. Very sorry for this delayed answer. I have had some work-related turbulence over the last few weeks and have been planning an event that takes up my free time until this weekend.

As for the one-liner functions: I put those there to raise the level of abstraction, because of the many if-elses that I had to put in. This is because the API became/needs to be open ended, and I could not get it to work with the required_together functionality provided in Ansible.

I actually do prefer many small methods, especially expressions, since I believe they improve readability. I did actually go further, but backed away since the try/except clauses forced me to pass in so many arguments. However, I could refactor those into one single try/except and create descriptive methods for the other API calls. It would then read better.
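
Roughly what I mean by that refactoring, as a sketch with made-up helper names rather than the current branch code:

from boto.exception import S3ResponseError

def download(bucket, obj, dest):
    # assumes the key exists; error handling happens in dispatch()
    bucket.get_key(obj).get_contents_to_filename(dest)

def upload(bucket, obj, src):
    bucket.new_key(obj).set_contents_from_filename(src)

def dispatch(module, bucket, obj, src=None, dest=None):
    try:
        # one try/except wraps all the S3 calls instead of one per helper
        if src:
            upload(bucket, obj, src)
        elif dest:
            download(bucket, obj, dest)
    except S3ResponseError as e:
        module.fail_json(msg=str(e))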

Now, this is mostly a style preference (i.e. functional vs. imperative), and if you feel strongly about the methods I can go the other way and just replace the method calls with the if-else expressions.

Give me an update on what you wish and I’ll give that a go in the coming week.

/Magnus

Yeah, you have made things a bit too functional for it to be easy to work on.

I like functional programming in places, and occasionally play with FP languages and so on, but this should be changed to be more in line with the rest of Ansible and easy to debug.

We are not trying to make it mathematically provable, etc.