Greetings,
I’ve been looking at the crypto that underlies ansible-vault, and I’m worried. Specifically, it seems to me that the vault as implemented is not safe for credential storage.
If you haven’t read the implementation, this is essentially what happens when you encrypt a file with ansible-vault:
- The plaintext is prepended with a SHA256 hash of itself,
- A random salt is generated,
- The vault password and salt are used to derive an AES key and IV, using a python implementation of openssl’s EVP_BytesToKey(),
- The plaintext-with-hash is encrypted with AES in CBC mode, using the above key and IV,
- This ciphertext is hexlified and stored.
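To make the key-derivation step concrete, here is a minimal sketch of what an EVP_BytesToKey()-style derivation looks like in Python. It assumes MD5 as the digest and a single iteration, and the function name `evp_bytes_to_key` is purely illustrative; this is not ansible's actual code.

```python
import hashlib
import os


def evp_bytes_to_key(password: bytes, salt: bytes,
                     key_len: int = 32, iv_len: int = 16):
    """Derive (key, iv) in the style of EVP_BytesToKey with MD5, count=1.

    Assumption: MD5 digest and one iteration per block.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous digest, the password and the salt.
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


salt = os.urandom(8)
key, iv = evp_bytes_to_key(b"hunter2", salt)
print(len(key), len(iv))
```

Note that nothing in this loop is deliberately slow: the whole derivation is a few MD5 calls over short inputs.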
Looking at the comments in the code, this is a Python reproduction of what openssl aes-256-cbc -salt does, except for the SHA256 hash bit, which is an “aftermarket” integrity measure.
The biggest concern I have about this code is that these blobs are much easier to bruteforce than the “AES-256” would lead you to believe. This is because openssl’s EVP_BytesToKey() is a poor KDF, one which does not make the key derivation expensive in CPU and/or memory. As a result, it is very cheap to test a candidate password: four MD5 operations on small inputs, a few AES-256 block operations, and a SHA-256 hash. All these operations are easy to hardware-accelerate, either with modern CPUs (AES-NI) or GPUs.
This dramatically reduces the search space from 32 bytes if brute-forcing the AES key (48 bytes if the IV isn’t included in the ciphertext, which I haven’t checked), to the number of bytes in the password. Those bytes are also very likely to be in a small set of values (letters+numbers, maybe symbols if you’re lucky), further reducing the search space for bruteforcing.
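As a rough illustration of that gap, the following sketch compares the search space of a few password shapes against a 256-bit key. The password policies are hypothetical examples, and the calculation assumes uniformly random passwords, which is generous to the defender.

```python
import math

KEY_BITS = 256  # AES-256 key size in bits

# Hypothetical password policies: (alphabet size, length).
policies = {
    "8 lowercase letters": (26, 8),
    "10 letters and digits": (62, 10),
    "12 printable ASCII characters": (95, 12),
}


def password_bits(alphabet_size: int, length: int) -> float:
    """Bits of search space for a uniformly random password."""
    return length * math.log2(alphabet_size)


for name, (alphabet, length) in policies.items():
    bits = password_bits(alphabet, length)
    print(f"{name}: about {bits:.1f} bits, versus {KEY_BITS} for the key")
```

Even the strongest of these comes to well under 100 bits, and human-chosen passwords are usually far weaker than a uniform estimate suggests.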
A good KDF would close this avenue of attack by making the key derivation so expensive that it’s cheaper to bruteforce the AES key. Unfortunately, EVP_BytesToKey() is not such a KDF. Its documentation in OpenSSL even recommends using better KDFs, such as PBKDF2 or scrypt, for designs which don’t specifically require BytesToKey.
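For comparison, here is a minimal sketch of deriving a key with PBKDF2. It uses `hashlib.pbkdf2_hmac` from the Python standard library rather than pycrypto, and the iteration count and salt size are assumptions for illustration, not recommendations taken from any ansible code.

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)      # assumed salt size
iterations = 100_000       # assumed work factor; tune to your hardware

start = time.perf_counter()
key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
elapsed = time.perf_counter() - start

print(f"derived a {len(key)}-byte key in {elapsed:.3f}s")
```

The point is that every password guess now costs the attacker the full iteration count, instead of a few MD5 calls.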
So, it seems to me that ansible-vault blobs are not safe to expose to untrusted people, because brute-forcing them in an offline attack is much easier than it would seem. This is a problem, because if only trusted people have access to the blobs, you might as well just have the sensitive data in cleartext.
Further concerns about the implementation:
- SHA-256 is used as an authentication code, but isn’t one.
- The encoding is constructed as “mac-then-encrypt”, whereas encrypt-then-mac is the safer default, because it minimizes your code’s exposure to hostile inputs. This is relatively minor compared to the effective lack of MAC.
- The hash check on decryption is not constant-time, which opens up a timing side-channel.
- The core of the implementation seems lifted verbatim from a pair of Stack Overflow answers. This is concerning in two ways:
  - The question being answered was “how do I reproduce this one specific behavior of the openssl CLI in python?”, not “What’s a good way to securely store sensitive data at rest, where attackers can perform offline attacks at will?”
  - The unit tests only verify that the implementation is internally consistent (M == decrypt(encrypt(M)) essentially), not that it matches the openssl behavior it’s copying. While the primitives are delegated to pycrypto, there could be bugs lurking in the glue around the primitives. Since this is a non-standard combination of primitives, there are no canonical test inputs you can check against.
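To illustrate the first three points, here is a minimal sketch of the authentication half of an encrypt-then-MAC design: a keyed HMAC-SHA256 tag computed over the ciphertext and checked in constant time. The function names `seal` and `verify` are hypothetical, the ciphertext is treated as opaque bytes, and the encryption step itself is deliberately left out.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def verify(mac_key: bytes, blob: bytes) -> bytes:
    """Return the ciphertext if the tag is valid, else raise ValueError."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext


mac_key = os.urandom(32)
blob = seal(mac_key, b"opaque ciphertext bytes")
print(verify(mac_key, blob))
```

Because the tag is checked before any decryption happens, tampered input is rejected without ever reaching the cipher or the parsing code.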
I should say that I’m not a crypto expert, merely an enthusiastic amateur. However, my spidey sense is tingling pretty hard in light of all the above. I’m kinda hoping that I’ve overlooked something obvious that makes this all safe, but that’s a lot of distinct concerns to address :/.
Not wishing to be just a downer, I have suggestions for safer vault implementations:
- Derive keys with PBKDF2, and use NaCl’s secretbox() for encryption and decryption. Secretbox implements correct and fast authenticated encryption, and PBKDF2 will severely slow down trivial bruteforcing attacks. PyNaCl provides Python bindings for NaCl, and pycrypto provides PBKDF2.
- If a dependency on PyNaCl is not desired, use AES-GCM to perform authenticated encryption. AES-GCM will be available in the upcoming pycrypto release.
- For something that uses only current pycrypto, use AES-CTR combined with an HMAC-SHA256 authentication code. However, this is starting to drift back into the territory of manually gluing primitives together in new and exciting ways (although AES-CTR+HMAC-SHA256 is not exactly off the beaten path), which increases the risk.
I’d be more than happy to provide a vault implementation for the first option, and the commandline plumbing to enable selection of vault implementations, if it would be helpful. I wouldn’t trust myself to implement the other two without oversight from an expert, unfortunately.
- Dave