Lost secret key

We’ve had an unfortunate event: our AWX environment was moved from one EKS cluster to another, and the secret key was stored only in the old EKS cluster, not in an external secrets manager.

To make this worse, the team managing the move didn’t realise this and assumed everything was OK and then deleted the old EKS cluster, obliterating the secret key. They’re trying to restore this but I’m anticipating further complications.

When the awx-web pod tries to start, we see this:

2025-08-15 16:09:38,709 ERROR    [-] awx.main.utils.encryption Failed to decrypt `Setting(pk=23).value`; if you've recently restored from a database backup or are running in a clustered environment, check that your `SECRET_KEY` value is correct
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/cryptography/fernet.py", line 134, in _verify_signature
    h.verify(data[-32:])
cryptography.exceptions.InvalidSignature: Signature did not match digest.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/utils/encryption.py", line 159, in decrypt_field
    return smart_str(decrypt_value(key, value))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^

Full disclosure: I’m not a Kubernetes person, just an operator of AWX so sorry if my understanding and terminology are somewhat off.

I’m assuming the secret key is used to encrypt things like passwords in vaults and possibly any internal users of AWX.

Is it possible to start AWX and ignore this problem? I get that all the passwords in vaults would be trash but typing those in so they’re (presumably) re-encrypted wouldn’t be too much of an onerous task.

AWX operator v2.19.1
AWX v24.6.1

There is a management command. Eyeballing the implementation, it explicitly does the decryption and encryption in that script. Maybe you can hack up the decrypt call to allow for errors, or avoid calling decrypt in favor of just calling encrypt. This might get you past your error.
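To illustrate the "allow for errors" idea, here is a minimal sketch, not AWX code. `seal`, `unseal`, `BadSignature` and `rekey_tolerant` are hypothetical names, and the HMAC-signed payload is only a toy stand-in for the real encryption. It mirrors the signature check that fails in the traceback: values that cannot be verified with the available key are reset to a placeholder and reported, so they can be typed in again later.

```python
import hashlib
import hmac


class BadSignature(Exception):
    """Raised when a stored value was not signed with the key in use."""


def seal(key: bytes, plaintext: str) -> bytes:
    # Toy stand-in for encryption: payload followed by a 32-byte HMAC-SHA256 tag.
    data = plaintext.encode("utf-8")
    return data + hmac.new(key, data, hashlib.sha256).digest()


def unseal(key: bytes, token: bytes) -> str:
    # Verify the trailing 32-byte tag before trusting the payload.
    data, tag = token[:-32], token[-32:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise BadSignature("Signature did not match digest.")
    return data.decode("utf-8")


def rekey_tolerant(
    rows: dict[str, bytes],
    old_key: bytes,
    new_key: bytes,
    placeholder: str = "",
) -> tuple[dict[str, bytes], list[str]]:
    """Re-seal every row under new_key.

    Rows that cannot be unsealed with old_key are reset to the placeholder
    and their names are returned so the values can be re-entered by hand.
    """
    rekeyed: dict[str, bytes] = {}
    lost: list[str] = []
    for name, token in rows.items():
        try:
            value = unseal(old_key, token)
        except BadSignature:
            value = placeholder
            lost.append(name)
        rekeyed[name] = seal(new_key, value)
    return rekeyed, lost
```

The point of returning the `lost` list is that you end up with a to-do list of credentials to re-enter, rather than values that are silently double encrypted.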


Thanks @chrismeyersfsu. As you say, I can probably hack that up, likely by commenting out the decrypt lines. The encrypted values would end up double encrypted, but that should at least allow AWX to start and I could try to fix things up from there.

The team managing EKS are trying to restore etcd from a backup of the old cluster too.

Fortunately this wasn’t a large deployment, and with a dump of the database it’s likely I can recreate everything within a few hours. I did read a post here where someone had the same issue and was able to scrape the database, convert it to JSON and then feed that back into AWX.

It’s not the end of the world, and I have learnt a few things along the way. The secret key was stored only in EKS, so I’ll move it to a secrets manager, and I’ll read up on the backup role in the helm chart.
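For anyone else wanting to take a copy of the key before a migration, here is a rough sketch of pulling the values out of a Secret manifest exported as JSON. The only thing I’m sure of is that Kubernetes stores Secret `data` values base64-encoded. The function name `decode_secret_data` is made up, and the secret name `awx-secret-key` and field `secret_key` in the example are assumptions about my deployment, so check what yours is actually called.

```python
import base64
import json


def decode_secret_data(manifest_json: str) -> dict[str, str]:
    """Return the base64-decoded `data` entries of a Kubernetes Secret manifest."""
    manifest = json.loads(manifest_json)
    if manifest.get("kind") != "Secret":
        raise ValueError("not a Secret manifest")
    return {
        name: base64.b64decode(value).decode("utf-8")
        for name, value in manifest.get("data", {}).items()
    }


# Example manifest; the names and the value are placeholders, not real data.
example = json.dumps(
    {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "awx-secret-key"},  # assumed name
        "data": {"secret_key": base64.b64encode(b"not-a-real-key").decode("ascii")},
    }
)
decoded = decode_secret_data(example)
```

The decoded value is what would then go into the secrets manager, stored outside the cluster so that deleting the cluster doesn’t take the key with it.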