How to use the termination grace period option with a Helm values file

Hello,

According to the documentation (https://github.com/ansible/awx-operator/blob/devel/docs/user-guide/advanced-configuration/pods-termination-grace-period.md), this feature was announced in AWX-Operator version 1.3.0. We are trying to deploy the new “termination_grace_period_seconds” option via a Helm values file, but we receive the error below. What is wrong in our Helm values file, and how can we use this feature via Helm values? Thanks in advance.

Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(AWX.spec): unknown field "termination_grace_period_seconds" in com.ansible.awx.v1beta1.AWX.spec

NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
awx-dev-awx-operator    awx             46              2023-09-28 09:13:25.71129122 +0000 UTC  deployed        awx-operator-2.5.3      2.5.3

values.yaml

AWX: 
  enabled: true
  name: awx-dev
  spec:
    termination_grace_period_seconds: 60
    ee_images:
    - name: custom-awx-ee
      image: xxxxx
    ee_pull_credentials_secret: awx-dev-ee-pull-credentials
    replicas: 2
    admin_user: admin
    admin_password_secret: custom-admin-password
    ingress_type: ingress
    ingress_class_name: awx-dev
    hostname: xxxxx
    ingress_tls_secret: awx-dev-tls
    secret_key_secret: custom-awx-secret-key
  postgres:
    enabled: true
    host: xxxxx
    port: 5432
    dbName: awx
    username: awx
    sslmode: prefer

Hi,

I don’t have an explanation, but what if you uninstall your release and then deploy from scratch?

Hi,

You probably need to update the operator before adding new fields.


I can try, but the usage of this option in the values file is correct, and it is possible to use it via Helm templates, right?

The current version of AWX-Operator is 2.5.3. I’ve tried to upgrade to the latest version, 2.7.1, with the new field included, but without success: the error is the same.

I mean, you need to upgrade the operator without the termination_grace_period_seconds field in the spec. Then, once the operator has been upgraded, reapply with the termination_grace_period_seconds field.

This is something we have already done, because we’ve updated the operator multiple times in the past.

Unfortunately, uninstalling and reinstalling the operator via Helm didn’t help, and the error is the same. On a new test cluster, it is possible to install the operator with the “termination_grace_period_seconds” value. We are using the latest Helm version. Any suggestions?

Debug log:

Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(AWX.spec): unknown field "termination_grace_period_seconds" in com.ansible.awx.v1beta1.AWX.spec
helm.go:84: [debug] error validating "": error validating data: ValidationError(AWX.spec): unknown field "termination_grace_period_seconds" in com.ansible.awx.v1beta1.AWX.spec
helm.sh/helm/v3/pkg/kube.scrubValidationError
        helm.sh/helm/v3/pkg/kube/client.go:815
helm.sh/helm/v3/pkg/kube.(*Client).Build
        helm.sh/helm/v3/pkg/kube/client.go:358
helm.sh/helm/v3/pkg/action.validateManifest
        helm.sh/helm/v3/pkg/action/upgrade.go:563
helm.sh/helm/v3/pkg/action.(*Upgrade).prepareUpgrade
        helm.sh/helm/v3/pkg/action/upgrade.go:290
helm.sh/helm/v3/pkg/action.(*Upgrade).RunWithContext
        helm.sh/helm/v3/pkg/action/upgrade.go:154
main.newUpgradeCmd.func2
        helm.sh/helm/v3/cmd/helm/upgrade.go:227
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.7.0/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.7.0/command.go:992
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598
UPGRADE FAILED
main.newUpgradeCmd.func2
        helm.sh/helm/v3/cmd/helm/upgrade.go:229
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.7.0/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.7.0/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.7.0/command.go:992
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598

It seems like the CRD hasn’t been updated with the newer schema that contains termination_grace_period_seconds.

You can verify this by running:

kubectl get crd awxs.awx.ansible.com -o yaml | grep termination_grace_period_seconds
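
If you want an explicit yes/no answer instead of reading grep output, the same check can be wrapped in a small helper. This is only a sketch: the function name crd_has_field is made up for illustration, and it simply looks for the field name as a YAML key anywhere in the CRD manifest passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of kubectl or the AWX Operator.
# crd_has_field FIELD: read a CRD manifest (YAML) on stdin and succeed
# only if FIELD appears as a key on some line of it.
crd_has_field() {
  grep -q -E "^[[:space:]]*$1:"
}

# Against a live cluster (needs kubectl and cluster access):
#   kubectl get crd awxs.awx.ansible.com -o yaml \
#     | crd_has_field termination_grace_period_seconds \
#     && echo "CRD knows the field" || echo "CRD is stale"
```

If the helper reports a stale CRD, the validation error comes from the CRD schema on the cluster rather than from the values file.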

You are totally right, and that was the missing part. Deleting the resource and reinstalling the operator did the trick. Thank you very much for your help.
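
For anyone landing here with the same error: a likely reason a fresh cluster works while an upgraded one does not is that Helm installs CRDs shipped in a chart's crds/ directory only on first install and does not upgrade or delete them afterwards. Assuming that is what happened here, an alternative to deleting the CRD (which also deletes the AWX resources of that kind) is to apply the CRDs for the target operator version by hand. The sketch below only prints the command unless DRY_RUN=0 is set. The function names and the DRY_RUN variable are made up for illustration, and the kustomize path is an assumption that should be checked against the operator's Helm install notes for your version.

```shell
#!/bin/sh
# Hypothetical helper names; not part of Helm or the AWX Operator.

# Print the kustomize URL for the operator CRDs at a given tag.
# The path is an assumption; verify it against the docs for your version.
crd_kustomize_url() {
  printf 'github.com/ansible/awx-operator/config/crd?ref=%s\n' "$1"
}

# Print the kubectl command by default; run it only when DRY_RUN=0.
update_awx_crds() {
  url="$(crd_kustomize_url "$1")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "kubectl apply --server-side -k $url"
  else
    kubectl apply --server-side -k "$url"
  fi
}

# Dry run for the version mentioned in this thread.
update_awx_crds 2.7.1
```

After the CRDs are updated, the helm upgrade with termination_grace_period_seconds in the values file should pass validation, provided the stale CRD was indeed the cause.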


I spent some time looking at the termination grace period bits that were recently added, but I’m unable to find out whether they provide the same grace period to workflow jobs.

What I mean is: suppose you define a grace period (essentially a guess at how long a given pod needs) and the pods on a given node terminate safely. If that node then goes down for maintenance after the grace period, will the workflow itself (stored in the DB) survive while its child workflow node jobs are running on other nodes?

I haven’t had a chance to specifically test this as it would require a lot of scaffolding to ensure I have visibility into what is terminated and when (if expected or not).

It’s been something I’ve been digging through the past couple weeks and everything seems to relate to workflow jobs for me. :smile:

Sorry for the noise.
