AWX-operator RKE2 - Backup problem

Problem with awx-operator and RKE2

  • VM with Almalinux 9
  • RKE2 Single Node Cluster
  • Installed awx-operator with these manifests:
    Pastebin.com - Locked Paste (password: 3yXV2WBMen)

My problem is that I can’t run a backup: the Ansible role keeps failing.

```
[awx_user@awx ~]$ kubectl describe pod awxbackup-2021-06-06-db-management -n awx
Name:                      awxbackup-2021-06-06-db-management
Namespace:                 awx
Priority:                  0
Service Account:           default
Node:                      awx.MYSUPERDOMAIN.com/10.10.36.61
Start Time:                Fri, 16 Feb 2024 23:55:54 +0100
Labels:                    app.kubernetes.io/component=awx
                           app.kubernetes.io/managed-by=awx-operator
                           app.kubernetes.io/operator-version=2.11.0
                           app.kubernetes.io/part-of=awxbackup-2021-06-06
Annotations:               <none>
Status:                    Terminating (lasts 30m)
Termination Grace Period:  30s
IP:
IPs:                       <none>
Containers:
  awxbackup-2021-06-06-db-management:
    Container ID:
    Image:         postgres:13
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      infinity
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:        25m
      memory:     32Mi
    Environment:  <none>
    Mounts:
      /backups from awxbackup-2021-06-06-backup (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rdkhf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  awxbackup-2021-06-06-backup:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  awx-backup-claim
    ReadOnly:   false
  kube-api-access-rdkhf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               32m                    default-scheduler  Successfully assigned awx/awxbackup-2021-06-06-db-management to awx.MYSUPERDOMAIN.com
  Warning  FailedCreatePodSandBox  32m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b90f35ed8db90209b8b99cc3a74d4a242b7ac1a6f56022a0ef6e9f1fa953b472": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Normal   SandboxChanged          30m (x10 over 32m)     kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedKillPod           2m28s (x128 over 30m)  kubelet            error killing pod: failed to "KillPodSandbox" for "f727b594-ba50-4677-b53c-05b5a61898cd" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"b90f35ed8db90209b8b99cc3a74d4a242b7ac1a6f56022a0ef6e9f1fa953b472\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
```

**Where is the problem? Let me know if you need more details.**
**By the way, if you have any advice, I'm always happy to hear another opinion.**

Hi, according to the Events you’ve provided, this is caused by Calico, not by AWX or the AWX Operator.

```
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...":
  plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
```
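The diagnosis works because kubelet names the failing CNI plugin directly in the event message. As a rough sketch of that reasoning, the hypothetical helper below pulls the plugin name, the CNI operation, and the error text out of `kubectl describe` event lines. The regex is an assumption based only on the two event messages in this thread (one with plain quotes, one with backslash-escaped quotes); other kubelet or CNI versions may word their messages differently.

```python
import re

# Hypothetical pattern, built from the two events quoted in this thread.
# The FailedKillPod event escapes its quotes (\"calico\"), so an optional
# backslash is allowed before each quote. Newlines are excluded so the
# error text stops at the end of its own event line.
CNI_FAILURE = re.compile(
    r'plugin type=\\?"(?P<plugin>[^"\\\n]+)\\?" failed '
    r'\((?P<op>\w+)\): (?P<error>[^"\\\n]+)'
)


def diagnose_sandbox_events(events: str) -> list[tuple[str, str, str]]:
    """Return (plugin, operation, error) for each CNI failure in the event text."""
    return [
        (m["plugin"], m["op"], m["error"].strip())
        for m in CNI_FAILURE.finditer(events)
    ]


def looks_like_cni_auth_failure(events: str) -> bool:
    """True if any CNI failure mentions 'unauthorized' (a credentials problem)."""
    return any(
        "unauthorized" in error.lower()
        for _, _, error in diagnose_sandbox_events(events)
    )


if __name__ == "__main__":
    sample = (
        'Failed to create pod sandbox: rpc error: code = Unknown desc = '
        'failed to setup network for sandbox "b90f": plugin type="calico" '
        'failed (add): error getting ClusterInformation: '
        'connection is unauthorized: Unauthorized'
    )
    print(diagnose_sandbox_events(sample))
    print(looks_like_cni_auth_failure(sample))
```

Pointing the failure at the `calico` plugin rather than at the AWX backup role is the whole point here: the container never started, so the Ansible role had nothing to run in.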

There are some reported issues about this in the RKE2 and Calico repositories, so you should refer to them.

I’m not an RKE2/Calico expert, so I can’t provide an exact solution, but I think upgrading RKE2, or finding and restarting the Calico pod, is a good first step.
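A minimal sketch of the "find and restart the Calico pod" step, written as a small Python wrapper around `kubectl` that only prints the command unless `dry_run=False`. Assumptions (not confirmed in this thread): RKE2's Calico runs `calico-node` in the `calico-system` namespace with the label `k8s-app=calico-node`, and recreating those pods refreshes the credentials the Calico CNI plugin uses. Check the namespace and label on your cluster first, e.g. with `kubectl get pods -A -l k8s-app=calico-node`. The function names are hypothetical.

```python
import subprocess

# Assumed location of the calico-node pods on RKE2; verify on your cluster.
CALICO_NAMESPACE = "calico-system"
CALICO_SELECTOR = "k8s-app=calico-node"


def restart_calico_node_cmd(
    namespace: str = CALICO_NAMESPACE, selector: str = CALICO_SELECTOR
) -> list[str]:
    """Build a kubectl command that deletes the calico-node pods.

    The pods belong to a DaemonSet, so Kubernetes recreates them after deletion.
    """
    return ["kubectl", "-n", namespace, "delete", "pod", "-l", selector]


def restart_calico_node(dry_run: bool = True, **kwargs: str):
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    cmd = restart_calico_node_cmd(**kwargs)
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Prints the command only; pass dry_run=False to actually run it.
    restart_calico_node()
```

Deleting the pods, rather than editing the DaemonSet, leaves the Calico manifests untouched, which matters if an operator manages them.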


After many attempts to repair and debug, the legendary “have you tried turning it off and on?” worked.

But seriously: I stopped rke2-server, performed an update, and rebooted.
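The fix described above amounts to three steps in order. The sketch below records them as a hypothetical Python helper that prints the commands by default (`dry_run=True`) and stops at the first failing step when run for real. Assumptions: "performed an update" means a `dnf` package update on AlmaLinux 9 (if RKE2 was installed from its RPM repositories, this would also update the `rke2` packages), and the `rke2-server` unit is enabled so it starts again after the reboot.

```python
import subprocess

# Hypothetical recap of the steps described in the post above; run as root via sudo.
MAINTENANCE_STEPS = (
    ("systemctl", "stop", "rke2-server"),  # stop the RKE2 server service
    ("dnf", "-y", "update"),               # assumed meaning of "performed an update"
    ("systemctl", "reboot"),               # reboot; rke2-server starts at boot if enabled
)


def run_maintenance(steps=MAINTENANCE_STEPS, dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute each step in order; check=True aborts on failure."""
    executed = []
    for step in steps:
        cmd = ["sudo", *step]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_maintenance()
```

Using `check=True` means a failed `dnf update` stops the sequence before the reboot, so you can inspect the problem first.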

By the way, thank you kurokobo for your work :smiley:


Seriously, this is sometimes the true silver bullet :smiley:

Anyway, good to hear that you’ve solved the issue. Have fun with AWX!
