Awx-operator and Karpenter in EKS


Failed to schedule pod, incompatible with provisioner "default", did not tolerate /karpenter=:NoSchedule; incompatible with provisioner "default-spot", did not tolerate /karpenter=:NoSchedule 

Does awx-operator work with Karpenter, or are there any plans to support it?

Hello @trippinnik,

To my knowledge we haven’t tested awx-operator with Karpenter. If you try this out, please let us know how it goes and share any pain points you run into. You’ll probably want to be mindful of what you set node_selector, task_affinity, and web_affinity to. The defaults do not restrict which nodes the pods are scheduled on.

The error above is what I get when trying to schedule a pod, so I guess not. I don’t care which nodes things run on, but it would be ideal if Karpenter knew to start a node when the current nodes are full.

The jobs are generally bursts of activity followed by quiet periods.

Hi!

I’m running AWX with Karpenter. Do you need any help with this, or were you able to resolve it?

I just saw your message now, but it’s been a while.

@alvaroc20 ,
How did you get this working?
We use Helm charts to set this up and want to run the awx-task pods on a separate Karpenter NodePool, but our pods don’t get scheduled on that NodePool.

Hi @deanso

In my case, I use a single NodePool for both task and web. If you could share the specific error you’re encountering, I might be able to assist further.

First and foremost, make sure that your Karpenter setup is fully operational. This includes verifying that Karpenter is creating and scaling nodes as expected. You can follow the official documentation to run some tests and confirm everything is working correctly. Once that’s in place, you’ll need to configure your EC2NodeClass and NodePool resources for Karpenter (these are typically managed as Kubernetes resources rather than directly through Helm). If you’ve got this ready and functional, you’re in a good position to proceed further!
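As a rough sketch of such a test, something like the "inflate" Deployment from the Karpenter getting-started guide can be used: it requests CPU but does nothing, so scaling it up should force Karpenter to launch nodes. The name, the pause image tag, and the CPU request below are assumptions to adapt to your cluster.

```yaml
# Placeholder workload used only to check that Karpenter provisions nodes.
# The Deployment name, image tag, and CPU request are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0            # start at zero; scale up manually to trigger provisioning
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"   # large enough that a few replicas will not fit on existing nodes
```

After applying it, scaling the Deployment to a handful of replicas (for example with kubectl scale) and watching for new nodes is a quick way to see whether Karpenter reacts. If your NodePool carries taints, this test pod would also need matching tolerations.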

That said, based on what you mentioned, I believe you’d need two NodePools: one configured for task pods with a specific label referencing it, and another for web pods. In the AWX resource (CRD), you would need to make use of task_node_selector and web_node_selector to ensure the pods are scheduled where they need to be. Alternatively, you could consider using nodeAffinity or taints/tolerations, although the simplest approach might be nodeSelector.
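A minimal sketch of the AWX resource for that two-NodePool layout might look like the following. The label key and values (example.com/karpenter-profile, awx-task, awx-web) are hypothetical; they would need to match the labels you set in each NodePool's template.

```yaml
# Sketch only: label keys/values are hypothetical and must match your NodePool labels.
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
spec:
  # awx-operator expects these fields as YAML strings, hence the block scalars.
  task_node_selector: |
    example.com/karpenter-profile: awx-task
  web_node_selector: |
    example.com/karpenter-profile: awx-web
```

A nodeSelector alone only steers the AWX pods; if the NodePools are also tainted, the pods additionally need matching task_tolerations and web_tolerations.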

Be cautious about two key aspects:

  1. Architecture: Ensure the image is for an amd64 architecture.
  2. Resource Requests: Pay close attention to the resource requests defined in your CRD for the pods, keeping in mind the instance types you’re using. I’ve noticed memory consumption is quite high, particularly for web pods. Here’s my configuration for reference:
web_resource_requirements:
  limits:
    cpu: "2"
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 1.5Gi
task_resource_requirements:
  limits:
    cpu: "2"
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 1Gi

I’m using R-series instances in AWS for this setup.

If you can provide more information about the issue or where you’re currently stuck, I’d be happy to help further!

@alvaroc20 ,

Thank you for your detailed response, really appreciated!
I am currently not receiving any errors, but my AWX task and web pods are being scheduled on the default Kubernetes nodes rather than on the Karpenter NodePool.
So I think you are right that my Karpenter NodePool may be unable to create nodes. I will look further into that with the Karpenter docs.
Thx!


I have it working with the config below:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: awx-instances
spec:
  disruption:
    consolidateAfter: 5m0s
    consolidationPolicy: WhenEmpty
    expireAfter: 168h0m0s
  limits:
    cpu: "16"
    memory: 2Ti
  template:
    metadata:
      labels:
        <unique_name>/karpenter-profile: awx
    spec:
      startupTaints:
        - key: ebs.csi.aws.com/agent-not-ready
          effect: NoExecute
      nodeClassRef:
        name: default
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - r6a.xlarge
        - c6a.xlarge
        - c7a.xlarge
        - c6a.2xlarge
        - c5a.2xlarge
        - c6i.2xlarge
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      taints:
      - effect: NoSchedule
        key: <unique_name>/karpenter

Then, in the AWX resource, under spec:

  task_tolerations: |
    - effect: NoSchedule
      key: <unique_name>/karpenter
  task_node_selector: |
    <unique_name>/karpenter-profile: awx

The <unique_name> prefix in task_tolerations and task_node_selector must match the one used for the taint key and the karpenter-profile label in the Karpenter NodePool.
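The config above only covers the task pods. If the web pods should land on the same tainted NodePool, a sketch along the same lines would use the corresponding web_* fields of the AWX spec. This is an assumption about the desired layout, not something tested in this thread.

```yaml
# Sketch: same toleration and selector as above, applied to the web pods.
# <unique_name> is the same placeholder used in the NodePool definition.
spec:
  web_tolerations: |
    - effect: NoSchedule
      key: <unique_name>/karpenter
  web_node_selector: |
    <unique_name>/karpenter-profile: awx
```

A missing toleration like this is what produces the "did not tolerate ...:NoSchedule" message from the first post: Karpenter will not provision a node from a tainted NodePool for a pod that does not tolerate that taint.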


Hi @deanso

Did you manage to get it working? Do you need help with anything? If you need help with Karpenter or AWX feel free to let me know.