Sudo: /etc/sudo.conf is owned by uid 65534, should be 0\nsudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

Hi everyone,

I have the latest Ansible Tower installed on RHEL 8.9.

When I run specific ansible tasks, I always get this error message:

{
    "module_stdout": "",
    "module_stderr": "sudo: /etc/sudo.conf is owned by uid 65534, should be 0\nsudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1,
    "_ansible_no_log": false,
    "changed": false
}

Example of tasks would be:

    - name: Install cx_Oracle Python module
      pip:
        name: cx_Oracle
        state: present
    - name: Execute SQL to get Schemas from DB
      ibre5041.ansible_oracle_modules.oracle_sql:
        username: "{{ db_username }}"
        password: "{{ db_password }}"
        mode: "{{ 'sysdba' if ar_orcl_db_sql_user|upper == 'SYS' else 'normal' }}"
        hostname: "{{ db_host }}"
        service_name: "{{ db_service_name }}"
        port: "{{ db_port }}"
        #sql: "SELECT username FROM dba_users"
        sql: "SELECT NAME FROM v$database"
      delegate_to: 127.0.0.1
      register: schema_result

Note that even if I comment out either one of the tasks, the remaining one still triggers the error.

I’ve confirmed that both files are owned by root, and I believe the setuid bit is set on /usr/bin/sudo.

Here’s the proof:

$ ls -la /usr/bin/sudo
---s--x--x. 1 root root 165528 Dec  7  2021 /usr/bin/sudo

$ stat /usr/bin/sudo
  File: /usr/bin/sudo
  Size: 165528    	Blocks: 328        IO Block: 4096   regular file
Device: ca02h/51714d	Inode: 9184962     Links: 1
Access: (4111/---s--x--x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:sudo_exec_t:s0
Access: 2024-02-23 12:17:03.876558806 +0000
Modify: 2021-12-07 12:02:43.000000000 +0000
Change: 2022-11-21 19:52:23.879128352 +0000
 Birth: 2022-11-21 19:47:26.179931860 +0000

$ stat /etc/sudo.conf
  File: /etc/sudo.conf
  Size: 1786      	Blocks: 8          IO Block: 4096   regular file
Device: ca02h/51714d	Inode: 8619263     Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_t:s0
Access: 2024-02-23 12:17:03.883558691 +0000
Modify: 2021-12-07 11:57:12.000000000 +0000
Change: 2022-11-21 19:52:23.873128307 +0000
 Birth: 2022-11-21 19:47:26.165931757 +0000

What did I miss here?

Hello!
The error message reports that /etc/sudo.conf is owned by UID 65534, when it should be owned by UID 0 (root).
Try changing the owner of that file to root:
sudo chown root:root /etc/sudo.conf

If that does not work, set the setuid bit on /usr/bin/sudo again:
sudo chmod u+s /usr/bin/sudo

@valkiriaaquatica I just ran those commands, but the issue persists.

I think it really doesn’t make sense. If you check the output of my stat command, the issue shouldn’t happen. I must’ve missed something here.

Does ansible tower generate a command that I can run directly from the terminal? Or probably, are there any log files I can look for?

Hmm, AWX stores the logs in /var/log/tower and /var/log/supervisor/ …


Hi, does the error occur when you launch your playbook through Ansible Automation Controller (or AWX)?

If so, adding delegate_to: 127.0.0.1 delegates the task to the EE container itself.
I don’t know which image is used as the EE in your environment, but I guess the ownership of /etc/sudo.conf inside the container image is the root cause.

Does adding become: false to a task change anything?

    - name: Execute SQL to get Schemas from DB
      ibre5041.ansible_oracle_modules.oracle_sql:
        username: "{{ db_username }}"
        password: "{{ db_password }}"
        mode: "{{ 'sysdba' if ar_orcl_db_sql_user|upper == 'SYS' else 'normal' }}"
        hostname: "{{ db_host }}"
        service_name: "{{ db_service_name }}"
        port: "{{ db_port }}"
        #sql: "SELECT username FROM dba_users"
        sql: "SELECT NAME FROM v$database"
      delegate_to: 127.0.0.1
      become: false   # added
      register: schema_result

@kurokobo your explanation makes sense.

But, I’m not sure either. I downloaded the Ansible Tower from here:
https://releases.ansible.com/ansible-tower/setup/ansible-tower-setup-3.8.6-2.tar.gz

Then, I modified 2 password variables in the inventory and ran ./setup.sh.

I haven’t checked whether any Docker containers exist, and I cannot check now, since I’ve decommissioned the server and plan to rebuild everything.

If what you said is true, then it really makes sense. I’ll let you know later.

Ah, are you on actual Ansible Tower rather than Automation Controller?

Using Tower 3.8 is okay if that is your intended design, but Ansible Tower has since been renamed Automation Controller, and the latest version is 4.5: Automation Controller Release Notes v4.5

There are a lot of differences between Tower and Automation Controller in their architecture, so my comment may be invalid since I assumed you’re on Automation Controller.

Note:
Tower 3.8 in AAP 1.2 is already in the extended support phase and is reaching ELS this year: Red Hat Ansible Automation Platform Life Cycle - Red Hat Customer Portal.
Also, standalone Tower is already EOL: Red Hat Ansible Tower Life Cycle - RETIRED - Red Hat Customer Portal


The company moved to Ansible Automation Platform but still uses Tower as well.

@kurokobo so since standalone Tower is already EOL, it no longer receives any updates or patches, right? Then the issue I’m experiencing now is probably a bug.

I’ve already rebuilt everything 3-4 times, just wanted to confirm that I didn’t miss a single installation step.

It looks like I need to switch to Automation Controller if I want to continue.

PS: I’ve deployed AWX v17 to CentOS 7, and it works. I think this is enough for now, as I just want to prove that my playbook works correctly.

By the way, I really appreciate everyone’s support here. Now, I have a better understanding of what is going on.

@budiantoip

Correct.

I can’t say for sure since I haven’t examined your actual environment, but I still believe that your problem is not a bug.

The old Tower 3.8 also has a mechanism to isolate the job execution process, although it uses Bubblewrap rather than Podman: "18. Bubblewrap functionality and variables" in the Ansible Tower Administration Guide v3.8.6

Bubblewrap can map the owner of files in the sandbox to nobody (65534) when the owner is not the current user. This is probably the cause of your issue.

The key is to avoid setting become to true on tasks for localhost unless you really need it. Whether it is Tower, AWX, or Automation Controller, privileged access on localhost will most likely fail in an environment where process isolation is enabled, and there should be very few situations where it is needed in the first place.
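
As a rough sketch of that advice, privilege escalation can stay enabled for the remote hosts while being switched off for anything that runs on the Tower node itself. The play layout, the db_servers group, and both example tasks are made up for illustration; only the become: false override on a delegated task comes from this thread.

```yaml
# Hypothetical play: "db_servers" and both tasks are placeholders.
- hosts: db_servers
  become: true            # default for tasks that run on the remote hosts
  tasks:
    - name: Example task that needs root on the remote host
      command: /usr/bin/id -u
      changed_when: false

    - name: Example task that runs inside the Tower job sandbox
      command: /usr/bin/id -u
      delegate_to: 127.0.0.1
      become: false       # skip sudo here, so its ownership check is never reached
      changed_when: false
```

A task-level become normally takes precedence over the play-level value, so only the delegated task skips sudo.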


Quick demo for Bubblewrap:

# Logged in as UID: 1001
[awx@exec01 ~]$ id
uid=1001(awx) gid=1001(awx) groups=1001(awx) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

# Ensure the owner of directories under / is root:root
[awx@exec01 ~]$ ls -l /
total 24
lrwxrwxrwx.   1 root root    7 Jun 22  2021 bin -> usr/bin
...
drwxrwxrwt.  20 root root 4096 Feb 26 00:41 tmp
drwxr-xr-x.  13 root root  158 Jan  3  2023 usr
drwxr-xr-x.  21 root root 4096 Apr 28  2023 var

# Also the owner of /etc/sudo.conf is root as well
[awx@exec01 ~]$ ls -l /etc/sudo.conf 
-rw-r-----. 1 root root 1786 Dec 12  2021 /etc/sudo.conf

# Launch "sandboxed" bash with Bubblewrap
[awx@exec01 ~]$ bwrap --dev-bind / / bash

# The owners are nobody
[awx@exec01 ~]$ ls -l /
total 24
lrwxrwxrwx.   1 nobody nobody    7 Jun 22  2021 bin -> usr/bin
...
drwxrwxrwt.  20 nobody nobody 4096 Feb 26 00:41 tmp
drwxr-xr-x.  13 nobody nobody  158 Jan  3  2023 usr
drwxr-xr-x.  21 nobody nobody 4096 Apr 28  2023 var

# The owner of /etc/sudo.conf is also nobody
[awx@exec01 ~]$ ls -l /etc/sudo.conf 
-rw-r-----. 1 nobody nobody 1786 Dec 12  2021 /etc/sudo.conf

# nobody is UID: 65534
[awx@exec01 ~]$ ls -ln /etc/sudo.conf 
-rw-r-----. 1 65534 65534 1786 Dec 12  2021 /etc/sudo.conf

# The "sudo" is not working anymore. This is the error you've faced
[awx@exec01 ~]$ sudo echo hoge
sudo: /etc/sudo.conf is owned by uid 65534, should be 0
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
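
To check the same thing from inside a job rather than from a shell, a small diagnostic task along these lines might help. It is only a sketch: it assumes GNU stat is available on the Tower node, and the task names and the sudo_stat variable are made up here.

```yaml
    # Hypothetical diagnostic tasks; add them to the play that fails.
    - name: Show the numeric owner of the sudo files as seen by the job
      command: stat -c '%u %g %n' /etc/sudo.conf /usr/bin/sudo
      delegate_to: 127.0.0.1
      become: false        # must not go through sudo, or the task fails before stat runs
      changed_when: false
      register: sudo_stat

    - name: Print the result
      debug:
        var: sudo_stat.stdout_lines
```

If the sandbox is the cause, I would expect the uid and gid columns to show 65534 instead of 0, matching the ls -ln output above.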

@kurokobo Thank you for your explanation. I’ve tried those commands and they returned the same output.

I’ve added become: false to the second task, and it now works as expected.

As for the first task, it returned this error message:

{
    "cmd": [
        "/var/lib/awx/venv/ansible/bin/pip3",
        "install",
        "cx_Oracle"
    ],
    "msg": "stdout: Collecting cx_Oracle\n  Downloading https://files.pythonhosted.org/packages/ec/28/84cc23a2d5ada575d459a8d260286d99dde4b00cafcc34ced7877b3c9bf0/cx_Oracle-8.3.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (858kB)\nInstalling collected packages: cx-Oracle\n\n:stderr: ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: '/var/lib/awx/venv/ansible/lib64/python3.6/site-packages/cx_Oracle.cpython-36m-x86_64-linux-gnu.so'\n\nWARNING: You are using pip version 19.3.1; however, version 21.3.1 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n",
    "invocation": {
        "module_args": {
            "name": [
                "cx_Oracle"
            ],
            "state": "present",
            "virtualenv_site_packages": false,
            "virtualenv_command": "virtualenv",
            "editable": false,
            "version": null,
            "requirements": null,
            "virtualenv": null,
            "virtualenv_python": null,
            "extra_args": null,
            "chdir": null,
            "executable": null,
            "umask": null
        }
    },
    "_ansible_no_log": false,
    "changed": false
}

If you notice it, the task runs this command:

/var/lib/awx/venv/ansible/bin/pip3 install cx_Oracle

I decided to run it inside Bubblewrap, and got the same error message. Based on your explanation, I thought about adding the --user flag and ran it again:

/var/lib/awx/venv/ansible/bin/pip3 install cx_Oracle --user

The error was finally gone.

So I decided to add extra_args: "--user" to the task. The task now runs correctly.
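
For reference, the first task with that change would look roughly like this. Everything except the extra_args line is copied from the original task above; any become setting is left out because the thread does not say how it was set for this task.

```yaml
    - name: Install cx_Oracle Python module
      pip:
        name: cx_Oracle
        state: present
        extra_args: "--user"   # avoids writing into the read-only venv site-packages
```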

I’d like to thank you for your help. It really helped me a lot.
