Admin password not working in new build

I’m new to building AWX 21.10.2, but it has been working for me in OpenShift. Suddenly, two things happened. First, I noticed that the admin password I set in my scripts is not accepted by my new install. I can look at the Secret in OpenShift, and it’s exactly what I set, but AWX rejects my login attempt. I can change the Secret, and the new value is still rejected.
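One thing that may be worth ruling out when a Secret looks right but the login is rejected is invisible whitespace. Secret values are stored base64-encoded, so a trailing newline does not show up when you eyeball the encoded string. The following is a minimal sketch in Python, assuming the base64 string has been copied by hand from the Secret's data field. The function name and the sample value "changeme" are made up for illustration and are not taken from the install described here.

```python
import base64


def inspect_secret_value(b64_value: str) -> dict:
    """Decode a base64-encoded Secret value and report whitespace problems."""
    raw = base64.b64decode(b64_value)
    text = raw.decode("utf-8", errors="replace")
    return {
        "length": len(text),
        "has_trailing_newline": text.endswith("\n"),
        "has_outer_whitespace": text != text.strip(),
        "repr": repr(text),  # repr makes hidden characters visible
    }


# Hypothetical sample: the password "changeme" stored with a stray newline.
sample = base64.b64encode(b"changeme\n").decode("ascii")
print(inspect_secret_value(sample))
```

If the decoded value is clean, the Secret contents themselves are probably not the problem, and the question becomes whether the value was ever applied to the admin user.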

I did have to run awx-manage migrate --noinput to get things to start up. The first time I run it, it fails with errors, but the second time it applies a large number of migrations and I am able to get to the login page. The login problem could be related to that whole process.

At the same time, 5 installations of AWX 20.0.1 all lost their deployments out of the clear blue. This is probably unrelated, but I cannot be sure. The Operators are still there, but the AWX deployment itself is just gone. I’m still trying to dig into why/how that happened.

If anyone has any thoughts on why the Admin Secret might not work, I’m all ears. Or what on earth I did to my OpenShift installation.

I just reinstalled 20.10.1 to no effect. The login page comes up fine, but it rejects the admin secret. When I ran awx-manage migrate, this was the output:
sh-5.1$ awx-manage migrate
Operations to perform:
Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
Applying conf.0008_subscriptions… OK
Applying conf.0009_rename_proot_settings… OK
Applying taggit.0003_taggeditem_add_unique_index… OK
Applying sessions.0001_initial… OK
Applying main.0014_v330_saved_launchtime_configs… OK
Applying main.0015_v330_blank_start_args… OK
Applying main.0016_v330_non_blank_workflow… OK
Applying main.0017_v330_move_deprecated_stdout… OK
Applying main.0018_v330_add_additional_stdout_events… OK
Applying main.0019_v330_custom_virtualenv… OK
Applying main.0020_v330_instancegroup_policies… OK
Applying main.0021_v330_declare_new_rbac_roles… OK
Applying main.0022_v330_create_new_rbac_roles… OK
Applying main.0023_v330_inventory_multicred… OK
Applying main.0024_v330_create_user_session_membership… OK
Applying main.0025_v330_add_oauth_activity_stream_registrar… OK
Applying oauth2_provider.0001_initial… OK
Applying main.0026_v330_delete_authtoken… OK
Applying main.0027_v330_emitted_events… OK
Applying main.0028_v330_add_tower_verify… OK
Applying main.0030_v330_modify_application… OK
Applying main.0031_v330_encrypt_oauth2_secret… OK
Applying main.0032_v330_polymorphic_delete… OK
Applying main.0033_v330_oauth_help_text… OK
Applying main.0034_v330_delete_user_role…2022-12-28 14:55:50,857 INFO [-] rbac_migrations Computing role roots…
2022-12-28 14:55:50,858 INFO [-] rbac_migrations Found 0 roots in 0.000184 seconds, rebuilding ancestry map
2022-12-28 14:55:50,858 INFO [-] rbac_migrations Rebuild ancestors completed in 0.000006 seconds
2022-12-28 14:55:50,858 INFO [-] rbac_migrations Done.
OK
Applying main.0035_v330_more_oauth2_help_text… OK
Applying main.0036_v330_credtype_remove_become_methods… OK
Applying main.0037_v330_remove_legacy_fact_cleanup… OK
Applying main.0038_v330_add_deleted_activitystream_actor… OK
Applying main.0039_v330_custom_venv_help_text… OK
Applying main.0040_v330_unifiedjob_controller_node… OK
Applying main.0041_v330_update_oauth_refreshtoken… OK
Applying main.0042_v330_org_member_role_deparent…2022-12-28 14:55:52,843 INFO [-] rbac_migrations Computing role roots…
2022-12-28 14:55:52,844 INFO [-] rbac_migrations Found 0 roots in 0.000129 seconds, rebuilding ancestry map
2022-12-28 14:55:52,844 INFO [-] rbac_migrations Rebuild ancestors completed in 0.000007 seconds
2022-12-28 14:55:52,845 INFO [-] rbac_migrations Done.
OK
Applying main.0043_v330_oauth2accesstoken_modified… OK
Applying main.0044_v330_add_inventory_update_inventory… OK
Applying main.0045_v330_instance_managed_by_policy… OK
Applying main.0046_v330_remove_client_credentials_grant… OK
Applying main.0047_v330_activitystream_instance… OK
Applying main.0048_v330_django_created_modified_by_model_name… OK
Applying main.0049_v330_validate_instance_capacity_adjustment… OK
Applying main.0050_v340_drop_celery_tables… OK
Applying main.0051_v340_job_slicing… OK
Applying main.0052_v340_remove_project_scm_delete_on_next_update… OK
Applying main.0053_v340_workflow_inventory… OK
Applying main.0054_v340_workflow_convergence… OK
Applying main.0055_v340_add_grafana_notification… OK
Applying main.0056_v350_custom_venv_history… OK
Applying main.0057_v350_remove_become_method_type… OK
Applying main.0058_v350_remove_limit_limit… OK
Applying main.0059_v350_remove_adhoc_limit… OK
Applying main.0060_v350_update_schedule_uniqueness_constraint… OK
Applying main.0061_v350_track_native_credentialtype_source… OK
Applying main.0062_v350_new_playbook_stats… OK
Applying main.0063_v350_org_host_limits… OK
Applying main.0064_v350_analytics_state… OK
Applying main.0065_v350_index_job_status… OK
Applying main.0066_v350_inventorysource_custom_virtualenv… OK
Applying main.0067_v350_credential_plugins… OK
Applying main.0068_v350_index_event_created… OK
Applying main.0069_v350_generate_unique_install_uuid… OK
Applying main.0070_v350_gce_instance_id… OK
Applying main.0071_v350_remove_system_tracking… OK
Applying main.0072_v350_deprecate_fields… OK
Applying main.0073_v360_create_instance_group_m2m… OK
Applying main.0074_v360_migrate_instance_group_relations… OK
Applying main.0075_v360_remove_old_instance_group_relations… OK
Applying main.0076_v360_add_new_instance_group_relations… OK
Applying main.0077_v360_add_default_orderings… OK
Applying main.0078_v360_clear_sessions_tokens_jt… OK
Applying main.0079_v360_rm_implicit_oauth2_apps… OK
Applying main.0080_v360_replace_job_origin… OK
Applying main.0081_v360_notify_on_start… OK
Applying main.0082_v360_webhook_http_method… OK
Applying main.0083_v360_job_branch_override… OK
Applying main.0084_v360_token_description… OK
Applying main.0085_v360_add_notificationtemplate_messages… OK
Applying main.0086_v360_workflow_approval… OK
Applying main.0087_v360_update_credential_injector_help_text… OK
Applying main.0088_v360_dashboard_optimizations… OK
Applying main.0089_v360_new_job_event_types… OK
Applying main.0090_v360_WFJT_prompts… OK
Applying main.0091_v360_approval_node_notifications… OK
Applying main.0092_v360_webhook_mixin… OK
Applying main.0093_v360_personal_access_tokens… OK
Applying main.0094_v360_webhook_mixin2… OK
Applying main.0095_v360_increase_instance_version_length… OK
Applying main.0096_v360_container_groups… OK
Applying main.0097_v360_workflowapproval_approved_or_denied_by… OK
Applying main.0098_v360_rename_cyberark_aim_credential_type… OK
Applying main.0099_v361_license_cleanup… OK
Applying main.0100_v370_projectupdate_job_tags… OK
Applying main.0101_v370_generate_new_uuids_for_iso_nodes… OK
Applying main.0102_v370_unifiedjob_canceled… OK
Applying main.0103_v370_remove_computed_fields… OK
Applying main.0104_v370_cleanup_old_scan_jts… OK
Applying main.0105_v370_remove_jobevent_parent_and_hosts… OK
Applying main.0106_v370_remove_inventory_groups_with_active_failures… OK
Applying main.0107_v370_workflow_convergence_api_toggle… OK
Applying main.0108_v370_unifiedjob_dependencies_processed… OK
Applying main.0109_v370_job_template_organization_field…2022-12-28 14:56:24,748 INFO [-] rbac_migrations Unified organization migration completed in 0.0250 seconds
2022-12-28 14:56:24,777 INFO [-] rbac_migrations Unified organization migration completed in 0.0289 seconds
2022-12-28 14:56:26,216 INFO [-] rbac_migrations Rebuild parentage completed in 0.003442 seconds
OK
Applying main.0110_v370_instance_ip_address… OK
Applying main.0111_v370_delete_channelgroup… OK
Applying main.0112_v370_workflow_node_identifier… OK
Applying main.0113_v370_event_bigint… OK
Applying main.0114_v370_remove_deprecated_manual_inventory_sources… OK
Applying main.0115_v370_schedule_set_null… OK
Applying main.0116_v400_remove_hipchat_notifications… OK
Applying main.0117_v400_remove_cloudforms_inventory… OK
Applying main.0118_add_remote_archive_scm_type… OK
Applying main.0119_inventory_plugins… OK
Applying main.0120_galaxy_credentials… OK
Applying main.0121_delete_toweranalyticsstate… OK
Applying main.0122_really_remove_cloudforms_inventory… OK
Applying main.0123_drop_hg_support… OK
Applying main.0124_execution_environments… OK
Applying main.0125_more_ee_modeling_changes… OK
Applying main.0126_executionenvironment_container_options… OK
Applying main.0127_reset_pod_spec_override… OK
Applying main.0128_organiaztion_read_roles_ee_admin… OK
Applying main.0129_unifiedjob_installed_collections… OK
Applying main.0130_ee_polymorphic_set_null… OK
Applying main.0131_undo_org_polymorphic_ee… OK
Applying main.0132_instancegroup_is_container_group… OK
Applying main.0133_centrify_vault_credtype… OK
Applying main.0134_unifiedjob_ansible_version… OK
Applying main.0135_schedule_sort_fallback_to_id… OK
Applying main.0136_scm_track_submodules… OK
Applying main.0137_custom_inventory_scripts_removal_data… OK
Applying main.0138_custom_inventory_scripts_removal… OK
Applying main.0139_isolated_removal… OK
Applying main.0140_rename… OK
Applying main.0141_remove_isolated_instances… OK
Applying main.0142_update_ee_image_field_description… OK
Applying main.0143_hostmetric… OK
Applying main.0144_event_partitions… OK
Applying main.0145_deregister_managed_ee_objs… OK
Applying main.0146_add_insights_inventory… OK
Applying main.0147_validate_ee_image_field… OK
Applying main.0148_unifiedjob_receptor_unit_id… OK
Applying main.0149_remove_inventory_insights_credential… OK
Applying main.0150_rename_inv_sources_inv_updates… OK
Applying main.0151_rename_managed_by_tower… OK
Applying main.0152_instance_node_type… OK
Applying main.0153_instance_last_seen… OK
Applying main.0154_set_default_uuid… OK
Applying main.0155_improved_health_check… OK
Applying main.0156_capture_mesh_topology… OK
Applying main.0157_inventory_labels… OK
Applying main.0158_make_instance_cpu_decimal… OK
Applying main.0159_deprecate_inventory_source_UoPU_field… OK
Applying main.0160_alter_schedule_rrule… OK
Applying main.0161_unifiedjob_host_status_counts… OK
Applying main.0162_alter_unifiedjob_dependent_jobs… OK
Applying main.0163_convert_job_tags_to_textfield… OK
Applying main.0164_remove_inventorysource_update_on_project_update… OK
Applying main.0165_task_manager_refactor… OK
Applying main.0166_alter_jobevent_host… OK
Applying main.0167_project_signature_validation_credential… OK
Applying main.0168_inventoryupdate_scm_revision… OK
Applying main.0169_jt_prompt_everything_on_launch… OK
Applying main.0170_node_and_link_state… OK
Applying main.0171_add_health_check_started… OK
Applying main.0172_prevent_instance_fallback… OK
Applying main.0173_instancegroup_max_limits… OK
Applying oauth2_provider.0002_auto_20190406_1805… OK
Applying oauth2_provider.0003_auto_20201211_1314… OK
Applying sites.0001_initial… OK
Applying sites.0002_alter_domain_unique… OK
Applying social_django.0001_initial… OK
Applying social_django.0002_add_related_name… OK
Applying social_django.0003_alter_email_max_length… OK
Applying social_django.0004_auto_20160423_0400… OK
Applying social_django.0005_auto_20160727_2333… OK
Applying social_django.0006_partial… OK
Applying social_django.0007_code_timestamp… OK
Applying social_django.0008_partial_timestamp… OK
Applying social_django.0009_auto_20191118_0520… OK
Applying social_django.0010_uid_db_index… OK
Applying sso.0001_initial… OK
Applying sso.0002_expand_provider_options… OK
Applying sso.0003_convert_saml_string_to_list… OK
Applying taggit.0004_alter_taggeditem_content_type_alter_taggeditem_tag… OK
Applying taggit.0005_auto_20220424_2025… OK
sh-5.1$
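The output above does not start at 0001 for every app: the first entries are conf.0008 and main.0014, while sessions and sites start at 0001_initial. A minimal Python sketch for pulling that out of a pasted run is shown here. The function name is hypothetical, and the reading that a first number above 1 means the earlier migrations were already recorded as applied (perhaps by the first, failing run) is an assumption, not something the output proves.

```python
import re

# Matches lines such as "Applying main.0014_v330_saved_launchtime_configs... OK"
APPLY_RE = re.compile(r"Applying (?P<app>\w+)\.(?P<num>\d{4})_\w+")


def first_applied_per_app(output: str) -> dict:
    """Return the first migration number seen for each app in migrate output."""
    first = {}
    for match in APPLY_RE.finditer(output):
        # setdefault keeps the earliest number seen for each app
        first.setdefault(match.group("app"), int(match.group("num")))
    return first


# A few lines copied from the run above.
sample_output = """\
Applying conf.0008_subscriptions... OK
Applying conf.0009_rename_proot_settings... OK
Applying sessions.0001_initial... OK
Applying main.0014_v330_saved_launchtime_configs... OK
Applying main.0015_v330_blank_start_args... OK
"""

for app, number in first_applied_per_app(sample_output).items():
    print(f"{app}: first applied migration in this run is {number:04d}")
```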

OK. Here is my problem:
2022-12-28 14:59:26,440 WARNING [-] awx.main.wsbroadcast Connection from it-network-engineering-awx-7f7bc49ccc-qpdqc to 10.58.20.28 failed: 'Cannot connect to host 10.58.20.28:8052 ssl:False [Connect call failed ('10.58.20.28', 8052)]'.

Now to try to guess why that’s failing when it used to work.

If I look at the YAML of all 3 Pods in my AWX install, I don’t see that IP listed at all. The Operator pod, the AWX pod, and the Postgres pod all have different IPs. I would naively expect this connection to go from the AWX pod back to the AWX pod at 10.58.9.21. And if it were trying to connect to the Operator, why would it not use the correct IP, 10.58.51.244? Why is it trying (and failing) to connect to an IP that none of my pods has? Could it be trying to connect to an old ghost IP from a previous installation in the same namespace? Could 10.58.20.28 be a holdover from something else?

That IP is on a different worker node. Could I have something misconfigured that causes it to look at the wrong worker?
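The comparison described above (pull the target out of the wsbroadcast warning, then check it against the pod IPs) is easy to do mechanically when there are many such lines. This is a minimal Python sketch. The two known IPs are the ones quoted in this post; the function names and labels are hypothetical, and in practice the pod IP table would need to be filled in from the current pod list.

```python
import re

# Pod IPs quoted in the post; the labels are only descriptive.
KNOWN_POD_IPS = {
    "10.58.9.21": "AWX pod",
    "10.58.51.244": "Operator pod",
}

TARGET_RE = re.compile(r"Cannot connect to host (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def broadcast_target(log_line: str):
    """Return (ip, port) from a wsbroadcast connection warning, or None."""
    match = TARGET_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def classify(ip: str) -> str:
    """Say which known pod owns the IP, if any."""
    return KNOWN_POD_IPS.get(ip, "not a current pod IP")


line = (
    "2022-12-28 14:59:26,440 WARNING [-] awx.main.wsbroadcast Connection from "
    "it-network-engineering-awx-7f7bc49ccc-qpdqc to 10.58.20.28 failed: "
    "'Cannot connect to host 10.58.20.28:8052 ssl:False "
    "[Connect call failed ('10.58.20.28', 8052)]'."
)

target = broadcast_target(line)
if target is not None:
    ip, port = target
    print(f"{ip}:{port} -> {classify(ip)}")
```

An address that matches no current pod points toward state left over from somewhere else, which is the "ghost IP" idea raised above, though the sketch itself cannot say where that state lives.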

OK. Getting closer, maybe. I built a new reference project, and it worked. I’m able to log in, so the code seems to be solid. The problem seems to be with this namespace/project.

A couple days ago, I tried running the backup project, kind: AWXBackup. It worked. I ran it again with a new name. I tried a restore, and that worked. I added a project and credential to the AWX instance and ran the restore again, and again it said nothing had changed. This was a mystery to me, since I had changed something. It should have changed it back. That is still a mystery to me. So I decided to delete the project and restore from backup … like the ignoramus I am.

The project stayed hung in Terminating. Some nosing around taught me the problem was dependencies. I’d created 2 backups, and the project would not delete until those backups were gone. I tried to oc delete the projects, but that would just hang. So, that left me in a quandary. I did the process where you create a backup of the existing configuration, remove the dependencies, then overwrite the existing project with the version that has no dependencies. That allowed the project to finish deleting.

Odds are I still have the two backups sitting out there somewhere, munging my new install. I will begin looking.

OK. Carefully deleting the PVC did cure the random IP problem. It did not, however, cure the login problem. I’ve copied the awx-admin-secret and used it with the ID “admin”, I’ve changed the secret and used that, and I’ve changed it back. After each change, I get this:
10.58.22.1 - - [28/Dec/2022:17:00:35 +0000] "GET /api/login/ HTTP/1.1" 200 6240 "https://it-network-engineering-awx.apps.os-dev-nadc.mycompany.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54" "10.96.107.239"
[pid: 37|app: 0|req: 6/15] 10.58.22.1 () {80 vars in 2284 bytes} [Wed Dec 28 17:00:35 2022] GET /api/login/ => generated 5714 bytes in 39 msecs (HTTP/1.1 200) 10 headers in 492 bytes (1 switches on core 0)
2022-12-28 17:00:35,940 WARNING [0001a06cbf444c49804afa67b2ee5fbe] awx.api.generics Login failed for user admin from 10.58.22.1
2022-12-28 17:00:35,946 WARNING [0001a06cbf444c49804afa67b2ee5fbe] django.request Unauthorized: /api/login/
2022-12-28 17:00:35,946 WARNING [0001a06cbf444c49804afa67b2ee5fbe] django.request Unauthorized: /api/login/
[pid: 35|app: 0|req: 6/16] 10.58.22.1 () {86 vars in 2474 bytes} [Wed Dec 28 17:00:35 2022] POST /api/login/ => generated 5920 bytes in 221 msecs (HTTP/1.1 401) 10 headers in 502 bytes (1 switches on core 0)
10.58.22.1 - - [28/Dec/2022:17:00:35 +0000] "POST /api/login/ HTTP/1.1" 401 6445 "https://it-network-engineering-awx.apps.os-dev-nadc.mycompany.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54" "10.96.107.239"
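The log above shows the POST reaching AWX and being rejected for user admin, which suggests the failure is in the credential check rather than in routing. When repeating the test after each Secret change, it can help to tally these lines instead of reading them by eye. Here is a minimal Python sketch, assuming the web container log has been saved as text; the function name is hypothetical, and the pattern is based only on the message format shown in this post.

```python
import re
from collections import Counter

# Pattern taken from the "Login failed" line in the log above.
LOGIN_FAILED_RE = re.compile(
    r"awx\.api\.generics Login failed for user (?P<user>\S+) from (?P<ip>\S+)"
)


def failed_logins(lines) -> Counter:
    """Count failed login attempts per (user, source IP) pair."""
    counts = Counter()
    for line in lines:
        match = LOGIN_FAILED_RE.search(line)
        if match:
            counts[(match.group("user"), match.group("ip"))] += 1
    return counts


sample = [
    "2022-12-28 17:00:35,940 WARNING [0001a06cbf444c49804afa67b2ee5fbe] "
    "awx.api.generics Login failed for user admin from 10.58.22.1",
    "2022-12-28 17:00:35,946 WARNING [0001a06cbf444c49804afa67b2ee5fbe] "
    "django.request Unauthorized: /api/login/",
]

for (user, ip), count in failed_logins(sample).items():
    print(f"{user} from {ip}: {count} failed attempt(s)")
```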

Again, the fresh install I did works just fine. The admin password works. So, it seems this is some sort of artifact of my failed and deleted project of the same name from 2 days ago. What might I have created then that is persisting across 3 deletions?

I created a brand-new name and namespace, and everything worked perfectly, except that the login is still denied with the same message as above. So it’s not the old namespace. I’m lost again. I’ll do some more searching, then probably post at the top level with this specific question.

OK. So, I created a superuser and set its password, and now I’m fully able to log in as that user. The system is up and running. What might I have done to cause the script not to create the user “admin”?