backslashes in regex_replace filter

rjwagner.dba · January 8, 2024, 9:15pm

Hi - Does anyone (who understands how backslashes work in Ansible/YAML) know why both of the following tasks work:

(ansible2_15_8) rowagn@localhost:~#> cat d.yml

- hosts: all
  gather_facts: no
  vars:
    s: 'This is a string containing 1 and 2.'
    t:
      - p1_xyz
      - p2_xyz
      - p4_xyz

  tasks:

    - name: single backslash
      debug:
        msg: '{{ item }} is in s'
      loop: '{{ t }}'
      when: ( item | regex_replace('^p(\d+).*$', '\\1') ) in s

    - name: double backslash
      debug:
        msg: '{{ item }} is in s'
      loop: '{{ t }}'
      when: ( item | regex_replace('^p(\\d+).*$', '\\1') ) in s

(ansible2_15_8) rowagn@localhost:~#> ansible-playbook -i l d.yml

PLAY [all] ******************************************************************************************************************************************************

TASK [single backslash] *****************************************************************************************************************************************
ok: [localhost] => (item=p1_xyz) => {
    "msg": "p1_xyz is in s"
}
ok: [localhost] => (item=p2_xyz) => {
    "msg": "p2_xyz is in s"
}
skipping: [localhost] => (item=p4_xyz)

TASK [double backslash] *****************************************************************************************************************************************
ok: [localhost] => (item=p1_xyz) => {
    "msg": "p1_xyz is in s"
}
ok: [localhost] => (item=p2_xyz) => {
    "msg": "p2_xyz is in s"
}
skipping: [localhost] => (item=p4_xyz)

PLAY RECAP ******************************************************************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

The tasks extract the number from each string in list t and then look for that number in string s. What is strange is that the second example at https://docs.ansible.com/ansible/latest/collections/ansible/builtin/regex_replace_filter.html#examples indicates the backslashes in both parameters need to be doubled, but the testing above shows that double backslashes are not required in the first parameter (they are required in the second parameter).
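
For readers who want to see the intended logic without Ansible in the way, here is a minimal Python sketch. It assumes the filter behaves like Python's re.sub for this pattern; the helper name extract_number is made up for illustration, and the values of s and t are copied from the playbook.

```python
import re

# Values copied from the playbook's vars section.
s = "This is a string containing 1 and 2."
t = ["p1_xyz", "p2_xyz", "p4_xyz"]


def extract_number(item: str) -> str:
    # Hypothetical stand-in for the regex_replace call in the when clause.
    # Raw strings pass the backslashes to the regex engine unchanged.
    return re.sub(r"^p(\d+).*$", r"\1", item)


# Items whose extracted number appears somewhere in s.
matches = [item for item in t if extract_number(item) in s]
print(matches)  # ['p1_xyz', 'p2_xyz']
```

p4_xyz is left out because "4" does not occur in s, which matches the skipped item in the playbook output.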

Thanks
Rob

Walter_Rowe · January 8, 2024, 9:27pm

regex_replace('^p(\d+).*$', '\\1')

‘\1’ in the second argument is a “backref” (backwards reference) to the (\d+) in the first argument. It seems it is looking for an expression with digits and extracting the digits.

Your list ‘t’ has the names p1_xyz, p2_xyz, p4_xyz, so this regex would extract the digits 1, 2, 4 from those strings.

Your string ‘s’ has digits 1 and 2. You are getting two lines of output as expected.

Walter

sivel · January 8, 2024, 9:51pm

This is a result of some normalization code in jinja2 that attempts to unescape strings:

https://github.com/pallets/jinja/blob/d594969d722ceb4e8f3da8861befc9c0ac87ae1b/src/jinja2/lexer.py#L647-L653

That code results in those becoming '^p(\d+).*$' and '\1'.
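
The effect of that unescape step on each literal can be checked in isolation. The sketch below applies the same encode/decode round trip to the text between the quotes; the function name jinja_style_unescape is invented here, and the raw strings stand for the characters as typed in the playbook.

```python
import warnings


def jinja_style_unescape(literal_body: str) -> str:
    """Apply the encode/decode round trip to the text between the quotes."""
    with warnings.catch_warnings():
        # Unknown escapes such as \d may trigger a warning; silence it here.
        warnings.simplefilter("ignore")
        return literal_body.encode("ascii", "backslashreplace").decode("unicode-escape")


# First parameter: a single or a doubled backslash ends up as the same pattern.
assert jinja_style_unescape(r"^p(\d+).*$") == r"^p(\d+).*$"   # \d is left alone
assert jinja_style_unescape(r"^p(\\d+).*$") == r"^p(\d+).*$"  # \\d collapses to \d

# Second parameter: only the doubled form survives as a backreference.
assert jinja_style_unescape(r"\\1") == r"\1"   # \\1 collapses to the backref \1
assert jinja_style_unescape(r"\1") == "\x01"   # \1 is read as octal 1
```

The last assertion is the case that breaks: a single-backslashed \1 turns into the control character with code point 1, so the replacement is no longer a backreference.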

Those 2 when statements, when processed by pyyaml, become (shown as Python reprs, so every literal backslash appears doubled):

["( item | regex_replace('^p(\\d+).*$', '\\\\1') ) in s",
 "( item | regex_replace('^p(\\\\d+).*$', '\\\\1') ) in s"]

Then if we apply the .encode/.decode:

"( item | regex_replace('^p(\\d+).*$', '\\\\1') ) in s".encode("ascii", "backslashreplace").decode("unicode-escape")
"( item | regex_replace('^p(\\d+).*$', '\\1') ) in s"

"( item | regex_replace('^p(\\\\d+).*$', '\\\\1') ) in s".encode("ascii", "backslashreplace").decode("unicode-escape")
"( item | regex_replace('^p(\\d+).*$', '\\1') ) in s"

rjwagner.dba · January 8, 2024, 11:57pm

Thanks Matt, but I still don’t get why the first parameter (\d) MAY be double-backslashed but the second parameter (\1) MUST be double-backslashed. However, I’m starting to think it’s at the Python level. https://stackoverflow.com/a/33582215 says Python’s string parser causes both \d and \\d to become \d. But why? A little more searching takes me to https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences, where I think I see why \\1 becomes \1 and \1 becomes a non-printable character (octal 1). But then, by analogy, \\d should become \d (it does), but why doesn’t \d become an error (since it’s not listed as a valid escape sequence)?

Maybe I’ll take this over to the Python list.

Walter_Rowe · January 9, 2024, 12:53pm

The \1 must be double-backslashed because the backref needs to be backslash-digit (\1). Doubling the backslash escapes the backslash.

Walter

rjwagner.dba · January 9, 2024, 2:04pm

Right, but why doesn’t the \d need to be double-backslashed? Backslash-d is regex for matching on a digit. I just don’t get why doubling the backslash is needed on the 1 but not on the d.

Walter_Rowe · January 9, 2024, 2:19pm

Perhaps because you have single quotes inside double quotes so everything inside the single quotes is automatically escaped?

Walter

rjwagner.dba · January 9, 2024, 2:36pm

But the \1 is also inside single and double quotes, so if that were the reason, I wouldn’t have to double-backslash the 1.

sivel · January 11, 2024, 11:18pm

Part of the problem is also knowing what characters are escape sequences in python.

\1 is an escape sequence, equivalent to \x01, and not equivalent to the literal \1. As such, a literal \1 needs to be represented in Python as \\1. \d is not an escape sequence and thus can be written as a literal \d without escaping the backslash.
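
These rules are easy to confirm in plain Python, independent of Ansible or Jinja. The following is a small sketch; the "\d" literal is evaluated from source text only so that any warning the interpreter emits can be captured rather than printed.

```python
import warnings

# "\1" in ordinary Python source is an octal escape: one character, code point 1.
assert "\1" == "\x01" and len("\1") == 1

# Doubling the backslash (or using a raw string) gives a backslash followed by "1".
assert "\\1" == r"\1" and len("\\1") == 2

# "\d" is not a recognised escape sequence, so the backslash is kept.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(r'"\d"')

assert value == "\\d" and len(value) == 2

# Show which warning categories, if any, this interpreter raised for "\d".
print([w.category.__name__ for w in caught])
```

The string is accepted either way; the printed list shows whichever warning category the running interpreter uses for the unknown escape.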

There is also a difference with quoting in YAML as mentioned above, between single quotes and double quotes. But note that the behavior of YAML with quotes only applies to quotes that surround the entire YAML value. So the single quotes you have in the middle of your string do not affect the YAML quoting differences. When not using quotes surrounding the full value in YAML, you are using “Plain Style” which has different rules than both single and double quoted values.

YAML single quotes are basically equivalent to Python raw strings, where a backslash is always treated as literal. Double quotes require escaping backslashes. You can read more about the flow scalar styles of YAML at https://yaml.org/spec/1.2.2/#73-flow-scalar-styles

rjwagner.dba · January 17, 2024, 6:43pm

Thanks everyone. I’m going to chalk this up to a Python anomaly. IMO, since \d is not a valid escape sequence, Python should raise an error rather than transparently converting it into \d.
