Extracting a substring with regex_replace fails

ansible 2.6.0 (devel da5cf72236) last updated 2018/02/14 14:29:49 (GMT +200)

The task seems obvious at first glance but it appears to be difficult to implement in this context.
The string is:
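host_meta: "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n <Link rel='restconf' href='/restconf'/>\n</XRD>"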

I need to extract the values of rel and href into api_release and api_root.
I unsuccessfully tried the following:

  - set_fact: api_release="{{ host_meta | regex_replace('^.* rel=(.*) .*$', '\\1') }}"
    when: host_meta is defined

  - set_fact: api_root="{{ host_meta | regex_replace('^.* href=(.*)/>.*$', '\\1') }}"
    when: host_meta is defined

Both variables contain the whole string instead of the corresponding substring, which should be:
api_release: 'restconf'
api_root: /restconf

I have already successfully used this filter in other contexts.
What am I missing here? Is the filter confused by the string?

I have the same difficulty with:

ansible 2.4.3.0 (detached HEAD 8a7f9beab7) last updated 2018/02/14 16:11:16 (GMT +200)

Hi,

The task seems obvious at first glance but it appears to be difficult
to implement in this context.
The string is:
host_meta: "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n <Link rel='restconf' href='/restconf'/>\n</XRD>"

I need to extract the values of rel and href into api_release and
api_root. I unsuccessfully tried the following:
- set_fact: api_release="{{ host_meta | regex_replace('^.* rel=(.*) .*$', '\\1') }}"
  when: host_meta is defined

- set_fact: api_root="{{ host_meta | regex_replace('^.* href=(.*)/>.*$', '\\1') }}"
  when: host_meta is defined

Both variables contain the whole string instead of the corresponding
substring, which should be:
api_release: 'restconf'
api_root: /restconf

I have already successfully used this filter in other contexts.
What am I missing here? Is the filter confused by the string?

You are missing the newlines the string contains. regex_replace uses
re.sub() in Python and does not offer a way to set the MULTILINE flag.
If you remove '^' and '$' from your regexps, you can see what happens:
only the matching part between two '\n's is replaced.
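
A minimal Python sketch of that behaviour, assuming regex_replace hands the pattern and replacement to re.sub() unchanged (as described above). The host_meta value is the string from the question with the wrapped lines joined back together.

```python
import re

# The string from the question, on three lines separated by '\n'.
host_meta = (
    "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n"
    " <Link rel='restconf' href='/restconf'/>\n"
    "</XRD>"
)

# Anchored pattern, no flags: '^' only matches at the very start of the
# string and '.' stops at '\n', so nothing matches and re.sub() returns
# the input unchanged.
anchored = re.sub(r"^.* rel=(.*) .*$", r"\1", host_meta)

# Same pattern without the anchors: the match is confined to the middle
# line, so only that line is replaced by the captured group.
unanchored = re.sub(r".* rel=(.*) .*", r"\1", host_meta)

print(anchored == host_meta)
print(repr(unanchored))
```

The first and last lines of the string survive in the unanchored result, which is why the variable never ends up holding just the attribute value.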

Instead of regex_replace() you should use the regex_search() filter;
that works as expected:

  - set_fact: api_release="{{ host_meta | regex_search('rel=(.*)') }}"
    when: host_meta is defined
  - set_fact: api_root="{{ host_meta | regex_search('href=(.*)/>') }}"
    when: host_meta is defined

Yields:

  "api_release": "rel='restconf' href='/restconf'/>"
  "api_root": "href='/restconf'/>"

(That's not exactly what you expected, but it is what your
original regexes would have returned if the multiline flag had
been set. The problem is that regexes are notoriously bad at matching
XML and/or HTML.)

Cheers,
Felix
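
For anyone who wants to reproduce the regex_search results outside Ansible, here is a rough Python equivalent. It assumes regex_search behaves like re.search() and returns the whole match when no group is requested. The two tighter patterns at the end are not from the thread; they are a hypothetical variation that captures only the text between the single quotes.

```python
import re

host_meta = (
    "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n"
    " <Link rel='restconf' href='/restconf'/>\n"
    "</XRD>"
)

# Whole match, as in the output quoted above: '.' stops at the newline,
# so the match runs to the end of the <Link .../> line.
rel_match = re.search(r"rel=(.*)", host_meta).group(0)
href_match = re.search(r"href=(.*)/>", host_meta).group(0)

# Hypothetical tighter patterns: capture only what sits between the quotes.
api_release = re.search(r"rel='([^']*)'", host_meta).group(1)
api_root = re.search(r"href='([^']*)'", host_meta).group(1)

print(rel_match)
print(href_match)
print(api_release, api_root)
```

The tighter patterns still depend on the attributes being single-quoted, so they are no more robust against format changes than the originals.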

Hi,

instead of using regexes, you might want to use the XML module
(https://docs.ansible.com/ansible/2.4/xml_module.html) with the
xmlstring argument. That should allow you to do this in a much cleaner
way.

Cheers,
Felix

It's possible to turn on multiline, but the dot doesn't match a newline, so the DOTALL flag needs to be set too.

This can be done with (?ms): "m" is MULTILINE and "s" is DOTALL.

So this should work:

- set_fact:
    api_release: "{{ host_meta | regex_replace('(?ms)^.* rel=(.*) .*$', '\\1') }}"
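
The same inline flags can be checked directly in Python. This is only a sketch, assuming the filter passes the pattern to re.sub() as-is. Note that the captured values keep their surrounding single quotes.

```python
import re

host_meta = (
    "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n"
    " <Link rel='restconf' href='/restconf'/>\n"
    "</XRD>"
)

# (?ms) switches on MULTILINE and DOTALL inside the pattern itself, so
# '.*' can run across the newlines and the whole string is matched.
api_release = re.sub(r"(?ms)^.* rel=(.*) .*$", r"\1", host_meta)
api_root = re.sub(r"(?ms)^.* href=(.*)/>.*$", r"\1", host_meta)

# Equivalent spelling with explicit flags instead of the inline group.
same_release = re.sub(
    r"^.* rel=(.*) .*$", r"\1", host_meta, flags=re.MULTILINE | re.DOTALL
)

print(api_release)
print(api_root)
print(api_release.strip("'"))  # drop the quotes if they are not wanted
```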

@Kai Stian Olstad
Thanks for your answer: yes, it works.

@Felix Fontein
I have no XML skills. I nevertheless took a look at the ansible xml module, and it’s obscure to me.
I think you’re right though: it’d be better to read XML appropriately with that module rather than use some regex filter which can be easily defeated if the input changes in the future.
So I tried something, but of course it does not work (api_root_filename is the file containing the xml multi-line string):

  - name: Reading RESTconf release
    xml:
      attribute: rel
      content: attribute
      path: "{{ api_root_filename }}"
      xpath: /XRD/Link
    register: return_restconf_release

  - name: Showing attribute value
    debug:
      var: return_restconf_release.matches[0].Link.rel

Hi,

I have no XML skills. I nevertheless took a look at the ansible xml
module, and it's obscure to me.
I think you're right though: it'd be better to read XML appropriately
with that module rather than use some regex filter which can be
easily defeated if the input changes in the future.
So I tried something, but of course it does not work
(api_root_filename is the file containing the xml multi-line string):

- name: Reading RESTconf release
  xml:
        attribute: rel
        content: attribute
        path: "{{ api_root_filename }}"
        xpath: /XRD/Link
  register: return_restconf_release

- name: Showing attribute value
  debug:
        var: return_restconf_release.matches[0].Link.rel

The problem here is that your XML uses a namespace. I'm not very
familiar with xpaths either, but after a little googling I came up with
this:

  - name: Reading RESTconf release
    xml:
          attribute: rel
          content: attribute
          path: "{{ api_root_filename }}"
          xpath: "/*[name()='XRD']/*[name()='Link']"
    register: return_restconf_release
  - name: Showing attribute value
    debug:
      var: return_restconf_release.matches[0]['{http://docs.oasis-open.org/ns/xri/xrd-1.0}Link']

For me, this gives:

ok: [localhost] => {
    "failed": false,
    "return_restconf_release.matches[0]['{http://docs.oasis-open.org/ns/xri/xrd-1.0}Link']": {
        "href": "/restconf",
        "rel": "restconf"
    }
}

Cheers,
Felix
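
The namespace effect can be seen in isolation with a few lines of Python. This sketch uses the standard library's xml.etree.ElementTree rather than the xml module itself, so it only illustrates the idea: a default xmlns becomes part of every tag name, which is why a plain /XRD/Link lookup finds nothing.

```python
import xml.etree.ElementTree as ET

host_meta = (
    "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n"
    " <Link rel='restconf' href='/restconf'/>\n"
    "</XRD>"
)

root = ET.fromstring(host_meta)

# The default namespace is folded into the tag name.
print(root.tag)

# A bare tag name therefore matches nothing ...
plain = root.find("Link")

# ... while a namespace-qualified lookup finds the element.
ns = {"xrd": "http://docs.oasis-open.org/ns/xri/xrd-1.0"}
link = root.find("xrd:Link", ns)
attrs = dict(link.attrib)

print(plain)
print(attrs)
```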

You made it, bravo!
However, this solution is not better than the regex search, because the whole point of using the xml module was to make it resilient to change. As soon as the URI {http://docs.oasis-open.org/ns/xri/xrd-1.0} changes, the call to access both variables will fail.
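
One way around the hard-coded URI is to compare only the local part of each tag name. The sketch below is plain Python with xml.etree.ElementTree, not the Ansible xml module; local_name and link_attributes are hypothetical helper names made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rpartition("}")[2]


def link_attributes(xml_text):
    """Return (rel, href) of the first Link element, whatever its namespace."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "Link":
            return element.get("rel"), element.get("href")
    return None, None


host_meta = (
    "<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'>\n"
    " <Link rel='restconf' href='/restconf'/>\n"
    "</XRD>"
)

api_release, api_root = link_attributes(host_meta)
print(api_release, api_root)
```

Because the namespace is never spelled out, the lookup keeps working if the xmlns URI changes. It would still break if the element or attribute names themselves were renamed.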