March 4, 2023 06:58:19


Parse website sitemap.xml using Ansible XML

Question


There is a website with a /sitemap.xml file like the following:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/es/</loc>
    <lastmod>2023-02-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example/en/</loc>
    <lastmod>2023-02-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/en/destinations/</loc>
    <lastmod>2021-09-16</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
[..]
</urlset>
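The sitemap declares a default namespace (xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"), so every element in it, including <url> and <loc>, belongs to that namespace. That is why an XPath query against it needs a prefix mapping such as the s: used in the namespaces option of the Ansible tasks. A minimal sketch, using Python's standard-library xml.etree.ElementTree rather than Ansible (it supports only a subset of XPath), shows the effect on a trimmed copy of the sitemap:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sitemap above: only the first two <url> entries.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/es/</loc>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example/en/</loc>
    <priority>0.5</priority>
  </url>
</urlset>"""

# Same prefix-to-URI mapping as the "namespaces: s: ..." option in the playbook.
NS = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# The default namespace is folded into every tag name (Clark notation).
print(root.tag)  # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset

# An unprefixed name only matches elements in *no* namespace, so nothing is found.
print(len(root.findall("url")))  # 0

# With the prefix mapped to the sitemap namespace, the <url> nodes are found.
print(len(root.findall("s:url", NS)))  # 2
print([e.text for e in root.findall("s:url/s:loc", NS)])  # ['https://example.com/es/', 'https://example/en/']
```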

Using Ansible (latest version), I am trying to:

1. Download the file using ansible.builtin.uri and register the result into a variable.
2. Either loop over the url nodes inside urlset and create a list with the loc and the priority, or
3. Convert it to JSON and do the same.

I am stuck at points 2-3. This is my current code:

- name: Get the website 'sitemap.xml' file
  ansible.builtin.uri:
    url: "https://example.com/sitemap.xml"
    method: GET
    return_content: true
    headers:
      Accept: "application/xml"
    status_code: 200
    timeout: 5
  register: sitemap
  delegate_to: localhost

- name: Parse the retrieved XML file
  community.general.xml:
    xmlstring: "{{ sitemap.content }}"
    xpath: /s:urlset
    content: text
    namespaces:
      s: http://www.sitemaps.org/schemas/sitemap/0.9
  register: parsedxml
  delegate_to: localhost

Now parsedxml.xmlstring contains the XML of the sitemap.xml file, which I already had in the sitemap.content variable. So, basically, I haven't been able to either:

- use community.general.xml to build a list of dicts (with loc and priority) by looping over the list of url nodes, or
- convert the XML file to JSON using the ansible.netcommon.parse_xml filter: I have not been able to write the spec file it expects as a parameter, and the documentation for that filter seems to be missing.

Any hints on how to loop through all the url nodes and build such a list of dictionaries?
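For comparison, this is roughly what "loop through all the url nodes and build a list of dictionaries" looks like outside Ansible. It is a minimal plain-Python sketch using only the standard library; the helper name sitemap_entries and its default_priority parameter are hypothetical, and the sample input is a modified excerpt of the sitemap above in which the second entry has no <priority>:

```python
import json
import xml.etree.ElementTree as ET

NS = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Modified excerpt of the sitemap: the second <url> deliberately omits <priority>.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/es/</loc>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/en/destinations/</loc>
  </url>
</urlset>"""


def sitemap_entries(xml_bytes, default_priority=0.5):
    """Hypothetical helper: one {"loc": ..., "priority": ...} dict per <url> node."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for url in root.findall("s:url", NS):
        # Children are looked up relative to this <url>, so loc and priority
        # always come from the same entry.
        loc = url.findtext("s:loc", namespaces=NS)
        text = url.findtext("s:priority", namespaces=NS)
        # <priority> is optional in the sitemap protocol (documented default: 0.5).
        priority = float(text) if text else default_priority
        entries.append({"loc": loc, "priority": priority})
    return entries


entries = sitemap_entries(SITEMAP)
print(json.dumps(entries, indent=2))
```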

Answer 1

Score: 1


So, yeah, Ansible isn't great at dealing with XML, and that parse_xml filter you found is really designed for use by Ansible network module authors, which explains why it is so terrible to use.

This is my approach:

  tasks:
    - name: Query each child key of every url node (one XPath per key)
      community.general.xml:
        xmlstring: '{{ sitemap.content }}'
        xpath: /s:urlset/s:url/s:{{ item }}
        content: text
        namespaces:
          s: "http://www.sitemaps.org/schemas/sitemap/0.9"
      loop: [ loc, priority ]
      register: parsedxml
      vars:
        # Inline stand-in for the registered ansible.builtin.uri result;
        # drop this in the real playbook and use its "sitemap" variable.
        sitemap:
          content: |
            <?xml version="1.0" encoding="UTF-8"?>
            <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
              <url>
                <loc>https://example.com/es/</loc>
                <lastmod>2023-02-15</lastmod>
                <changefreq>monthly</changefreq>
                <priority>0.5</priority>
              </url>
              <url>
                <loc>https://example/en/</loc>
                <lastmod>2023-02-15</lastmod>
                <changefreq>monthly</changefreq>
                <priority>0.5</priority>
              </url>
            </urlset>

    - name: Zip the loc and priority matches back into a list of dicts
      ansible.builtin.set_fact:
        things: '{{ tmp | from_yaml }}'
      vars:
        # Each parsedxml.results[N].matches item is a one-key dict keyed by the
        # namespaced tag, e.g.
        #   {"{http://www.sitemaps.org/schemas/sitemap/0.9}loc": "https://example.com/es/"}
        # so the value is taken with values()|first rather than by key name.
        tmp: |
          {% set zipped = parsedxml.results[0].matches
                    | zip(parsedxml.results[1].matches) %}
          {% for tup in zipped %}
          - url: {{ tup[0].values() | first }}
            pri: {{ tup[1].values() | first }}
          {% endfor %}

which produces:

{
    "ansible_facts": {
        "things": [
            {
                "pri": 0.5,
                "url": "https://example.com/es/"
            },
            {
                "pri": 0.5,
                "url": "https://example/en/"
            }
        ]
    }
}

As best I can tell, the community.general.xml module really is designed for more "surgical" changes than for generic XPath queries into a document, and it is definitely bad at "give me multiple keys". So we just cheat: run the XPath query once for each child key we want, and then | zip the two result lists back together.
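The same trick can be mirrored in plain Python to see what the two match lists and the zip step contain. This is a minimal standard-library sketch (the helper matches_for is hypothetical), assuming, as the values()|first in the playbook suggests, that each match returned by the xml module is a one-key dict keyed by the element's namespaced tag:

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/es/</loc><priority>0.5</priority></url>
  <url><loc>https://example/en/</loc><priority>0.5</priority></url>
</urlset>"""


def matches_for(root, key):
    """Hypothetical helper: one query per child key, like xpath /s:urlset/s:url/s:{{ item }}."""
    # Each match is a one-key dict {tag: text}; the tag is namespaced,
    # e.g. "{http://www.sitemaps.org/schemas/sitemap/0.9}loc".
    return [{el.tag: el.text} for el in root.findall(f"s:url/s:{key}", NS)]


root = ET.fromstring(SITEMAP)
loc_matches = matches_for(root, "loc")
pri_matches = matches_for(root, "priority")

# zip pairs the two lists by position; next(iter(d.values())) mirrors values()|first.
things = [
    {"url": next(iter(loc.values())), "pri": float(next(iter(pri.values())))}
    for loc, pri in zip(loc_matches, pri_matches)
]
print(things)  # [{'url': 'https://example.com/es/', 'pri': 0.5}, {'url': 'https://example/en/', 'pri': 0.5}]
```

Because zip pairs items purely by position (and stops at the shorter list), this only works when every <url> has both a <loc> and a <priority>. The sitemap protocol makes <priority> optional, so on a sitemap where some entries omit it the pairs would silently shift; querying each <url> node and reading its children relative to that node avoids the problem.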
