Parse website sitemap.xml using Ansible XML
Question
There is this website with a /sitemap.xml file such as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/es/</loc>
    <lastmod>2023-02-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example/en/</loc>
    <lastmod>2023-02-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://example.com/en/destinations/</loc>
    <lastmod>2021-09-16</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  [..]
</urlset>
Using Ansible (latest version) I am trying to:
1. Download the file using ansible.builtin.uri and register the result into a variable.
2. Either loop the url nodes inside urlset and create a list with the loc and the priority, or
3. Convert it to JSON and do the same.
I am stuck at points 2-3. This is my current code:
- name: Get the website 'sitemap.xml' file
  ansible.builtin.uri:
    url: "https://example.com/sitemap.xml"
    method: GET
    return_content: true
    headers:
      Accept: "application/xml"
    status_code: 200
    timeout: 5
  register: sitemap
  delegate_to: localhost
- name: Parse the retrieved XML file
  community.general.xml:
    xmlstring: "{{ sitemap.content }}"
    xpath: /s:urlset
    content: text
    namespaces:
      s: http://www.sitemaps.org/schemas/sitemap/0.9
  register: parsedxml
  delegate_to: localhost
Now parsedxml.xmlstring contains the XML of the sitemap.xml file, which is something I already had in the sitemap.content variable. So, basically, I haven't been able to either:
- Use community.general.xml to somehow build a list of dicts (with loc and priority) by looping over the list of url nodes, or
- Convert the XML file to JSON using the ansible.netcommon.parse_xml filter; I have not been able to produce a specifications file to pass as a parameter to the filter, and the documentation for this filter seems to be missing.
Any hints on how to loop through all the url nodes and build such a list of dictionaries?
Answer 1
Score: 1
So, yeah, Ansible isn't great at dealing with XML, and that parse_xml filter you found is really designed for use by Ansible net module authors, which explains why it is so terrible to use.
This is my approach:
tasks:
  - vars:
      sitemap:
        content: |
          <?xml version="1.0" encoding="UTF-8"?>
          <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
            <url>
              <loc>https://example.com/es/</loc>
              <lastmod>2023-02-15</lastmod>
              <changefreq>monthly</changefreq>
              <priority>0.5</priority>
            </url>
            <url>
              <loc>https://example/en/</loc>
              <lastmod>2023-02-15</lastmod>
              <changefreq>monthly</changefreq>
              <priority>0.5</priority>
            </url>
          </urlset>
    with_items: [ loc, priority ]
    register: parsedxml
    community.general.xml:
      xmlstring: '{{ sitemap.content }}'
      xpath: /s:urlset/s:url/s:{{ item }}
      content: text
      namespaces:
        s: "http://www.sitemaps.org/schemas/sitemap/0.9"

  - set_fact:
      things: '{{ tmp | from_yaml }}'
    vars:
      zipped_shape: |
        [
          [ {"loc": "http"}, {"pri": "1.0"} ],
          [ {"loc": "http"}, {"pri": "1.0"} ],
        ]
      tmp: |
        {% set zipped = parsedxml.results[0].matches
                      | zip(parsedxml.results[1].matches) %}
        {% for tup in zipped %}
        - url: {{ tup[0].values()|first }}
          pri: {{ tup[1].values()|first }}
        {% endfor %}
which produces:
{
    "ansible_facts": {
        "things": [
            {
                "pri": 0.5,
                "url": "https://example.com/es/"
            },
            {
                "pri": 0.5,
                "url": "https://example/en/"
            }
        ]
    }
}
As best I can tell, that .xml: module really is designed for more "surgical" changes than a generic XPath query into a document, and it is definitely bad for "give me multiple keys". So we just cheat: compose the XPath more than once, once for each child key we want, and then | zip the two result lists back together.
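For reference, the same pairing can also be done without the inline Jinja template and the from_yaml round trip, by looping over the zipped match lists directly. This is a minimal sketch of mine, not part of the answer above; it assumes parsedxml is registered exactly as in the first task, and the fact name things plus the url/pri keys are just illustrative:
- name: Zip the loc and priority matches into a list of dicts
  ansible.builtin.set_fact:
    # parsedxml.results[0] holds the 'loc' matches and results[1] the
    # 'priority' matches; each match is a one-key dict, so .values() | first
    # pulls out the element text, mirroring the template above.
    things: "{{ things | default([]) + [{'url': item.0.values() | first, 'pri': item.1.values() | first}] }}"
  loop: "{{ parsedxml.results[0].matches | zip(parsedxml.results[1].matches) | list }}"
The append-to-self set_fact pattern builds the list one pair per iteration and ends up with the same shape as the things list shown in the output above.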