How to make a nested dict (JSON) from a database schema XML?
Question
Here is my input file.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<project name="so_project" id="Project-9999">
  <schema name="database1">
    <table name="table1">
      <column name="foo" type="int"/>
      <column name="bar" type="string"/>
      <column name="details_resolution" type="array[object]">
        <column name="timestamp" type="timestamp"/>
        <column name="user_id" type="string"/>
        <column name="user_name" type="string"/>
      </column>
      <column name="details_closure" type="array[object]">
        <column name="timestamp" type="timestamp"/>
        <column name="auto_closure" type="bool"/>
      </column>
    </table>
  </schema>
  <schema name="database2">
    <table name="table1">
      <column name="foo" type="int"/>
      <column name="bar" type="string"/>
      <column name="details" type="array[object]">
        <column name="timestamp" type="timestamp"/>
        <column name="value" type="float"/>
      </column>
    </table>
  </schema>
</project>
...and I'm trying to build this classic nested dict:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string"
            },
            "details_closure": {
                "timestamp": "timestamp",
                "auto_closure": "bool"
            }
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {
                "timestamp": "timestamp",
                "value": "float"
            }
        }
    }
}
PS: Each database may have more than one table.
I tried some AI-generated code, but none of it gave me the expected result.
I'm sorry for not being able to show my attempts!
So, any help would be greatly appreciated.
Answer 1
Score: 1
You can use xml.etree.ElementTree:

import xml.etree.ElementTree as ET

def parse_column(column_elem):
    # Return (name, value); value is a nested dict when the column has child columns.
    name = column_elem.get('name')
    children = column_elem.findall('column')
    if children:
        return name, dict(parse_column(child) for child in children)
    return name, column_elem.get('type')

def parse_table(table_elem):
    table_name = table_elem.get('name')
    # findall('column') only returns direct children; nesting is handled in parse_column.
    table_data = dict(parse_column(c) for c in table_elem.findall('column'))
    return {table_name: table_data}

def parse_schema(schema_elem):
    schema_data = {}
    schema_name = schema_elem.get('name')
    for table_elem in schema_elem.findall('table'):
        schema_data.update(parse_table(table_elem))
    return {schema_name: schema_data}

def parse_xml(xml_content):
    root = ET.fromstring(xml_content)
    project_data = {}
    for schema_elem in root.findall('schema'):
        project_data.update(parse_schema(schema_elem))
    return project_data

# Read XML file
with open('file.xml', 'r') as f:
    xml_content = f.read()

# Parse XML and generate nested dictionary
nested_dict = parse_xml(xml_content)
print(nested_dict)
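
Since the question title mentions JSON, here is a minimal sketch of turning such a nested dict into JSON text with the standard json module. The sample dict is a hand-written excerpt of the expected result from the question, and the variable names are only illustrative.

```python
import json

# Hand-written excerpt of the expected structure (illustrative only).
nested_dict = {
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    }
}

# indent=2 pretty-prints; dicts keep insertion order, so column order is preserved.
json_text = json.dumps(nested_dict, indent=2)
print(json_text)

# Round-trip check: parsing the text gives back an equal dict.
assert json.loads(json_text) == nested_dict
```

To write the result to a file instead, json.dump(nested_dict, f, indent=2) on an open text file should work the same way.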
Answer 2
Score: 0
Solution using BeautifulSoup:

from bs4 import BeautifulSoup

# The "xml" parser requires lxml to be installed.
with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")

def parse_columns(t):
    out = {}
    # recursive=False: only look at direct <column> children of t.
    for c in t.find_all("column", recursive=False):
        if c.find("column"):
            # The column has nested columns: recurse to build a nested dict.
            out[c["name"]] = parse_columns(c)
        else:
            out[c["name"]] = c["type"]
    return out

def parse_schema(sch):
    out = {}
    for t in sch.select("table"):
        out[t["name"]] = parse_columns(t)
    return out

out = {}
for sch in soup.select("schema"):
    out[sch["name"]] = parse_schema(sch)

print(out)

Prints (formatted for readability):

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string",
            },
            "details_closure": {"timestamp": "timestamp", "auto_closure": "bool"},
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    },
}
Answer 3
Score: 0
In XSLT 3.0:

<xsl:output method="json" indent="yes" />

<xsl:template match="/">
  <xsl:map>
    <xsl:apply-templates select="*/schema"/>
  </xsl:map>
</xsl:template>

<xsl:template match="*[*]">
  <xsl:map-entry key="string(@name)">
    <xsl:map>
      <xsl:apply-templates select="*"/>
    </xsl:map>
  </xsl:map-entry>
</xsl:template>

<xsl:template match="*">
  <xsl:map-entry key="string(@name)" select="string(@type)"/>
</xsl:template>

See https://xsltfiddle.liberty-development.net/bdvWh3 for the full stylesheet including boilerplate.

Explanation:

- The first template rule matches the document node, creates the outermost map, and processes the schema elements, skipping the project level.
- The second template rule matches elements that have one or more children; it creates a map entry for the container element with the @name attribute as the key, and generates another map as the content by applying template rules to the children recursively.
- The third template rule matches elements with no children; it creates a map entry for that element, with @name as the key and @type as the corresponding value.
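
For readers more at home in Python, the three rules above can be mirrored as a rough sketch with the standard library. This is not part of the XSLT answer: the helper name to_entry and the shortened inline XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample input (illustrative excerpt of the question's file).
XML = """<project name="so_project" id="Project-9999">
  <schema name="database2">
    <table name="table1">
      <column name="foo" type="int"/>
      <column name="details" type="array[object]">
        <column name="timestamp" type="timestamp"/>
        <column name="value" type="float"/>
      </column>
    </table>
  </schema>
</project>"""


def to_entry(elem):
    """Return the value for one element: a nested dict if it has children, else its type."""
    children = list(elem)
    if children:
        # Rule 2: a container element becomes a nested map keyed by each child's name.
        return {child.get("name"): to_entry(child) for child in children}
    # Rule 3: a leaf element maps to its type attribute.
    return elem.get("type")


root = ET.fromstring(XML)
# Rule 1: start at the schema elements, skipping the project level.
result = {schema.get("name"): to_entry(schema) for schema in root.findall("schema")}
print(result)
```

Because the recursion does not care about element names, it treats schema, table, and column levels alike, just as the generic match patterns do. One difference: an element with neither children nor a type attribute would map to None here, whereas string(@type) in XSLT would give an empty string.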