How to make a nested dict (JSON) from a schema database XML?

Question

Here is my input file.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<project name="so_project" id="Project-9999">
    <schema name="database1">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details_resolution" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="user_id" type="string"/>
                <column name="user_name" type="string"/>
            </column>
            <column name="details_closure" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="auto_closure" type="bool"/>
            </column>
        </table>
    </schema>
    <schema name="database2">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="value" type="float"/>
            </column>
        </table>
    </schema>
</project>

...and I'm trying to build this classic nested dict:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string"
            },
            "details_closure": {
                "timestamp": "timestamp",
                "auto_closure": "bool"
            }
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {
                "timestamp": "timestamp",
                "value": "float"
            }
        }
    }
}

PS: Each database may eventually have more than one table.

I tried some AI-generated code, but none of it gave me the expected result.
I'm sorry I can't show my attempts!

So, any help would be greatly appreciated.

Answer 1

Score: 1

You can use xml.etree.ElementTree:

import xml.etree.ElementTree as ET

def parse_column(column_elem):
    column_data = {}
    column_data['name'] = column_elem.get('name')
    column_data['type'] = column_elem.get('type')
    return column_data

def parse_table(table_elem):
    table_data = {}
    table_name = table_elem.get('name')
    for column_elem in table_elem.findall('column'):
        column_data = parse_column(column_elem)
        table_data[column_data['name']] = column_data['type']
    return {table_name: table_data}

def parse_schema(schema_elem):
    schema_data = {}
    schema_name = schema_elem.get('name')
    for table_elem in schema_elem.findall('table'):
        table_data = parse_table(table_elem)
        schema_data.update(table_data)
    return {schema_name: schema_data}

def parse_xml(xml_content):
    root = ET.fromstring(xml_content)
    project_data = {}
    for schema_elem in root.findall('schema'):
        schema_data = parse_schema(schema_elem)
        project_data.update(schema_data)
    return project_data

# Read XML file
with open('file.xml', 'r') as f:
    xml_content = f.read()

# Parse XML and generate nested dictionary
nested_dict = parse_xml(xml_content)
print(nested_dict)
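
Note that parse_table above only visits the direct column children and stores each one's type, so a nested column such as details_resolution comes out as "array[object]" rather than as a nested dict. Below is a minimal sketch of a recursive variant, assuming that any column containing child column elements should become a nested dict. The helper name parse_columns and the trimmed-down sample are illustrative and not part of the answer.

```python
import xml.etree.ElementTree as ET


def parse_columns(parent_elem):
    """Map each direct <column> child to its type, or to a nested dict."""
    columns = {}
    for column_elem in parent_elem.findall('column'):
        name = column_elem.get('name')
        if column_elem.find('column') is not None:
            # The column has nested columns: recurse instead of storing its type.
            columns[name] = parse_columns(column_elem)
        else:
            columns[name] = column_elem.get('type')
    return columns


def parse_xml(xml_content):
    """Build {schema: {table: {column: type-or-nested-dict}}}."""
    root = ET.fromstring(xml_content)
    return {
        schema_elem.get('name'): {
            table_elem.get('name'): parse_columns(table_elem)
            for table_elem in schema_elem.findall('table')
        }
        for schema_elem in root.findall('schema')
    }


# Hypothetical trimmed-down sample, used only to exercise the sketch.
sample = """
<project name="so_project">
    <schema name="database1">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="details" type="array[object]">
                <column name="value" type="float"/>
            </column>
        </table>
    </schema>
</project>
"""

print(parse_xml(sample))
```

The explicit "is not None" check matters here because an ElementTree element without children is falsy, so testing the result of find() by truthiness alone could skip nested columns that have no children of their own.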

Answer 2

Score: 0

Solution using BeautifulSoup:

from bs4 import BeautifulSoup

# The "xml" parser requires lxml to be installed.
with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")


def parse_columns(t):
    out = {}
    for c in t.find_all("column", recursive=False):
        if c.find("column"):
            out[c["name"]] = parse_columns(c)
        else:
            out[c["name"]] = c["type"]
    return out


def parse_schema(sch):
    out = {}
    for t in sch.select("table"):
        out[t["name"]] = parse_columns(t)
    return out


out = {}
for sch in soup.select("schema"):
    out[sch["name"]] = parse_schema(sch)

print(out)

Prints:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string",
            },
            "details_closure": {"timestamp": "timestamp", "auto_closure": "bool"},
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    },
}
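
Since the question asks for JSON, it may be worth noting that a plain print() of a dict normally shows Python's own representation (single quotes, on one line) rather than JSON text. Here is a small sketch using the standard json module to serialize the result. The out value below is a hand-written stand-in for the dict built by the parsing code, not actual program output.

```python
import json

# Stand-in for the dict built by the parsing code (assumed shape).
out = {
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    }
}

# indent=4 pretty-prints; the result is a JSON string with double quotes.
json_text = json.dumps(out, indent=4)
print(json_text)
```

json.dump(out, file_obj, indent=4) would write the same text straight to an open file instead of returning a string.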

Answer 3

Score: 0

In XSLT 3.0:

  <xsl:output method="json" indent="yes" />

  <xsl:template match="/">
    <xsl:map>
      <xsl:apply-templates select="*/schema"/>
    </xsl:map>
  </xsl:template>

  <xsl:template match="*[*]">
    <xsl:map-entry key="string(@name)">
      <xsl:map>
        <xsl:apply-templates select="*"/>
      </xsl:map>
    </xsl:map-entry>
  </xsl:template>

  <xsl:template match="*">
    <xsl:map-entry key="string(@name)" select="string(@type)"/>
  </xsl:template>

See https://xsltfiddle.liberty-development.net/bdvWh3 for the full stylesheet including boilerplate.

Explanation:

  • The first template rule matches the document node, creates the outermost map, and processes the schema elements, skipping the project level.

  • The second template rule matches elements that have one or more children; it creates a map entry for the container element with the @name attribute as the key, and generates another map as the content, by applying template rules to the children recursively.

  • The third template rule matches elements with no children; it creates a map entry for the element, with @name as the key and @type as the corresponding value.
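
For readers without an XSLT 3.0 processor, the three rules above can be mirrored in Python. This is only a sketch of the same logic, not a translation produced by any tool. The function names to_entry and to_map and the small sample are made up for illustration. Like the stylesheet, it ignores tag names below the schema level and looks only at whether an element has children.

```python
import xml.etree.ElementTree as ET


def to_entry(elem):
    """Return the value for one element, mirroring rules two and three."""
    children = list(elem)
    if children:
        # Rule 2: an element with children becomes a nested map keyed by @name.
        return {child.get("name"): to_entry(child) for child in children}
    # Rule 3: a leaf element maps to its @type ("" if the attribute is missing,
    # which matches what string(@type) yields in XSLT).
    return elem.get("type", "")


def to_map(xml_content):
    """Rule 1: start at the schema elements, skipping the project level."""
    root = ET.fromstring(xml_content)
    return {schema.get("name"): to_entry(schema) for schema in root.findall("schema")}


# Hypothetical sample, used only to exercise the sketch.
sample = """<project name="p">
    <schema name="db">
        <table name="t">
            <column name="a" type="int"/>
            <column name="b" type="array[object]">
                <column name="c" type="bool"/>
            </column>
        </table>
    </schema>
</project>"""

print(to_map(sample))
```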

Posted by huangapple on July 18, 2023 at 07:01:46
Permalink: https://go.coder-hub.com/76708573.html