如何从XML数据中提取数据并保存到Excel文件中。

huangapple go评论121阅读模式
英文:

How to extract data from an xml data and save in an excel file

问题

我有一个`.xml`文件如下所示:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
  <Act  bind="a" >
    <Key frame="32" value="1" />
    <Key frame="62" value="8" />
    <Key frame="106" value="14" />    
  </>
  <Act  bind="b" >
    <Key frame="32" value="1" />
    <Key frame="62" value="30" />
    <Key frame="106" value="30" />    
  </>
  <Act  bind="c" >
    <Key frame="28" value="81" />
    <Key frame="58" value="78" />
    <Key frame="102" value="78" />    
  </>
</>

如何编写一个脚本将这些值保存在Excel文件中,其中列数与bind数量相同,行数与frame数量相同。并且,帧按升序排序,如果bind没有某帧的值,则插入0。类似以下所示:

      a         b         c   
28    0         0         81
32    1         1         0
58    0         0         78
62    8         30        0
102   0         0         78
106   14        30        0

非常感谢任何帮助。


<details>
<summary>英文:</summary>

I have an `.xml` file as the following:

```xml
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;
  &lt;Act  bind=&quot;a&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;8&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;14&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;b&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;30&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;30&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;c&quot; &gt;
    &lt;Key frame=&quot;28&quot; value=&quot;81&quot; /&gt;
    &lt;Key frame=&quot;58&quot; value=&quot;78&quot; /&gt;
    &lt;Key frame=&quot;102&quot; value=&quot;78&quot; /&gt;    
  &lt;/&gt;
&lt;/&gt;

How can I write an script that saves these values in an excel file, where I have as many columns as binds and as many rows as frames. Also, frames are sorted ascending and if a bind doesn't have a value for a frame, 0 is inserted. Something like the following:

      a         b         c   
28    0         0         81
32    1         1         0
58    0         0         78
62    8         30        0
102   0         0         78
106   14        30        0

Any help is very much appreciated.

答案1

得分: 1

如@GillesQu所提到的,这真的不是在SO上提问问题的正确方式,请在下次尝试提问时改进。

尽管如此,这是一个足够有趣的任务,所以我试图解决它。首先,你在问题中的xml格式不正确,所以我进行了修复(见下文),使其成为我认为应该的样子。

一旦我们解决了这个问题,就需要使用lxml、xpath、列表推导式、f-strings和几个pandas方法来完成这个有点技巧性的任务,但是这是可以做到的。
[对于那些可能会想知道的人,顺便说一句,我尝试过但未能使用`pandas.read_xml()`达到相同的效果]。

所以总体来说:

    # 你的xml,已修复
    source = """<?xml version="1.0" encoding="UTF-8"?>
    <root>
       <Act bind="a">
          <Key frame="32" value="1" />
          <Key frame="62" value="8" />
          <Key frame="106" value="14" />
       </Act>
       <Act bind="b">
          <Key frame="32" value="1" />
          <Key frame="62" value="30" />
          <Key frame="106" value="30" />
       </Act>
       <Act bind="c">
          <Key frame="28" value="81" />
          <Key frame="58" value="78" />
          <Key frame="102" value="78" />
       </Act>
    </root>
    """
    from lxml import etree
    doc = etree.XML(source.encode())
    # 你需要将它转换为整数,以便后面可以正确地排序:
    frames = [int(frm) for frm in doc.xpath('//Key/@frame')]
    # 找到唯一的值并排序
    uni_frames = sorted(set(frames))
    
    # 开始准备你的数据框
    rows, cols = [],[]
    for act in doc.xpath('//Act'):
        # 获取列名
        cols.append(act.xpath('./@bind')[0])
        row = []
        for frame in uni_frames:
            # 获取帧的值
            val = f'.//Key[@frame={frame}]/@value'
            entry = act.xpath(val)[0] if act.xpath(val) else "0"
            row.append(entry)
        rows.append(row)
    
    # 现在组装数据框本身
    df = pd.DataFrame(rows)
    df = df.transpose()
    # 重命名列名
    df.columns = cols
    # 最后,插入第一列
    df.insert(loc = 0,
              column = 'frame',
              value = uni_frames)
    df

输出:

      frame a  b  c
    0  28  0  0  81
    1  32  1  1  0
    2  58  0  0  78
    3  62  8  30  0
    4  102 0  0  78
    5  106 14 30  0

现在你有了数据框,你可以使用其他pandas方法[保存到Excel][1]或[到CSV][2]等。

  [1]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
  [2]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
英文:

As mentioned by @GillesQuénot, this is really not the way to ask questions on SO, so please try to do better next time.

Nevertheless, it's an interesting enough an assignment, so I tried to tackle it. First off, your xml in the question is not well formed so I fixed it (see below) to what I believe it should be.

Once we're past that, it becomes a somewhat acrobatic task requiring the use of lxml, xpath, list comprehensions, f-strings and several pandas methods - but it's doable.
[Note for those who may wonder - FWIW, I tried but failed to achieve the same thing using pandas.read_xml()].

So all together:

#your xml, fixed
source = &quot;&quot;&quot;&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;root&gt;
   &lt;Act bind=&quot;a&quot;&gt;
      &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
      &lt;Key frame=&quot;62&quot; value=&quot;8&quot; /&gt;
      &lt;Key frame=&quot;106&quot; value=&quot;14&quot; /&gt;
   &lt;/Act&gt;
   &lt;Act bind=&quot;b&quot;&gt;
      &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
      &lt;Key frame=&quot;62&quot; value=&quot;30&quot; /&gt;
      &lt;Key frame=&quot;106&quot; value=&quot;30&quot; /&gt;
   &lt;/Act&gt;
   &lt;Act bind=&quot;c&quot;&gt;
      &lt;Key frame=&quot;28&quot; value=&quot;81&quot; /&gt;
      &lt;Key frame=&quot;58&quot; value=&quot;78&quot; /&gt;
      &lt;Key frame=&quot;102&quot; value=&quot;78&quot; /&gt;
   &lt;/Act&gt;
&lt;/root&gt;
&quot;&quot;&quot;
from lxml import etree
doc = etree.XML(source.encode())
#you need to convert to int so it can be properly sorted later:
frames = [int(frm) for frm in doc.xpath(&#39;//Key/@frame&#39;)]
#find the unique values and sort them
uni_frames = sorted(set(frames))

#start preparing your dataframe
rows, cols = [],[]
for act in doc.xpath(&#39;//Act&#39;):
    #get the column names
    cols.append(act.xpath(&#39;./@bind&#39;)[0])
    row = []
    for frame in uni_frames:
        #get the frame values
        val = f&#39;.//Key[@frame={frame}]/@value&#39;
        entry = act.xpath(val)[0] if act.xpath(val) else &quot;0&quot;
        row.append(entry)
    rows.append(row)

#now assemble the dataframe itself
df = pd.DataFrame(rows)
df = df.transpose()
#rename the columns
df.columns = cols
#finally, insert the first column
df.insert(loc = 0,
          column = &#39;frame&#39;,
          value = uni_frames)
df

Output:

  frame a 	b 	c
0 	28 	0 	0 	81
1 	32 	1 	1 	0
2 	58 	0 	0 	78
3 	62 	8 	30 	0
4 	102 0 	0 	78
5 	106 14 	30 	0

Now that you have the dataframe, you can use other pandas methods to save to Excel or to CSV, etc.

huangapple
  • 本文由 发表于 2023年3月7日 18:40:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75660910.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定