问题

我有一个`.xml`文件如下所示：

```xml
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;
  &lt;Act  bind=&quot;a&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;8&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;14&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;b&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;30&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;30&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;c&quot; &gt;
    &lt;Key frame=&quot;28&quot; value=&quot;81&quot; /&gt;
    &lt;Key frame=&quot;58&quot; value=&quot;78&quot; /&gt;
    &lt;Key frame=&quot;102&quot; value=&quot;78&quot; /&gt;    
  &lt;/&gt;
&lt;/&gt;

如何编写一个脚本将这些值保存在Excel文件中，其中列数与bind数量相同，行数与frame数量相同。并且，帧按升序排序，如果bind没有某帧的值，则插入0。类似以下所示：

      a         b         c   
28    0         0         81
32    1         1         0
58    0         0         78
62    8         30        0
102   0         0         78
106   14        30        0

非常感谢任何帮助。


<details>
<summary>英文:</summary>

I have an `.xml` file as the following:

```xml
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; ?&gt;
  &lt;Act  bind=&quot;a&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;8&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;14&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;b&quot; &gt;
    &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
    &lt;Key frame=&quot;62&quot; value=&quot;30&quot; /&gt;
    &lt;Key frame=&quot;106&quot; value=&quot;30&quot; /&gt;    
  &lt;/&gt;
  &lt;Act  bind=&quot;c&quot; &gt;
    &lt;Key frame=&quot;28&quot; value=&quot;81&quot; /&gt;
    &lt;Key frame=&quot;58&quot; value=&quot;78&quot; /&gt;
    &lt;Key frame=&quot;102&quot; value=&quot;78&quot; /&gt;    
  &lt;/&gt;
&lt;/&gt;

How can I write an script that saves these values in an excel file, where I have as many columns as binds and as many rows as frames. Also, frames are sorted ascending and if a bind doesn't have a value for a frame, 0 is inserted. Something like the following:

      a         b         c   
28    0         0         81
32    1         1         0
58    0         0         78
62    8         30        0
102   0         0         78
106   14        30        0

Any help is very much appreciated.

答案1

得分: 1

如@GillesQu所提到的，这真的不是在SO上提问问题的正确方式，请在下次尝试提问时改进。

尽管如此，这是一个足够有趣的任务，所以我试图解决它。首先，你在问题中的xml格式不正确，所以我进行了修复（见下文），使其成为我认为应该的样子。

一旦我们解决了这个问题，就需要使用lxml、xpath、列表推导式、f-strings和几个pandas方法来完成这个有点技巧性的任务，但是这是可以做到的。
[对于那些可能会想知道的人，顺便说一句，我尝试过但未能使用`pandas.read_xml()`达到相同的效果]。

所以总体来说：

    # 你的xml，已修复
    source = """<?xml version="1.0" encoding="UTF-8"?>
    <root>
       <Act bind="a">
          <Key frame="32" value="1" />
          <Key frame="62" value="8" />
          <Key frame="106" value="14" />
       </Act>
       <Act bind="b">
          <Key frame="32" value="1" />
          <Key frame="62" value="30" />
          <Key frame="106" value="30" />
       </Act>
       <Act bind="c">
          <Key frame="28" value="81" />
          <Key frame="58" value="78" />
          <Key frame="102" value="78" />
       </Act>
    </root>
    """
    from lxml import etree
    doc = etree.XML(source.encode())
    # 你需要将它转换为整数，以便后面可以正确地排序：
    frames = [int(frm) for frm in doc.xpath('//Key/@frame')]
    # 找到唯一的值并排序
    uni_frames = sorted(set(frames))
    
    # 开始准备你的数据框
    rows, cols = [],[]
    for act in doc.xpath('//Act'):
        # 获取列名
        cols.append(act.xpath('./@bind')[0])
        row = []
        for frame in uni_frames:
            # 获取帧的值
            val = f'.//Key[@frame={frame}]/@value'
            entry = act.xpath(val)[0] if act.xpath(val) else "0"
            row.append(entry)
        rows.append(row)
    
    # 现在组装数据框本身
    df = pd.DataFrame(rows)
    df = df.transpose()
    # 重命名列名
    df.columns = cols
    # 最后，插入第一列
    df.insert(loc = 0,
              column = 'frame',
              value = uni_frames)
    df

输出：

      frame a  b  c
    0  28  0  0  81
    1  32  1  1  0
    2  58  0  0  78
    3  62  8  30  0
    4  102 0  0  78
    5  106 14 30  0

现在你有了数据框，你可以使用其他pandas方法[保存到Excel][1]或[到CSV][2]等。

  [1]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
  [2]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

英文:

As mentioned by @GillesQuénot, this is really not the way to ask questions on SO, so please try to do better next time.

Nevertheless, it's an interesting enough an assignment, so I tried to tackle it. First off, your xml in the question is not well formed so I fixed it (see below) to what I believe it should be.

Once we're past that, it becomes a somewhat acrobatic task requiring the use of lxml, xpath, list comprehensions, f-strings and several pandas methods - but it's doable.
[Note for those who may wonder - FWIW, I tried but failed to achieve the same thing using pandas.read_xml()].

So all together:

#your xml, fixed
source = &quot;&quot;&quot;&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;root&gt;
   &lt;Act bind=&quot;a&quot;&gt;
      &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
      &lt;Key frame=&quot;62&quot; value=&quot;8&quot; /&gt;
      &lt;Key frame=&quot;106&quot; value=&quot;14&quot; /&gt;
   &lt;/Act&gt;
   &lt;Act bind=&quot;b&quot;&gt;
      &lt;Key frame=&quot;32&quot; value=&quot;1&quot; /&gt;
      &lt;Key frame=&quot;62&quot; value=&quot;30&quot; /&gt;
      &lt;Key frame=&quot;106&quot; value=&quot;30&quot; /&gt;
   &lt;/Act&gt;
   &lt;Act bind=&quot;c&quot;&gt;
      &lt;Key frame=&quot;28&quot; value=&quot;81&quot; /&gt;
      &lt;Key frame=&quot;58&quot; value=&quot;78&quot; /&gt;
      &lt;Key frame=&quot;102&quot; value=&quot;78&quot; /&gt;
   &lt;/Act&gt;
&lt;/root&gt;
&quot;&quot;&quot;
from lxml import etree
doc = etree.XML(source.encode())
#you need to convert to int so it can be properly sorted later:
frames = [int(frm) for frm in doc.xpath(&#39;//Key/@frame&#39;)]
#find the unique values and sort them
uni_frames = sorted(set(frames))

#start preparing your dataframe
rows, cols = [],[]
for act in doc.xpath(&#39;//Act&#39;):
    #get the column names
    cols.append(act.xpath(&#39;./@bind&#39;)[0])
    row = []
    for frame in uni_frames:
        #get the frame values
        val = f&#39;.//Key[@frame={frame}]/@value&#39;
        entry = act.xpath(val)[0] if act.xpath(val) else &quot;0&quot;
        row.append(entry)
    rows.append(row)

#now assemble the dataframe itself
df = pd.DataFrame(rows)
df = df.transpose()
#rename the columns
df.columns = cols
#finally, insert the first column
df.insert(loc = 0,
          column = &#39;frame&#39;,
          value = uni_frames)
df

Output:

  frame a 	b 	c
0 	28 	0 	0 	81
1 	32 	1 	1 	0
2 	58 	0 	0 	78
3 	62 	8 	30 	0
4 	102 0 	0 	78
5 	106 14 	30 	0

Now that you have the dataframe, you can use other pandas methods to save to Excel or to CSV, etc.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何从XML数据中提取数据并保存到Excel文件中。

问题

答案1

将XPath属性值中的科学计数法解析为数字。

如何在GO中解码灵活的XML？

Python – 从XML中抓取数据

从具有自关闭标签的XML中导出文本

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论