英文:
How to extract data from an xml data and save in an excel file
问题
我有一个`.xml`文件如下所示:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Act bind="a" >
<Key frame="32" value="1" />
<Key frame="62" value="8" />
<Key frame="106" value="14" />
</>
<Act bind="b" >
<Key frame="32" value="1" />
<Key frame="62" value="30" />
<Key frame="106" value="30" />
</>
<Act bind="c" >
<Key frame="28" value="81" />
<Key frame="58" value="78" />
<Key frame="102" value="78" />
</>
</>
如何编写一个脚本将这些值保存在Excel文件中,其中列数与bind数量相同,行数与frame数量相同。并且,帧按升序排序,如果bind没有某帧的值,则插入0。类似以下所示:
a b c
28 0 0 81
32 1 1 0
58 0 0 78
62 8 30 0
102 0 0 78
106 14 30 0
非常感谢任何帮助。
<details>
<summary>英文:</summary>
I have an `.xml` file as the following:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Act bind="a" >
<Key frame="32" value="1" />
<Key frame="62" value="8" />
<Key frame="106" value="14" />
</>
<Act bind="b" >
<Key frame="32" value="1" />
<Key frame="62" value="30" />
<Key frame="106" value="30" />
</>
<Act bind="c" >
<Key frame="28" value="81" />
<Key frame="58" value="78" />
<Key frame="102" value="78" />
</>
</>
How can I write an script that saves these values in an excel file, where I have as many columns as binds and as many rows as frames. Also, frames are sorted ascending and if a bind doesn't have a value for a frame, 0 is inserted. Something like the following:
a b c
28 0 0 81
32 1 1 0
58 0 0 78
62 8 30 0
102 0 0 78
106 14 30 0
Any help is very much appreciated.
答案1
得分: 1
如@GillesQu所提到的,这真的不是在SO上提问问题的正确方式,请在下次尝试提问时改进。
尽管如此,这是一个足够有趣的任务,所以我试图解决它。首先,你在问题中的xml格式不正确,所以我进行了修复(见下文),使其成为我认为应该的样子。
一旦我们解决了这个问题,就需要使用lxml、xpath、列表推导式、f-strings和几个pandas方法来完成这个有点技巧性的任务,但是这是可以做到的。
[对于那些可能会想知道的人,顺便说一句,我尝试过但未能使用`pandas.read_xml()`达到相同的效果]。
所以总体来说:
# 你的xml,已修复
source = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<Act bind="a">
<Key frame="32" value="1" />
<Key frame="62" value="8" />
<Key frame="106" value="14" />
</Act>
<Act bind="b">
<Key frame="32" value="1" />
<Key frame="62" value="30" />
<Key frame="106" value="30" />
</Act>
<Act bind="c">
<Key frame="28" value="81" />
<Key frame="58" value="78" />
<Key frame="102" value="78" />
</Act>
</root>
"""
from lxml import etree
doc = etree.XML(source.encode())
# 你需要将它转换为整数,以便后面可以正确地排序:
frames = [int(frm) for frm in doc.xpath('//Key/@frame')]
# 找到唯一的值并排序
uni_frames = sorted(set(frames))
# 开始准备你的数据框
rows, cols = [],[]
for act in doc.xpath('//Act'):
# 获取列名
cols.append(act.xpath('./@bind')[0])
row = []
for frame in uni_frames:
# 获取帧的值
val = f'.//Key[@frame={frame}]/@value'
entry = act.xpath(val)[0] if act.xpath(val) else "0"
row.append(entry)
rows.append(row)
# 现在组装数据框本身
df = pd.DataFrame(rows)
df = df.transpose()
# 重命名列名
df.columns = cols
# 最后,插入第一列
df.insert(loc = 0,
column = 'frame',
value = uni_frames)
df
输出:
frame a b c
0 28 0 0 81
1 32 1 1 0
2 58 0 0 78
3 62 8 30 0
4 102 0 0 78
5 106 14 30 0
现在你有了数据框,你可以使用其他pandas方法[保存到Excel][1]或[到CSV][2]等。
[1]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
[2]: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
英文:
As mentioned by @GillesQuénot, this is really not the way to ask questions on SO, so please try to do better next time.
Nevertheless, it's an interesting enough an assignment, so I tried to tackle it. First off, your xml in the question is not well formed so I fixed it (see below) to what I believe it should be.
Once we're past that, it becomes a somewhat acrobatic task requiring the use of lxml, xpath, list comprehensions, f-strings and several pandas methods - but it's doable.
[Note for those who may wonder - FWIW, I tried but failed to achieve the same thing using pandas.read_xml()
].
So all together:
#your xml, fixed
source = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<Act bind="a">
<Key frame="32" value="1" />
<Key frame="62" value="8" />
<Key frame="106" value="14" />
</Act>
<Act bind="b">
<Key frame="32" value="1" />
<Key frame="62" value="30" />
<Key frame="106" value="30" />
</Act>
<Act bind="c">
<Key frame="28" value="81" />
<Key frame="58" value="78" />
<Key frame="102" value="78" />
</Act>
</root>
"""
from lxml import etree
doc = etree.XML(source.encode())
#you need to convert to int so it can be properly sorted later:
frames = [int(frm) for frm in doc.xpath('//Key/@frame')]
#find the unique values and sort them
uni_frames = sorted(set(frames))
#start preparing your dataframe
rows, cols = [],[]
for act in doc.xpath('//Act'):
#get the column names
cols.append(act.xpath('./@bind')[0])
row = []
for frame in uni_frames:
#get the frame values
val = f'.//Key[@frame={frame}]/@value'
entry = act.xpath(val)[0] if act.xpath(val) else "0"
row.append(entry)
rows.append(row)
#now assemble the dataframe itself
df = pd.DataFrame(rows)
df = df.transpose()
#rename the columns
df.columns = cols
#finally, insert the first column
df.insert(loc = 0,
column = 'frame',
value = uni_frames)
df
Output:
frame a b c
0 28 0 0 81
1 32 1 1 0
2 58 0 0 78
3 62 8 30 0
4 102 0 0 78
5 106 14 30 0
Now that you have the dataframe, you can use other pandas methods to save to Excel or to CSV, etc.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论