Python code to extract values from XML saved in a text file
Question
How can I get multiple values from a single line of a text file?
I have XML code saved in a text file, as shown below:
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
I have multiple XML lines stored in a .txt file. How can I extract the data from each line and store it in an Excel sheet like this:
Type       Channel  AG
ServiceRQ  TA       95HAJSTI
SearchRQ   AY       56ASJSTS
ServiceRQ  QA       85ATAKSQ
...        ..       ....
...        ..       ....
The data stored in the text file looks like this:
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>
<ServiceRQ ......
<SearchRQ ........
and so on...
I am trying to get the desired output using regex, but I am finding it difficult to create the exact regex pattern.
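The difficulty with one "exact" pattern is that each field needs its own capture group. Here is a minimal sketch using only Python's standard `re` module. It assumes every record sits on a single line and that the AG parameter is always written exactly as `<Param Name="AG"><Value>...</Value>`. The names `TYPE_RE`, `CHANNEL_RE`, `AG_RE` and `extract` are hypothetical. One small pattern per field, each with a non-greedy group, is enough:

```python
import re

# Non-greedy groups (.*?) stop at the first closing tag, so each pattern
# captures exactly one value even though the whole record is one long line.
TYPE_RE = re.compile(r'^<(\w+)')  # root tag name at the start of the line
CHANNEL_RE = re.compile(r'<Channel>(.*?)</Channel>')
AG_RE = re.compile(r'<Param Name="AG"><Value>(.*?)</Value>')


def extract(line):
    """Return (Type, Channel, AG) for one XML line; missing fields become None."""
    values = []
    for pattern in (TYPE_RE, CHANNEL_RE, AG_RE):
        match = pattern.search(line)
        values.append(match.group(1) if match else None)
    return tuple(values)


line = ('<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode>'
        '<CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel>'
        '</SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value>'
        '</Param></CustomParams></Pricing></ServiceRQ>')
print(extract(line))  # ('ServiceRQ', 'TA', '95HAJSTI')
```

To process the whole file, call `extract(line.strip())` for each line read from it. Each resulting tuple maps directly onto the Type, Channel and AG columns.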
Answer 1
Score: 1
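If I understand correctly (IIUC), use:
import pandas as pd

file = 'data.txt'  # path to your text file (placeholder name)
# read each line as one string, so commas or quotes inside the XML are not parsed as CSV
with open(file) as f:
    df = pd.DataFrame({'Data': f.read().splitlines()})
df['Type'] = df['Data'].str.extract(r'<(\w+)\s+xmlns', expand=False)
df['Channel'] = df['Data'].str.extract(r'<Channel>(.*?)</Channel>', expand=False)
df['AG'] = df['Data'].str.extract(r'<Param Name="AG"><Value>(.*?)</Value>', expand=False)
print(df)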
   Data                                                Type       Channel  AG
0  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ  TA       95HAJSTI
1  <SearchRQ xmlns:xsi="http://"><SaleInfo><CityC...  SearchRQ   AY       56ASJSTS
2  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ  QA       85ATAKSQ
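Regex works for this fixed layout. Because each line is a complete XML document, it can also be parsed without regex. Below is a minimal sketch with the standard library's `xml.etree.ElementTree` and `csv` modules. It assumes one well-formed record per line. `parse_record` and `convert` are hypothetical helper names, and the output is CSV (which Excel opens directly) rather than a native .xlsx file:

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_record(line):
    """Parse one XML line and return a dict with Type, Channel and AG."""
    root = ET.fromstring(line)
    return {
        'Type': root.tag,                                   # root element name
        'Channel': root.findtext('SaleInfo/Channel'),       # None if absent
        'AG': root.findtext(".//Param[@Name='AG']/Value"),  # match by attribute
    }


def convert(lines, out):
    """Write a header plus one CSV row per non-blank XML line to file-like out."""
    writer = csv.DictWriter(out, fieldnames=['Type', 'Channel', 'AG'])
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if line:  # skip empty lines
            writer.writerow(parse_record(line))


sample = [
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><Channel>TA</Channel></SaleInfo>'
    '<Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param>'
    '</CustomParams></Pricing></ServiceRQ>',
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><Channel>AY</Channel></SaleInfo>'
    '<Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param>'
    '</CustomParams></Pricing></SearchRQ>',
]
buffer = io.StringIO()
convert(sample, buffer)
print(buffer.getvalue())
```

For a real file, something like `with open('data.txt') as src, open('output.csv', 'w', newline='') as dst: convert(src, dst)` should work (both file names are placeholders). If you already have the pandas DataFrame from the answer, `df[['Type', 'Channel', 'AG']].to_excel('output.xlsx', index=False)` writes an .xlsx file instead. That requires an Excel writer engine such as openpyxl to be installed.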