Python code to extract values from XML saved in a txt file

Question

How can I get multiple values from a single line of a text file?
I have XML code saved in a text file, as shown below:

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ> 

I have multiple XML lines stored in a txt file. How can I extract data from each line and store it in an Excel sheet like this:

Type        Channel   AG
ServiceRQ   TA        95HAJSTI
SearchRQ    AY        56ASJSTS
ServiceRQ   QA        85ATAKSQ
...         ..        ....
...         ..        ....

The data stored in the text file looks like this:

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>

<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>

<ServiceRQ ......

<SearchRQ ........

and so on...

I am trying to get the desired output using regex, but I am finding it difficult to create the exact regex pattern.

Answer 1

Score: 1

IIUC, use:

import pandas as pd

# Path to the text file with one XML document per line (placeholder name, replace with your own)
file = 'data.txt'

# Read every line of the file into a single 'Data' column
df = pd.read_csv(file, names=['Data'])

# Raw strings keep \s from being read as a string escape; expand=False returns a Series
df['Type'] = df['Data'].str.extract(r'<(.*)\s+xmlns', expand=False)
df['Channel'] = df['Data'].str.extract(r'<Channel>(.*)</Channel>', expand=False)
df['AG'] = df['Data'].str.extract(r'<Param Name="AG"><Value>(.*)</Value>', expand=False)

print(df)

                                                Data       Type Channel  \
0  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ      TA   
1  <SearchRQ xmlns:xsi="http://"><SaleInfo><CityC...   SearchRQ      AY   
2  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ      QA   

         AG  
0  95HAJSTI  
1  56ASJSTS  
2  85ATAKSQ  
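
Since each line in the question is a complete XML document, an alternative to regex is to parse every line with the standard library's xml.etree.ElementTree. The following is only a sketch under that assumption; the names SAMPLE_LINES, parse_line and parse_lines are made up for illustration, and lines that are blank or not well-formed XML are simply skipped.

```python
import xml.etree.ElementTree as ET

# Two of the sample lines from the question (illustrative input only)
SAMPLE_LINES = [
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ> ',
    '',
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>',
]


def parse_line(line):
    """Return a dict with Type, Channel and AG, or None if the line is unusable."""
    line = line.strip()
    if not line:
        return None  # blank separator line
    try:
        root = ET.fromstring(line)
    except ET.ParseError:
        return None  # not well-formed XML, skipped in this sketch
    return {
        "Type": root.tag,  # name of the root element, e.g. ServiceRQ
        "Channel": root.findtext("SaleInfo/Channel"),
        "AG": root.findtext("Pricing/CustomParams/Param[@Name='AG']/Value"),
    }


def parse_lines(lines):
    """Parse an iterable of lines and keep only the rows that could be read."""
    rows = []
    for line in lines:
        row = parse_line(line)
        if row is not None:
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_lines(SAMPLE_LINES):
        print(row)
```

With a real file, something like parse_lines(open(path, encoding="utf-8")) should give the same list of rows, which can then be passed to pd.DataFrame(rows) if pandas is wanted for the later steps.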

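The question also asks for the result to end up in an Excel sheet, which the answer does not cover. With pandas, df[['Type', 'Channel', 'AG']].to_excel('output.xlsx', index=False) is one option, assuming an Excel writer engine such as openpyxl is installed. The sketch below swaps in a different technique that needs only the standard library: it applies similar regex patterns with re and writes a CSV file, which Excel can open. The names PATTERNS, extract_row and write_csv are hypothetical, and the patterns are non-greedy variants of the ones in the answer.

```python
import csv
import io
import re

# Non-greedy variants of the answer's patterns, one per output column
PATTERNS = {
    "Type": re.compile(r"<(\w+)\s+xmlns"),
    "Channel": re.compile(r"<Channel>(.*?)</Channel>"),
    "AG": re.compile(r'<Param Name="AG"><Value>(.*?)</Value>'),
}


def extract_row(line):
    """Apply every pattern to one line; missing values become empty strings."""
    row = {}
    for column, pattern in PATTERNS.items():
        match = pattern.search(line)
        row[column] = match.group(1) if match else ""
    return row


def write_csv(lines, out):
    """Write a header plus one CSV row per non-blank input line to `out`."""
    writer = csv.DictWriter(out, fieldnames=list(PATTERNS), lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(extract_row(line))


# Illustrative input: two of the sample lines from the question
sample = [
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>',
    '',
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>',
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file object opened with open('output.csv', 'w', newline='') as the second argument.
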
huangapple
  • Posted on February 8, 2023 at 14:29:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382077.html