Python code to extract values from XML saved in a txt file

Question

How can I get multiple values from a single line of a text file?
I have XML saved in a text file, as shown below:

    <ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>

The txt file contains many such XML lines. How can I extract the data from each line and store it in an Excel sheet, like this:

    Type       Channel  AG
    ServiceRQ  TA       95HAJSTI
    SearchRQ   AY       56ASJSTS
    ServiceRQ  QA       85ATAKSQ
    ...        ..       ....
    ...        ..       ....

The data stored in the text file looks like this:

    <ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
    <SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>
    <ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>
    <ServiceRQ ......
    <SearchRQ ........

and so on...

I am trying to get the desired output using regex, but I am finding it difficult to create the exact regex pattern.

Answer 1

Score: 1

IIUC, use:

    import pandas as pd

    # `file` is the path to the text file (one XML document per line)
    df = pd.read_csv(file, names=['Data'])
    # Root tag name: everything between '<' and the whitespace before 'xmlns'
    df['Type'] = df['Data'].str.extract(r'<(.*)\s+xmlns', expand=False)
    df['Channel'] = df['Data'].str.extract(r'<Channel>(.*)</Channel>', expand=False)
    df['AG'] = df['Data'].str.extract(r'<Param Name="AG"><Value>(.*)</Value>', expand=False)
    print(df)

Output:

                                                    Data       Type Channel        AG
    0  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ      TA  95HAJSTI
    1  <SearchRQ xmlns:xsi="http://"><SaleInfo><CityC...   SearchRQ      AY  56ASJSTS
    2  <ServiceRQ xmlns:xsi="http://"><SaleInfo><City...  ServiceRQ      QA  85ATAKSQ
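
If the regex patterns prove brittle, one possible alternative (not part of the answer above) is to parse each line with the standard library's `xml.etree.ElementTree`. The sketch below assumes every line is a complete, well-formed XML document shaped like the three samples in the question. The names `parse_line`, `rows_to_csv`, and `sample_lines` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_line(line):
    """Parse one XML line into a dict with the Type, Channel and AG fields."""
    root = ET.fromstring(line.strip())
    # The sample root elements have no namespace prefix, so root.tag is the bare name.
    channel = root.findtext("SaleInfo/Channel")
    # Pick the <Value> under the <Param> whose Name attribute equals "AG".
    ag = root.findtext("Pricing/CustomParams/Param[@Name='AG']/Value")
    return {"Type": root.tag, "Channel": channel, "AG": ag}


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Type", "Channel", "AG"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample lines copied from the question; a real script would read them from the txt file.
sample_lines = [
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>',
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>',
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>',
]

rows = [parse_line(line) for line in sample_lines if line.strip()]
csv_text = rows_to_csv(rows)
print(csv_text)
```

To get a real .xlsx file instead of CSV, the same rows could be passed to `pd.DataFrame(rows).to_excel(...)`, which needs an Excel engine such as openpyxl installed. Truncated or malformed lines make `ET.fromstring` raise `ET.ParseError`, so a real script may want to catch that error and skip or log the offending line.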

huangapple
  • Posted on February 8, 2023, 14:29:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75382077.html