February 8, 2023, 19:21:45

Perform multiple regex operations on each line of text file and store extracted data in respective column

Question

Data in test.txt

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>
<ServiceRQ ......
<SearchRQ ........

My code:

import pandas as pd
import re
columns = ['Request Type','Channel','AG']
# data = pd.DataFrame
exp = re.compile(r'<(.*)\s+xmlns'
                 r'<Channel>(.*)</Channel>'
                 r'<Param Name="AG">.*?<Value>(.*?)</Value>')
final = []
with open(r"test.txt") as f:
    for line in f:
        result = re.search(exp,line)
        final.append(result)
    df = pd.DataFrame(final, columns)
    print(df)

I want to iterate through each line, perform the 3 regex operations below, and extract data from each line of the text file:

1. r'<(.*)\s+xmlns'
2. r'<Channel>(.*)</Channel>'
3. r'<Param Name="AG">.*?<Value>(.*?)</Value>'

Each regex extracts its respective data from a single line:

extract the type of request
extract the name of the channel
extract the value present for AG

My expected output (Excel sheet):

Request Type    Channel       AG
ServiceRQ         TA        95HAJSTI
SearchRQ          AY        56ASJSTS
ServiceRQ         QA        85ATAKSQ
 ...              ...         .....
 ...              ....        .....
and so on.

How can I achieve the expected output?

Answer 1

Score: 1

Try this regex. I don't know what the rest of your file looks like, but it works with the data shown so far.
result.groups() returns the matched text of all capture groups as a tuple, and that tuple is what gets appended to the list.

# Reuses the imports and the columns list from the question.
exp = re.compile(r'<(\w+)\s+xmlns.*?>.*?'                  # group 1: request type
                 r'<Channel>(.*?)</Channel>.*?'            # group 2: channel
                 r'<Param Name="AG"><Value>(.*?)</Value>')  # group 3: AG value
final = []
with open(r"test.txt") as f:
    for line in f:
        result = re.search(exp, line)
        if result:  # skip lines where the pattern does not match
            final.append(result.groups())

df = pd.DataFrame(final, columns=columns)
print(df)
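
As a minimal sketch of why the change matters: adjacent string literals are joined into a single pattern, so the pattern from the question only matches if `<Channel>` follows `xmlns` directly, which never happens in this data. The shortened `line` below is made up for illustration and is not taken from test.txt.

```python
import re

# Shortened sample line, made up for illustration (not from test.txt).
line = ('<ServiceRQ xmlns:xsi="http://"><Channel>TA</Channel>'
        '<Param Name="AG"><Value>95HAJSTI</Value></Param></ServiceRQ>')

# Pattern from the question: the three pieces are concatenated with
# nothing in between, so "xmlns" must be followed directly by "<Channel>".
original = re.compile(r'<(.*)\s+xmlns'
                      r'<Channel>(.*)</Channel>'
                      r'<Param Name="AG">.*?<Value>(.*?)</Value>')

# Pattern from the answer: ".*?" bridges the text between the pieces.
fixed = re.compile(r'<(\w+)\s+xmlns.*?>.*?'
                   r'<Channel>(.*?)</Channel>.*?'
                   r'<Param Name="AG"><Value>(.*?)</Value>')

no_match = original.search(line)  # None: the pieces are not adjacent
match = fixed.search(line)        # a re.Match object
row = match.groups()              # tuple with one string per capture group

print(no_match)
print(row)
```

Appending `row` (a tuple of strings) rather than the match object itself is what lets pandas build one column per capture group.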

Test code:

import pandas as pd
import re
columns = ['Request Type','Channel','AG']
# Inline copy of the sample data, standing in for test.txt
file_data = """
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>
"""
exp = re.compile(r'<(\w+)\s+xmlns.*?>.*?'
                 r'<Channel>(.*?)</Channel>.*?'
                 r'<Param Name="AG"><Value>(.*?)</Value>')
final = []
for line in file_data.splitlines():
    result = re.search(exp, line)
    if result:  # the empty first line produces no match and is skipped
        final.append(result.groups())

df = pd.DataFrame(final, columns=columns)
print(df)


  Request Type Channel        AG
0    ServiceRQ      TA  95HAJSTI
1     SearchRQ      AY  56ASJSTS
2    ServiceRQ      QA  85ATAKSQ
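
The question asks for three separate regex operations per line. A minimal sketch of that variant follows, assuming each pattern is searched independently and a missing field should become `None` instead of dropping the whole row. The helper name `extract_row` and the shortened `sample_lines` are hypothetical and only for illustration.

```python
import re

columns = ['Request Type', 'Channel', 'AG']

# One compiled pattern per column, in the same order as `columns`.
patterns = [
    re.compile(r'<(\w+)\s+xmlns'),
    re.compile(r'<Channel>(.*?)</Channel>'),
    re.compile(r'<Param Name="AG">.*?<Value>(.*?)</Value>'),
]

def extract_row(line):
    """Return one value per pattern; None where a pattern does not match."""
    row = []
    for pattern in patterns:
        match = pattern.search(line)
        row.append(match.group(1) if match else None)
    return tuple(row)

# Shortened sample lines, made up for illustration (not from test.txt).
sample_lines = [
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><Channel>AY</Channel></SaleInfo>'
    '<Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param>'
    '</CustomParams></Pricing></SearchRQ>',
    # This line has no AG parameter, so its third column is None.
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><Channel>QA</Channel></SaleInfo></ServiceRQ>',
]

rows = [extract_row(line) for line in sample_lines]
print(rows)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=columns)` just like `final` above. For the Excel sheet in the question, `df.to_excel("output.xlsx", index=False)` should work, assuming an Excel writer such as openpyxl is installed; the file name is only a placeholder.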
