Python: parse a file, then put the values in a DataFrame

Question

I have a file with data like this:

------------------------------
------------------------------
<TIME:2020-01-01 01:25:10> 
<TIME:2020-01-01 01:25:10> 
<TIME:2020-01-01 01:25:10> 
<TIME:2020-01-01 01:25:10>

------
++++++
%%RequestHandler
	DATA1 = 123456
	ERROR1 = 500
	DATA2 = 56789
	ERROR2 = 505

Count = 4
---

I would like to create a DataFrame like this:

DATA1 ERROR1
123456 500
56789 505

Answer 1

Score: 2

Here is the code you want. You can use regular expressions to extract the desired data from the raw structured text file:

import re
import pandas as pd

# Read the file
with open("file.txt", "r") as file:
    content = file.read()

# Use regular expressions to extract the values
data = re.findall(r"DATA\d+\s*=\s*(\d+)", content)
error = re.findall(r"ERROR\d+\s*=\s*(\d+)", content)

# Create a dataframe
df = pd.DataFrame({"DATA": data, "ERROR": error})
print(df)

Example:

import re
import pandas as pd

content = '''
------------------------------
------------------------------
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>

------
++++++
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505

Count = 4
---
'''

data = re.findall(r"DATA\d+\s*=\s*(\d+)", content)
error = re.findall(r"ERROR\d+\s*=\s*(\d+)", content)

df = pd.DataFrame({"DATA": data, "ERROR": error})
print(df)

Output:

     DATA ERROR
0  123456   500
1   56789   505
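
Note that re.findall returns strings, so both columns above hold text, and pd.DataFrame raises a ValueError if the two lists differ in length. The following is a minimal sketch, assuming numeric columns are wanted and that a DATAn line should only be kept when an ERRORn line with the same n exists. The helper name parse_pairs and the extra DATA3 line in the sample are hypothetical and not part of the original answer.

```python
import re
import pandas as pd


def parse_pairs(content: str) -> pd.DataFrame:
    """Pair each DATAn with the ERRORn sharing the same n (hypothetical helper)."""
    # Map key index -> integer value for each kind of key
    data = {int(n): int(v) for n, v in re.findall(r"DATA(\d+)\s*=\s*(\d+)", content)}
    error = {int(n): int(v) for n, v in re.findall(r"ERROR(\d+)\s*=\s*(\d+)", content)}
    # Keep only the indices present in both mappings, in ascending order
    common = sorted(data.keys() & error.keys())
    return pd.DataFrame({
        "DATA": [data[n] for n in common],
        "ERROR": [error[n] for n in common],
    })


# Sample with an unpaired DATA3 line (made up for illustration)
sample = """
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505
    DATA3 = 42

Count = 4
"""

df = parse_pairs(sample)
print(df)
```

Dropping unpaired keys is only one possible policy; raising an error or filling with a missing value may suit other files better.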

Answer 2

Score: 2

Another regex approach with pivot:

import re
import pandas as pd

# text holds the file contents (or use file.read())
out = (pd.DataFrame(re.findall(r'^\s+(\w+)(\d+) = (\d+)', text, flags=re.M))
         .pivot(index=1, columns=0, values=2)
         .rename_axis(index=None, columns=None)
      )

print(out)

Output:

     DATA ERROR
1  123456   500
2   56789   505
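
The pivot step may be easier to follow with the intermediate frame written out. This is a minimal sketch on a shortened input; the column names name, idx, and value are illustrative assumptions (the answer uses the default integer labels 0, 1, 2), and the final astype(int) is an optional addition for numeric columns.

```python
import re
import pandas as pd

# Shortened input, keeping only the lines the pattern matches
text = """%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505
"""

# One tuple per matched line: (key name, key index, value)
rows = re.findall(r"^\s+(\w+)(\d+) = (\d+)", text, flags=re.M)
long_df = pd.DataFrame(rows, columns=["name", "idx", "value"])

# Reshape: one row per key index, one column per key name
wide_df = (
    long_df.pivot(index="idx", columns="name", values="value")
           .rename_axis(index=None, columns=None)
           .astype(int)  # findall yields strings; convert them to integers
)

print(long_df)
print(wide_df)
```

Here long_df has four rows (one per matched line), and pivot spreads them into two rows labelled by the key index.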

Used input:

text = '''------------------------------
------------------------------
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>

------
++++++
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505

Count = 4'''
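
One caveat with the pattern above: \w also matches digits, so with a multi-digit index such as DATA12 the greedy (\w+) keeps the leading digit and only the last digit lands in the index group. The sketch below compares it with a letters-only name group. The DATA12 and ERROR12 lines are made up for illustration, and the stricter pattern is a suggestion rather than part of the original answer.

```python
import re

# Hypothetical input containing a two-digit key index
text = """%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA12 = 56789
    ERROR12 = 505
"""

# Pattern from the answer: \w+ is greedy and also matches digits
greedy = re.findall(r"^\s+(\w+)(\d+) = (\d+)", text, flags=re.M)

# Letters-only name group keeps the whole index together
strict = re.findall(r"^\s+([A-Za-z_]+)(\d+) = (\d+)", text, flags=re.M)

print(greedy)
print(strict)
```

With single-digit indices, as in the question, both patterns give the same result.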


huangapple
  • Posted on July 12, 2023 at 20:31:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76670562.html