2023-07-12 20:31:06

Python parse then put in a dataframe

Question

I have a file with data like this:

------------------------------
------------------------------
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
------
++++++
%%RequestHandler
	DATA1 = 123456
	ERROR1 = 500
	DATA2 = 56789
	ERROR2 = 505
Count = 4
---

I would like to create a dataframe like

DATA1	ERROR1
123456	500
56789	505

Answer 1

Score: 2

Here is the code you want. You can use regular expressions to extract the desired data from the raw structured text file:

import re
import pandas as pd
# Read the file
with open("file.txt", "r") as file:
    content = file.read()
# Use regular expressions to extract the values
data = re.findall(r"DATA\d+\s*=\s*(\d+)", content)
error = re.findall(r"ERROR\d+\s*=\s*(\d+)", content)
# Create a dataframe
df = pd.DataFrame({"DATA": data, "ERROR": error})
print(df)

Example:

import re
import pandas as pd
content = '''
------------------------------
------------------------------
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
------
++++++
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505
Count = 4
---
'''
data = re.findall(r"DATA\d+\s*=\s*(\d+)", content)
error = re.findall(r"ERROR\d+\s*=\s*(\d+)", content)
df = pd.DataFrame({"DATA": data, "ERROR": error})
print(df)

Output:

     DATA ERROR
0  123456   500
1   56789   505
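
A note on the approach above: re.findall returns strings, so both columns end up with object dtype, and pd.DataFrame raises a ValueError if the DATA and ERROR lists differ in length. If integer columns are wanted, something like the following might help. It is only a sketch, assuming each DATA<n> belongs with the ERROR<n> that has the same numeric suffix; parse_pairs is a hypothetical helper name, not something from the original answer.

```python
import re
import pandas as pd

def parse_pairs(content: str) -> pd.DataFrame:
    """Pair DATA<n> and ERROR<n> values by their numeric suffix n (hypothetical helper)."""
    # Each match is a (name, suffix, value) tuple, e.g. ("DATA", "1", "123456").
    matches = re.findall(r"\b(DATA|ERROR)(\d+)\s*=\s*(\d+)", content)
    rows = {}
    for name, suffix, value in matches:
        rows.setdefault(int(suffix), {})[name] = int(value)
    keys = sorted(rows)
    # A suffix that has only DATA or only ERROR gets NaN in the missing column.
    return pd.DataFrame([rows[k] for k in keys], index=keys, columns=["DATA", "ERROR"])

content = """
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505
Count = 4
"""
df = parse_pairs(content)
print(df)
```

Keying the rows on the suffix means an unmatched DATA or ERROR line shows up as a missing value instead of raising an error or shifting later rows.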

Answer 2

Score: 2

Another regex approach with pivot:

import re
import pandas as pd

# text holds the raw input shown under "Used input" (or use file.read())
out = (pd.DataFrame(re.findall(r'^\s+(\w+)(\d+) = (\d+)', text, flags=re.M))
         .pivot(index=1, columns=0, values=2)
         .rename_axis(index=None, columns=None)
      )
print(out)

Output:

     DATA ERROR
1  123456   500
2   56789   505

Used input:

text = '''------------------------------
------------------------------
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
<TIME:2020-01-01 01:25:10>
------
++++++
%%RequestHandler
    DATA1 = 123456
    ERROR1 = 500
    DATA2 = 56789
    ERROR2 = 505
Count = 4'''
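
One thing to watch with the pattern above: \w also matches digits, so with a two-digit suffix such as DATA10 the greedy (\w+) keeps "DATA1" and leaves only "0" for (\d+). A possible variant, assuming the key names consist of letters only, restricts the first group to [A-Za-z]+ and casts the captured strings to integers. The column names name, idx and value are illustrative labels, not part of the original answer.

```python
import re
import pandas as pd

text = """%%RequestHandler
    DATA9 = 1
    ERROR9 = 500
    DATA10 = 2
    ERROR10 = 505
Count = 4"""

# [A-Za-z]+ cannot consume digits, so the whole numeric suffix lands in group 2.
pattern = r"^\s+([A-Za-z]+)(\d+)\s*=\s*(\d+)"
matches = re.findall(pattern, text, flags=re.M)

out = (
    pd.DataFrame(matches, columns=["name", "idx", "value"])
      .astype({"idx": int, "value": int})   # sort suffixes numerically, not as strings
      .pivot(index="idx", columns="name", values="value")
      .rename_axis(index=None, columns=None)
)
print(out)
```

Casting idx to int also keeps row 9 ahead of row 10, which would not be the case if the suffixes were compared as strings.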
