April 13, 2023 18:10:36

What is an efficient way to handle concatenating Series in Python?

Question

I am working on a Python function that builds a DataFrame from three columns of values.

Computing the values is already efficient; my question is how best to stack them so that I can return a CSV file with 3 columns.

My final dataset will be around 300k rows.

What is the most efficient way to get a CSV file with those 3 columns?

For the moment I am doing:

s = {"Text A": value1, "Text B": value2, "Score": value3}
dataframe_return = pd.concat([dataframe_return, pd.Series(s).to_frame().T], ignore_index=True)

But it takes too much time.

Answer 1

Score: 1

If the values are generated in a loop, the most efficient solution is to build a list of dictionaries and pass it to the DataFrame constructor:

In [171]: %%timeit
     ...: L = []
     ...: for x in range(1000):
     ...:     d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
     ...:     L.append(d)
     ...:
     ...: df = pd.DataFrame(L)
     ...:
1.64 ms ± 8.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Timeless solution (pd.concat with a one-row DataFrame, as in the other answer)
In [172]: %%timeit
     ...: df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
     ...:
     ...: for x in range(1000):
     ...:
     ...:     d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
     ...:     df = pd.concat([df, pd.DataFrame(d, index=[len(df)])])
     ...:
1.35 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Original solution (from the question)
In [175]: %%timeit
     ...: df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
     ...:
     ...: for x in range(1000):
     ...:
     ...:     s = {"Text A": 'cc', "Text B": 'ee', "Score": x}
     ...:     df = pd.concat([df, pd.Series(s).to_frame().T], ignore_index=True)
     ...:
1.46 s ± 394 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

For 1,000 rows, the list of dictionaries takes about 1.6 ms, against roughly 1.4 s for either variant that calls pd.concat inside the loop.

Solutions with loc are slower.
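
Since the question ultimately asks for a CSV file, the list-of-dictionaries approach can be followed by a single to_csv call. The sketch below is only an illustration under assumptions: build_rows is a hypothetical stand-in for the asker's own computation, the row values are made up, and an in-memory buffer replaces a real output path.

```python
import io

import pandas as pd


def build_rows(n):
    """Hypothetical stand-in for the asker's computation: one dict per row."""
    for i in range(n):
        yield {"Text A": f"text a {i}", "Text B": f"text b {i}", "Score": i / 10}


# Collect the rows in a plain Python list, then build the DataFrame once.
rows = list(build_rows(5))
df = pd.DataFrame(rows, columns=["Text A", "Text B", "Score"])

# Write the three columns as CSV. A StringIO buffer is used here so the
# sketch leaves no file behind; a file path can be passed to to_csv instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the row index out of the file, so the CSV has exactly the three requested columns.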

Answer 2

Score: 1

I would do it this way (with concat [1]):

s = {"Text A": "« text hhhhhh 5 »", "Text B": "« text aaaaaa 5 »", "Score": 0}
dataframe_return = pd.concat([df, pd.DataFrame(s, index=[len(df)])])

Output:

print(dataframe_return)
              Text A             Text B  Score
0    « text hhhhhh »    « text aaaaaa »   0.40
1  « text hhhhhh 2 »  « text aaaaaa 2 »   0.23
2  « text hhhhhh 3 »  « text aaaaaa 3 »   0.12
3  « text hhhhhh 4 »  « text aaaaaa 4 »   1.40
4  « text hhhhhh 5 »  « text aaaaaa 5 »   0.00

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
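
When many rows have to be added this way, one possible variation is to gather the one-row frames in a list and call pd.concat a single time, because each concat call builds a new DataFrame and copies the existing rows. This is a sketch rather than the answerer's method; the starting frame and the row values below are illustrative assumptions.

```python
import pandas as pd

# Existing frame, standing in for `df` in the answer (values are illustrative).
df = pd.DataFrame(
    {"Text A": ["a1", "a2"], "Text B": ["b1", "b2"], "Score": [0.40, 0.23]}
)

# Gather the new rows as one-row frames first...
pieces = [df]
for i in range(3):
    s = {"Text A": f"a{i + 3}", "Text B": f"b{i + 3}", "Score": float(i)}
    pieces.append(pd.DataFrame(s, index=[0]))

# ...then concatenate once. ignore_index=True renumbers the rows 0..n-1.
dataframe_return = pd.concat(pieces, ignore_index=True)
print(dataframe_return)
```

For rows produced in a loop, the plain list of dictionaries passed to the DataFrame constructor avoids creating the intermediate one-row frames altogether.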
