How to efficiently handle a concatenation of Series in Python?

Question

I am currently working on a Python function that builds a DataFrame from three columns of values.
Computing those values is already efficient; my question is how best to stack them so that I can return a CSV file with three columns.

Here is an image of what I want to do: [image: How to efficiently handle a concatenation of Series in Python?]

My final dataset will be around 300k rows.

What is the most efficient way to get a csv file with those 3 columns?
For the moment I am doing:

s = {"Text A" : value1, "Text B" : value2, "Score" : value3}
dataframe_return = pd.concat([dataframe_return, pd.Series(s).to_frame().T], ignore_index=True)

But it takes too much time.

Answer 1

Score: 1

If the values are generated in a loop, the most efficient approach is to collect them in a list of dictionaries and pass that list to the DataFrame constructor once:

In [171]: %%timeit
     ...: L = []
     ...: for x in range(1000):
     ...:     d = {"Text A" : 'cc', "Text B" : 'ee', "Score" :  x}
     ...:     L.append(d)
     ...:     
     ...: df = pd.DataFrame(L)
     ...: 
1.64 ms ± 8.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# concat with a one-row DataFrame on every iteration (slow)
In [172]: %%timeit
     ...: df = pd.DataFrame(columns=['Text A','Text B','Score'])
     ...: 
     ...: for x in range(1000):
     ...:     
     ...:     d = {"Text A" : 'cc', "Text B" : 'ee', "Score" :  x}
     ...:     df = pd.concat([df, pd.DataFrame(d, index=[len(df)])])
     ...:     
1.35 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Original solution from the question
In [175]: %%timeit
     ...: df = pd.DataFrame(columns=['Text A','Text B','Score'])
     ...: 
     ...: for x in range(1000):
     ...:     
     ...:     s = {"Text A" : 'cc', "Text B" : 'ee', "Score" :  x}
     ...:     df = pd.concat([df, pd.Series(s).to_frame().T], ignore_index=True)
     ...:     
1.46 s ± 394 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solutions that use loc are slower.
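
Applied to the question, the list-of-dictionaries approach might look roughly like the sketch below. The helper name `build_frame`, the sample rows, and the output file name `result.csv` are made up for illustration; only `pd.DataFrame` and `DataFrame.to_csv` come from pandas.

```python
import pandas as pd


def build_frame(rows):
    """Build the three-column DataFrame from (text_a, text_b, score) tuples.

    `rows` stands in for whatever loop computes the values (hypothetical input).
    """
    records = []
    for text_a, text_b, score in rows:
        # Appending to a Python list is cheap; no DataFrame is copied here.
        records.append({"Text A": text_a, "Text B": text_b, "Score": score})
    # Construct the DataFrame once, after the loop has finished.
    return pd.DataFrame(records, columns=["Text A", "Text B", "Score"])


sample_rows = [("cc", "ee", x / 10) for x in range(5)]  # illustrative data
df = build_frame(sample_rows)

# Write the three columns to a CSV file, leaving out the index column.
df.to_csv("result.csv", index=False)
```

Passing `columns=` keeps the column order explicit, and `index=False` keeps the row index out of the file so that the CSV contains only the three columns.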

Answer 2

Score: 1

I would do it this way, with pd.concat (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html):

s = {"Text A": "« text hhhhhh 5 »", "Text B": "« text aaaaaa 5 »", "Score": 0}

dataframe_return = pd.concat([df, pd.DataFrame(s, index=[len(df)])])

Output:

print(dataframe_return)

              Text A             Text B  Score
0    « text hhhhhh »    « text aaaaaa »   0.40
1  « text hhhhhh 2 »  « text aaaaaa 2 »   0.23
2  « text hhhhhh 3 »  « text aaaaaa 3 »   0.12
3  « text hhhhhh 4 »  « text aaaaaa 4 »   1.40
4  « text hhhhhh 5 »  « text aaaaaa 5 »   0.00
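
Calling `pd.concat` once per row gets slow for many rows, because every call builds a new DataFrame that copies all the rows accumulated so far. If the rows arrive as small DataFrames anyway, one option is to collect them in a list and concatenate a single time at the end. A minimal sketch, with made-up data and the hypothetical variable names `pieces` and `result`:

```python
import pandas as pd

# Existing frame, standing in for `df` from the answer (illustrative values).
df = pd.DataFrame(
    {"Text A": ["a0", "a1"], "Text B": ["b0", "b1"], "Score": [0.40, 0.23]}
)

pieces = [df]
for i in range(3):
    s = {"Text A": f"a{i + 2}", "Text B": f"b{i + 2}", "Score": i / 10}
    # Wrap each record in a one-row DataFrame, but do not concatenate yet.
    pieces.append(pd.DataFrame(s, index=[0]))

# A single concat call at the end; ignore_index renumbers the rows from 0.
result = pd.concat(pieces, ignore_index=True)
```

With `ignore_index=True` there is no need to compute `len(df)` for each new row, since the final index is rebuilt in one pass.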

Posted by huangapple on 2023-04-13 18:10:36
When reposting, please keep the link to this article: https://go.coder-hub.com/76004222.html