What is an efficient way to handle concatenation of Series in Python?
Question
I am currently working on a Python function that builds a DataFrame from 3 columns of values.
I can compute those values efficiently; my question is about the best way to stack them so that I can return a CSV file with 3 columns.
My final dataset will have around 300k rows.
What is the most efficient way to get a CSV file with those 3 columns?
For the moment I am doing:
    s = {"Text A": value1, "Text B": value2, "Score": value3}
    dataframe_return = pd.concat([dataframe_return, pd.Series(s).to_frame().T], ignore_index=True)
But it takes too much time.
Answer 1
Score: 1
If the values are generated in a loop, the most efficient solution is to build a list of dictionaries and pass it to the DataFrame constructor once:
    In [171]: %%timeit
         ...: L = []
         ...: for x in range(1000):
         ...:     d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
         ...:     L.append(d)
         ...:
         ...: df = pd.DataFrame(L)
         ...:
    1.64 ms ± 8.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    # Timeless's solution (concat with a one-row DataFrame, as in the other answer)
    In [172]: %%timeit
         ...: df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
         ...:
         ...: for x in range(1000):
         ...:     d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
         ...:     df = pd.concat([df, pd.DataFrame(d, index=[len(df)])])
         ...:
    1.35 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    # Original solution from the question
    In [175]: %%timeit
         ...: df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
         ...:
         ...: for x in range(1000):
         ...:     s = {"Text A": 'cc', "Text B": 'ee', "Score": x}
         ...:     df = pd.concat([df, pd.Series(s).to_frame().T], ignore_index=True)
         ...:
    1.46 s ± 394 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Both concat variants copy the growing DataFrame on every iteration, so in this benchmark they are more than 800 times slower than building the list first.
Solutions with loc are slower.
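Since the question ultimately asks for a CSV file, here is a minimal sketch of the list-of-dictionaries approach carried through to the CSV output. The compute_row helper and its values are hypothetical stand-ins for the real per-row computation, and the StringIO buffer is only there to keep the example free of side effects.

```python
import io

import pandas as pd


def compute_row(i):
    """Hypothetical stand-in for the real per-row computation."""
    return {"Text A": f"a{i}", "Text B": f"b{i}", "Score": i / 10}


# Collect plain dicts in a list; appending to a list is cheap.
rows = [compute_row(i) for i in range(5)]

# Build the DataFrame once, after the loop.
df = pd.DataFrame(rows, columns=["Text A", "Text B", "Score"])

# Write the CSV without the index column. A StringIO buffer is used here
# so the sketch has no side effects; pass a file path to write to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

For the real dataset, replacing the buffer with a path such as df.to_csv("output.csv", index=False) (file name assumed) should be the only change needed.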
Answer 2
Score: 1
I would do it this way (with concat [1]):
    s = {"Text A": "« text hhhhhh 5 »", "Text B": "« text aaaaaa 5 »", "Score": 0}
    dataframe_return = pd.concat([df, pd.DataFrame(s, index=[len(df)])])
Output:
    print(dataframe_return)
                  Text A             Text B  Score
    0    « text hhhhhh »    « text aaaaaa »   0.40
    1  « text hhhhhh 2 »  « text aaaaaa 2 »   0.23
    2  « text hhhhhh 3 »  « text aaaaaa 3 »   0.12
    3  « text hhhhhh 4 »  « text aaaaaa 4 »   1.40
    4  « text hhhhhh 5 »  « text aaaaaa 5 »   0.00
[1]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
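If concat is still preferred, for instance because the rows arrive as small DataFrames, one option is to keep the pieces in a list and call pd.concat a single time at the end rather than once per row. The sketch below assumes a hypothetical make_chunk producer; it is not part of either answer.

```python
import pandas as pd


def make_chunk(start, size):
    """Hypothetical producer that returns a small DataFrame of rows."""
    idx = range(start, start + size)
    return pd.DataFrame(
        {
            "Text A": [f"a{i}" for i in idx],
            "Text B": [f"b{i}" for i in idx],
            "Score": [i / 10 for i in idx],
        }
    )


# Keep the pieces in a list instead of growing one DataFrame in the loop.
pieces = [make_chunk(start, 3) for start in range(0, 9, 3)]

# A single concat copies the data once; ignore_index renumbers rows 0..n-1.
result = pd.concat(pieces, ignore_index=True)
print(result.shape)
```

This keeps the amount of copying proportional to the final size of the data instead of repeating it for every appended row.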