What is the most efficient way to handle a concatenation of Series in Python?

Question

I am currently working on a Python function that builds a DataFrame from 3 columns of values.

I already compute those values efficiently; my question is how best to stack them so that I can return a CSV file with 3 columns.

My final dataset will have around 300k rows.

What is the most efficient way to get a CSV file with those 3 columns?

For the moment I am doing:

    s = {"Text A": value1, "Text B": value2, "Score": value3}
    dataframe_return = pd.concat([dataframe_return, pd.Series(s).to_frame().T], ignore_index=True)

But it takes too much time.


Answer 1

Score: 1


If the values are generated in a loop, the most efficient solution is to collect them in a list of dictionaries and pass that list to the DataFrame constructor once. Timings for 1000 rows, measured with %%timeit:

    import pandas as pd

    # List of dictionaries, DataFrame built once after the loop
    L = []
    for x in range(1000):
        d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
        L.append(d)

    df = pd.DataFrame(L)
    # 1.64 ms ± 8.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    # Timeless solution (the concat approach shown in Answer 2)
    df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
    for x in range(1000):
        d = {"Text A": 'cc', "Text B": 'ee', "Score": x}
        df = pd.concat([df, pd.DataFrame(d, index=[len(df)])])
    # 1.35 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    # Original solution from the question
    df = pd.DataFrame(columns=['Text A', 'Text B', 'Score'])
    for x in range(1000):
        s = {"Text A": 'cc', "Text B": 'ee', "Score": x}
        df = pd.concat([df, pd.Series(s).to_frame().T], ignore_index=True)
    # 1.46 s ± 394 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solutions that use loc are slower.
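Since the question ultimately asks for a CSV file, the list-of-dictionaries approach can be carried through to the export step. The following is only a minimal sketch: `compute_values` is a hypothetical stand-in for however the three values are actually computed, and the loop is shortened to 5 iterations where the real one would run about 300k times.

```python
import io

import pandas as pd


def compute_values(i):
    # Hypothetical placeholder for the asker's own computation of the 3 values.
    return f"text a {i}", f"text b {i}", i / 10


rows = []
for i in range(5):  # the real loop would cover roughly 300k rows
    value1, value2, value3 = compute_values(i)
    # Appending to a Python list is cheap; no DataFrame is touched in the loop.
    rows.append({"Text A": value1, "Text B": value2, "Score": value3})

# Build the DataFrame once, fixing the column order explicitly.
dataframe_return = pd.DataFrame(rows, columns=["Text A", "Text B", "Score"])

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
dataframe_return.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

To write to disk instead, a path can be passed in place of the buffer, for example `dataframe_return.to_csv("output.csv", index=False)`, where `output.csv` is an assumed file name. `index=False` keeps the row index out of the file so that it contains exactly the 3 columns.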

Answer 2

Score: 1


I would do it this way, with concat (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html):

    s = {"Text A": "« text hhhhhh 5 »", "Text B": "« text aaaaaa 5 »", "Score": 0}
    dataframe_return = pd.concat([df, pd.DataFrame(s, index=[len(df)])])

Output:

    print(dataframe_return)

                  Text A             Text B  Score
    0    « text hhhhhh »    « text aaaaaa »   0.40
    1  « text hhhhhh 2 »  « text aaaaaa 2 »   0.23
    2  « text hhhhhh 3 »  « text aaaaaa 3 »   0.12
    3  « text hhhhhh 4 »  « text aaaaaa 4 »   1.40
    4  « text hhhhhh 5 »  « text aaaaaa 5 »   0.00
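The snippet above appends a single row. When several new rows are available at the same time, one possible variation is to gather them first and call concat only once, because every concat call builds a new DataFrame. This is a rough sketch under assumed data: the contents of `df` and `new_rows` below are made up for illustration.

```python
import pandas as pd

# Hypothetical existing data standing in for `df` in the answer above.
df = pd.DataFrame(
    {
        "Text A": ["text hhhhhh 1", "text hhhhhh 2"],
        "Text B": ["text aaaaaa 1", "text aaaaaa 2"],
        "Score": [0.40, 0.23],
    }
)

# New rows collected as plain dictionaries (illustrative values).
new_rows = [
    {"Text A": "text hhhhhh 3", "Text B": "text aaaaaa 3", "Score": 0.12},
    {"Text A": "text hhhhhh 4", "Text B": "text aaaaaa 4", "Score": 1.40},
]

# A single concat call for all new rows instead of one call per row.
# ignore_index=True renumbers the result 0..n-1.
dataframe_return = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
```

With `ignore_index=True` the combined frame gets a fresh consecutive index, which plays the same role as `index=[len(df)]` does for the single-row case.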

huangapple
  • Posted on April 13, 2023, 18:10:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76004222.html