How do I count the times the word true was printed in my csv file?

Question

The generated output is a column of True or False values. I need to count how many times True appears in every 10 rows.
The code below reads the CSV files in one folder and writes them to another. Each CSV contains the two columns that were selected when the DataFrame was defined. Two more columns are added which, through the how_many_times expressions, are meant to count how many times the value meets the condition I give it.

Example of my CSV (the original df has more rows):

    In [1]: dff = pd.DataFrame([['20220901-00:00:00', 50.0335, False, True],
                                ['20220901-00:00:01', 50.024, False, False],
                                ['20220901-00:00:02', 50.021, False, False]],
                               columns=['t', 'f', 'f<49.975', 'f>50.025'])

This is my code (I used .sum but it didn't work for what I needed):

    import pandas as pd
    import numpy as np
    import glob
    import os

    all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")
    file_list = []
    for i, f in enumerate(all_files):
        df = pd.read_csv(f, header=0, usecols=["t", "f"])
        how_many_times1 = df.apply(lambda x: x['f'] < 49.975, axis=1).sum
        df['f<49.975'] = how_many_times1
        how_many_times2 = df.apply(lambda x: x['f'] > 50.025, axis=1).sum
        df['f>50.025'] = how_many_times2
        df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(".")[0]}_ext.csv')

Answer 1

Score: 0

You can apply the .sum() method directly to a pandas Series that is a DataFrame column (how_many_times1.sum()). Because True is equivalent to 1 and False to 0, the sum counts the True entries directly, without applying a further condition.
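
As a minimal sketch of that point, assuming a few made-up values for the f column: the names below, above, n_below and n_above are illustrative and do not come from the original post. The last lines show why .sum without parentheses does not give a count.

```python
import pandas as pd

# Made-up sample values, for illustration only.
df = pd.DataFrame({"f": [50.0335, 50.024, 48.021, 50.1]})

below = df["f"] < 49.975   # boolean Series, one entry per row
above = df["f"] > 50.025

n_below = int(below.sum())  # True counts as 1, False as 0
n_above = int(above.sum())

# Without parentheses, .sum is only a reference to the method, not a number.
not_a_count = below.sum

print(n_below, n_above)        # 1 2
print(callable(not_a_count))   # True
```

The comparison on the whole column replaces the row-wise apply with a lambda and gives the same boolean values.
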
Summing the whole column makes sense if you need the total. If you need the sum periodically, every ten rows, it makes sense to count the True values inside the function that is applied to the column.
The code below defines two functions for apply which create the right entries for the columns and print the sum every ten rows.

See the code below for how it is done in detail:

    import pandas as pd

    df = pd.DataFrame([['20220901-00:00:00', 50.0335],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:02', 48.021 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:02', 48.021 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:02', 48.021 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:02', 48.021 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:01', 50.100 ],
                       ['20220901-00:00:01', 50.100 ]],
                      columns=['t', 'f'])
    columns = ['dummy', 'f<49.975', 'f>50.025']
    print(df)

    # State for the first condition: 1-based row counter, running count, per-row results
    row1 = 1
    sum1 = 0
    lsm1 = []

    def cond1(x):
        global row1, sum1
        cond = False
        if x < 49.975:
            cond = True
            sum1 += 1
        if row1 % 10 == 0:
            # Every tenth row: report and store the count for this block
            print('sum1:', sum1, 'at row:', row1)
            lsm1.append(sum1)
            sum1 = 0  # comment out if a cumulative sum is required
        else:
            lsm1.append(None)
        row1 += 1
        return cond

    # Same state and logic for the second condition
    row2 = 1
    sum2 = 0
    lsm2 = []

    def cond2(x):
        global row2, sum2
        cond = False
        if x > 50.025:
            cond = True
            sum2 += 1
        if row2 % 10 == 0:
            print('sum2:', sum2, 'at row:', row2)
            lsm2.append(sum2)
            sum2 = 0  # comment out if a cumulative sum is required
        else:
            lsm2.append(None)
        row2 += 1
        return cond

    how_many_times1 = df['f'].apply(cond1)
    df[columns[1]] = how_many_times1
    df['sum1'] = lsm1
    how_many_times2 = df['f'].apply(cond2)
    df[columns[2]] = how_many_times2
    df['sum2'] = lsm2
    print(df)

prints

                        t        f
    0   20220901-00:00:00  50.0335
    1   20220901-00:00:01  50.1000
    2   20220901-00:00:02  48.0210
    3   20220901-00:00:01  50.1000
    4   20220901-00:00:01  50.1000
    5   20220901-00:00:02  48.0210
    6   20220901-00:00:01  50.1000
    7   20220901-00:00:01  50.1000
    8   20220901-00:00:02  48.0210
    9   20220901-00:00:01  50.1000
    10  20220901-00:00:02  48.0210
    11  20220901-00:00:01  50.1000
    12  20220901-00:00:01  50.1000
    13  20220901-00:00:01  50.1000
    sum1: 3 at row: 10
    sum2: 7 at row: 10
                        t        f  f<49.975  sum1  f>50.025  sum2
    0   20220901-00:00:00  50.0335     False   NaN      True   NaN
    1   20220901-00:00:01  50.1000     False   NaN      True   NaN
    2   20220901-00:00:02  48.0210      True   NaN     False   NaN
    3   20220901-00:00:01  50.1000     False   NaN      True   NaN
    4   20220901-00:00:01  50.1000     False   NaN      True   NaN
    5   20220901-00:00:02  48.0210      True   NaN     False   NaN
    6   20220901-00:00:01  50.1000     False   NaN      True   NaN
    7   20220901-00:00:01  50.1000     False   NaN      True   NaN
    8   20220901-00:00:02  48.0210      True   NaN     False   NaN
    9   20220901-00:00:01  50.1000     False   3.0      True   7.0
    10  20220901-00:00:02  48.0210      True   NaN     False   NaN
    11  20220901-00:00:01  50.1000     False   NaN      True   NaN
    12  20220901-00:00:01  50.1000     False   NaN      True   NaN
    13  20220901-00:00:01  50.1000     False   NaN      True   NaN
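
One thing to watch with the code above is that it keeps its counters in module-level globals, so calling apply a second time without resetting row1, sum1 and lsm1 would carry over old state. A possible variation, sketched here under the assumption that a plain loop is acceptable, puts the same counting logic into one reusable function. The name flag_and_block_counts and its parameters are hypothetical and not part of the original answer.

```python
import pandas as pd

def flag_and_block_counts(values, predicate, block_size=10):
    """Return (flags, counts).

    flags[i] is predicate(values[i]); counts[i] is the number of True flags
    in the block that ends at row i, and None on all other rows.
    """
    flags, counts = [], []
    running = 0
    for row_number, value in enumerate(values, start=1):
        hit = bool(predicate(value))
        flags.append(hit)
        running += hit
        if row_number % block_size == 0:
            counts.append(running)
            running = 0  # remove this line if a cumulative count is required
        else:
            counts.append(None)
    return flags, counts

# Same f values as in the example above.
df = pd.DataFrame({"f": [50.0335, 50.1, 48.021, 50.1, 50.1, 48.021, 50.1,
                         50.1, 48.021, 50.1, 48.021, 50.1, 50.1, 50.1]})

df["f<49.975"], df["sum1"] = flag_and_block_counts(df["f"], lambda x: x < 49.975)
df["f>50.025"], df["sum2"] = flag_and_block_counts(df["f"], lambda x: x > 50.025)
print(df)
```

Because all state lives inside the function, it can be called once per condition and once per file without any reset step.
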

Answer 2

Score: 0

Another option would be:

    df[["f<49.975", "f>50.025"]] = (
        df.assign(f1=df["f"].lt(49.975), f2=df["f"].gt(50.025))
        .groupby(df.index // 10)[["f1", "f2"]].transform("sum")
        .loc[df.index % 10 == 9]
    )
  • Add two columns f1, f2 to df, defined by the two conditions.
  • Now group every 10 rows and sum over the two new columns to get the count of True values per block. Use .transform for this so that the original index is kept.
  • Then take only every tenth row and assign the result to the two new columns.
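
The three steps above can be sketched on sample data as follows. This is an illustration under a few assumptions: the values are made up, df has the default 0, 1, 2, ... index, and the intermediate names flags, block_counts and last_rows are introduced here only to make each step visible. The columns are assigned one at a time, which should give the same result as the combined assignment in the answer.

```python
import pandas as pd

# Made-up readings: 14 rows, so one full block of 10 and a partial block of 4.
values = [50.0335, 50.1, 48.021, 50.1, 50.1, 48.021, 50.1,
          50.1, 48.021, 50.1, 48.021, 50.1, 50.1, 50.1]
df = pd.DataFrame({"f": values})

# Step 1: boolean helper columns for the two conditions.
flags = df.assign(f1=df["f"].lt(49.975), f2=df["f"].gt(50.025))

# Step 2: rows 0-9 form block 0, rows 10-19 block 1, and so on;
# transform("sum") writes each block's True count onto every row of that block.
block_counts = flags.groupby(df.index // 10)[["f1", "f2"]].transform("sum")

# Step 3: keep the count only on the last row of each full block.
last_rows = block_counts.loc[df.index % 10 == 9]
df["f<49.975"] = last_rows["f1"]
df["f>50.025"] = last_rows["f2"]

print(df)
```

Rows that are not the tenth row of a block end up as NaN, because the assignment aligns on the index. The trailing partial block (rows 10 to 13 here) gets no count at all, since it contains no row whose index satisfies index % 10 == 9.
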

huangapple
  • Posted on 2023-02-06 09:09:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75356581.html