如何计算在我的CSV文件中打印的单词“true”的次数?

huangapple go评论61阅读模式
英文:

How do I count the times the word true was printed in my csv file?

问题

生成的输出是一个包含真或假的列,我需要统计每10行中打印了多少次“真”这个词。

在我提供的代码中,它读取一个文件夹中的csv文件并将它们打印到另一个文件夹中。在这些csv文件中,包含了在数据帧定义时选择的两列。此外,通过how_many_times函数,添加了两列,用于计算值满足我给定条件的次数。

我的CSV示例(原始数据帧有更多行):

In [1]: dff = pd.DataFrame([['20220901-00:00:00', 50.0335, False, True], ['20220901-00:00:01', 50.024, False, False], ['20220901-00:00:02', 50.021, False, False]], columns=['t', 'f', 'f<49.975', 'f>50.025'])

这是我的代码(我使用了.sum,但对我所需的功能无效):

import pandas as pd   
import numpy as np       
import glob   
import os  
all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")   

file_list = []   
for i, f in enumerate(all_files):   
    df = pd.read_csv(f, header=0, usecols=['t', 'f'])
    how_many_times1 = df.apply(lambda x: x['f'] < 49.975, axis=1).sum()
    df['f<49.975'] = how_many_times1
    how_many_times2 = df.apply(lambda x: x['f'] > 50.025, axis=1).sum()
    df['f>50.025'] = how_many_times2
    df.to_csv(f'C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(".")[0]}_ext.csv')

注意:我已经在你的代码中修正了.sum方法的调用,确保它正确计算每列的“真”值的数量。

英文:

The output that is generated is a column of true or false, I need to count how many times the word true was printed every 10 rows
In the code that I present, it reads csv files that are in one folder and prints them in another. In each of these csv's it contains two columns that were chosen when the dataframe was defined. In addition, two columns were added which, through the how_many_times function, count how many times the value meets the condition that I give it.

Example of my csv(original df has more rows):

In [1]: dff = pd.DataFrame([[&#39;20220901-00:00:00&#39;, 50.0335,False,True], [&#39;20220901-00:00:01&#39;, 50.024,False,False], [&#39;20220901-00:00:02&#39;, 50.021,False,False]], columns=[&#39;t&#39;, &#39;f&#39;,&#39;f&lt;49.975&#39;,&#39;f&gt;50.025&#39;])

This is my code (I used .sum but it didn't work for what I needed):

import pandas as pd   
import numpy as np       
import glob   
import os  
all_files = glob.glob(&quot;C:/Users/Gamer/Documents/Colbun/Saturn/*.csv&quot;)   


file_list = []   
for i,f in enumerate(all_files):   
    df = pd.read_csv(f,header=0,usecols=[&quot;t&quot;,&quot;f&quot;])
    how_many_times1= df.apply(lambda x: x[&#39;f&#39;] &lt; 49.975, axis=1).sum
    df[&#39;f&lt;49.975&#39;]=how_many_times1
    how_many_times2= df.apply(lambda x: x[&#39;f&#39;] &gt; 50.025, axis=1).sum
    df[&#39;f&gt;50.025&#39;]=how_many_times2
    df.to_csv(f&#39;C:/Users/Gamer/Documents/Colbun/Saturn2/{os.path.basename(f).split(&quot;.&quot;)[0]}_ext.csv&#39;)

答案1

得分: 0

你可以直接将.sum()方法应用于作为DataFrame列的Pandas Series (how_many_times1.sum()). 由于True等于1,False等于0,你可以直接计算条目而不需要应用条件。

这在需要总和时是有意义的。如果需要每十行周期性地进行求和,那么通过应用到列的函数中计算True值是有意义的。

下面的代码定义了两个"apply"函数,它们执行创建列的正确条目并打印每十行的总和的任务。

以下是详细操作的代码:

import pandas as pd   
df = pd.DataFrame([['20220901-00:00:00', 50.0335], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:02', 48.021 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ], 
                   ['20220901-00:00:01', 50.100 ]], 
                    columns=['t', 'f'])
columns = ['dummy', 'f<49.975','f>50.025']
print(df)

row1 = 1
sum1 = 0
lsm1 = []
def cond1(x):
    global row1, sum1
    cond = False
    if x < 49.975:
       cond=True
       sum1+=1
    if row1%10==0:
        print('sum1:', sum1, 'at row:', row1)
        lsm1.append(sum1)
        sum1=0 # outcomment if cummulative sum required
    else: 
        lsm1.append(None)
    row1 += 1
    return cond     
row2 = 1
sum2 = 0
lsm2 = []
def cond2(x):
    global row2, sum2
    cond = False
    if x > 50.025:
       cond=True
       sum2+=1
    if row2%10==0:
        print('sum2:', sum2, 'at row:', row2)
        lsm2.append(sum2)
        sum2=0 # outcomment if cummulative sum required
    else: 
        lsm2.append(None)
    row2 += 1
    return cond

how_many_times1 = df['f'].apply(cond1)
df[columns[1]]  = how_many_times1
df['sum1']      = lsm1
how_many_times2 = df['f'].apply(cond2)
df[columns[2]]  =how_many_times2
df['sum2']      = lsm2
print(df)

打印结果如下:

                    t        f
0   20220901-00:00:00  50.0335
1   20220901-00:00:01  50.1000
2   20220901-00:00:02  48.0210
3   20220901-00:00:01  50.1000
4   20220901-00:00:01  50.1000
5   20220901-00:00:02  48.0210
6   20220901-00:00:01  50.1000
7   20220901-00:00:01  50.1000
8   20220901-00:00:02  48.0210
9   20220901-00:00:01  50.1000
10  20220901-00:00:02  48.0210
11  20220901-00:00:01  50.1000
12  20220901-00:00:01  50.1000
13  20220901-00:00:01  50.1000
sum1: 3 at row: 10
sum2: 7 at row: 10
                    t        f  f<49.975  sum1  f>50.025  sum2
0   20220901-00:00:00  50.0335     False   NaN      True   NaN
1   20220901-00:00:01  50.1000     False   NaN      True   NaN
2   20220901-00:00:02  48.0210      True   NaN     False   NaN
3   20220901-00:00:01  50.1000     False   NaN      True   NaN
4   20220901-00:00:01  50.1000     False   NaN      True   NaN
5   20220901-00:00:02  48.0210      True   NaN     False   NaN
6   20220901-00:00:01  50.1000     False   NaN      True   NaN
7   20220901-00:00:01  50.1000     False   NaN      True   NaN
8   20220901-00:00:02  48.0210      True   NaN     False   NaN
9   20220901-00:00:01  50.1000     False   3.0      True   7.0
10  20220901-00:00:02  48.0210      True   NaN     False   NaN
11  20220901-00:00:01  50.1000     False   NaN      True   NaN
12  20220901-00:00:01  50.1000     False   NaN      True   NaN
13  20220901-00:00:01  50.1000     False   NaN      True   NaN
英文:

You apply the .sum() method directly to a Pandas Series being a DataFrame column ( how_many_times1.sum() ). And because True is equivalent to 1 and False to 0 you can directly count the entries without applying a condition.
This makes sense in case you need the total sum. In case you need the sum periodically each ten rows it makes sense to count the True values in the to the column applied function.
The code below defines two 'apply' functions which do the job of creating the right entries for the columns and printing the sum each ten rows.

See the code below for how it is done in detail:

import pandas as pd   
df = pd.DataFrame([[&#39;20220901-00:00:00&#39;, 50.0335], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:02&#39;, 48.021 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:02&#39;, 48.021 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:02&#39;, 48.021 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:02&#39;, 48.021 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ], 
[&#39;20220901-00:00:01&#39;, 50.100 ]], 
columns=[&#39;t&#39;,          &#39;f&#39;  ])
columns = [&#39;dummy&#39;, &#39;f&lt;49.975&#39;,&#39;f&gt;50.025&#39;]
print(df)
row1 = 1
sum1 = 0
lsm1 = []
def cond1(x):
global row1, sum1
cond = False
if x &lt; 49.975:
cond=True
sum1+=1
if row1%10==0:
print(&#39;sum1:&#39;, sum1, &#39;at row:&#39;, row1)
lsm1.append(sum1)
sum1=0 # outcomment if cummulative sum required
else: 
lsm1.append(None)
row1 += 1
return cond     
row2 = 1
sum2 = 0
lsm2 = []
def cond2(x):
global row2, sum2
cond = False
if x &gt; 50.025:
cond=True
sum2+=1
if row2%10==0:
print(&#39;sum2:&#39;, sum2, &#39;at row:&#39;, row2)
lsm2.append(sum2)
sum2=0 # outcomment if cummulative sum required
else: 
lsm2.append(None)
row2 += 1
return cond
how_many_times1 = df[&#39;f&#39;].apply(cond1)
df[columns[1]]  = how_many_times1
df[&#39;sum1&#39;]      = lsm1
how_many_times2 = df[&#39;f&#39;].apply(cond2)
df[columns[2]]  =how_many_times2
df[&#39;sum2&#39;]      = lsm2
print(df)

prints

                    t        f
0   20220901-00:00:00  50.0335
1   20220901-00:00:01  50.1000
2   20220901-00:00:02  48.0210
3   20220901-00:00:01  50.1000
4   20220901-00:00:01  50.1000
5   20220901-00:00:02  48.0210
6   20220901-00:00:01  50.1000
7   20220901-00:00:01  50.1000
8   20220901-00:00:02  48.0210
9   20220901-00:00:01  50.1000
10  20220901-00:00:02  48.0210
11  20220901-00:00:01  50.1000
12  20220901-00:00:01  50.1000
13  20220901-00:00:01  50.1000
sum1: 3 at row: 10
sum2: 7 at row: 10
t        f  f&lt;49.975  sum1  f&gt;50.025  sum2
0   20220901-00:00:00  50.0335     False   NaN      True   NaN
1   20220901-00:00:01  50.1000     False   NaN      True   NaN
2   20220901-00:00:02  48.0210      True   NaN     False   NaN
3   20220901-00:00:01  50.1000     False   NaN      True   NaN
4   20220901-00:00:01  50.1000     False   NaN      True   NaN
5   20220901-00:00:02  48.0210      True   NaN     False   NaN
6   20220901-00:00:01  50.1000     False   NaN      True   NaN
7   20220901-00:00:01  50.1000     False   NaN      True   NaN
8   20220901-00:00:02  48.0210      True   NaN     False   NaN
9   20220901-00:00:01  50.1000     False   3.0      True   7.0
10  20220901-00:00:02  48.0210      True   NaN     False   NaN
11  20220901-00:00:01  50.1000     False   NaN      True   NaN
12  20220901-00:00:01  50.1000     False   NaN      True   NaN
13  20220901-00:00:01  50.1000     False   NaN      True   NaN

答案2

得分: 0

另一个选项可以是:

df[["f<49.975", "f>50.025"]] = (
    df.assign(f1=df["f"] < 49.975, f2=df["f"] > 50.025)
    .groupby(df.index // 10)[["f1", "f2"]].transform("sum")
    .loc[df.index % 10 == 9]
)
  • 添加两列 f1f2df,根据这两个条件定义列值。
  • 现在将每10行分组,并对这两个新列进行求和,以获得每个块的真值计数。使用 .transform 来保留原始索引。
  • 然后仅选择每十行,将结果分配给这两个新列。
英文:

Another option would be:

df[[&quot;f&lt;49.975&quot;, &quot;f&gt;50.025&quot;]] = (
    df.assign(f1=df[&quot;f&quot;].lt(49.975), f2=df[&quot;f&quot;].gt(50.025))
    .groupby(df.index // 10)[[&quot;f1&quot;, &quot;f2&quot;]].transform(&quot;sum&quot;)
    .loc[df.index % 10 == 9]
)
  • Add two columns f1, f2 to df, defined by the two conditions.
  • Now group every 10 rows and sum over the two new columns to get the truth-count per block. Use .transform to do that to keep the original index.
  • Then take only every tenth row and assign the result to the two new columns.

huangapple
  • 本文由 发表于 2023年2月6日 09:09:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/75356581.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定