根据N列的值计算行数。

huangapple go评论65阅读模式
英文:

Counting the number of rows based on the value of N number of columns

问题

我有一个类似这样的数据集:

Col1    Col2    Col3
A        100     100
A         0       0
A         0      100
B        100      0
C        100     100
C        100     100

我想要统计基于A BC中出现100(或任何其他大于零的值)的行数,结果如下:

    Col2_counts   Col3_counts
A       1           2
B       1           0
C       2           2

这样我就可以计算Col2Col3A B C的总百分比等等。

我尝试过使用df.groupby(['Col1', 'Col2', 'Col3']).transform('count'),但它没有给我想要的结果。

英文:

I have a dataset that looks like this:

Col1    Col2    Col3
A        100     100
A         0       0
A         0      100
B        100      0
C        100     100
C        100     100

I want to count the number of rows with 100 (or any other values greater than zero) based on A B and C

which will result to this:

    Col2_counts   Col3_counts
A       1           2
B       1           0
C       2           2

so I can calculate the total percentage of A B C in Col2 and Col3 etc.

I tried df.groupby(['Col1', 'Col 2', 'Col3']).transform ('count'), but it doesn't give me the desired result.

答案1

得分: 2

df.set_index('Col1').gt(0).groupby(level=0).sum()

output:

Col2    Col3

Col1
A 1 2
B 1 0
C 2 2

英文:
df.set_index('Col1').gt(0).groupby(level=0).sum()

output:

	    Col2	Col3
Col1		
A     	1	    2
B	    1	    0
C	    2	    2

答案2

得分: 0

以下是已翻译的内容:

(
    df
    .set_index("Col1")
    .eq(100)
    .groupby("Col1")
    .sum()
    .add_suffix("_counts")
)

如chrslg指出,Panda Kim的答案评估行数> 0而不是行数== 100。此外,您可以在末尾添加"_counts"后缀以获取您示例中的列名。

以下是每个操作的逐步解释:

  1. 将"Col1"设置为DataFrame的索引。

  2. 评估单元格是否等于100。

    	  Col2	  Col3
    Col1		
    A	  True	  True
    A	  False	  False
    A	  False	  True
    B	  True	  False
    C	  True	  True
    C	  True	  True
    
  3. 按"Col1"索引分组并计算"True"值的数量。使用"sum"操作有效,因为它将"True"转换为1,"False"转换为0。

  4. 对所有列添加"_counts"后缀。

    	  Col2_counts	Col3_counts
    Col1		
    A	      1	             2
    B	      1	             0
    C	      2	             2
    
英文:

The following block produces your expected output:

(
    df
    .set_index("Col1")
    .eq(100)
    .groupby("Col1")
    .sum()
    .add_suffix("_counts")
)

As chrslg pointed out, Panda Kim's answer evaluates rows > 0 instead of rows == 100. Also, you can add the "_counts" suffix at the end to get the column names from your example.

Here's a the step-by-step explanation of each operation:

  1. Set "Col1" as the DataFrame's index.

  2. Evaluate if cells are equal (eq) to 100.

    	  Col2	  Col3
    Col1		
    A	  True	  True
    A	  False	  False
    A	  False	  True
    B	  True	  False
    C	  True	  True
    C	  True	  True
    
  3. Group by the "Col1" index and count True values. Using the sum operation works because it converts True to 1 and False to 0.

  4. Add a "_counts" suffix to all columns.

    	  Col2_counts	Col3_counts
    Col1		
    A	      1	             2
    B	      1	             0
    C	      2	             2
    

huangapple
  • 本文由 发表于 2023年6月26日 11:05:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/76553296.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定