Count number of zeros after last non-zero value per row

Question

I have the following df:

index  jan  feb  marc  april
One      1    7     0      0
two      0    8     7      0
three    0    0     0      1

I'd like to get the number of zeros after the last non-zero value per row. So the output should look like:

index  num
One      2
two      1
three    0

Answer 1

Score: 4

import pandas as pd

# Dataframe
data = {
    'jan': [1, 0, 0],
    'feb': [7, 8, 0],
    'marc': [0, 7, 0],
    'april': [0, 0, 1]
}
df = pd.DataFrame(data, index=['One', 'two', 'three'])

# Calculate the number of zeros after the last non-zero value per index/row
num_zeros = df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

# Result dataframe
result = pd.DataFrame({'num': num_zeros}, index=df.index)

print("DataFrame:")
print(df)
print("\nResult:")
print(result)

How the number of zeros is calculated (df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)):

  • df.ne(0) marks every non-zero value as True
  • iloc[:, ::-1].cumsum(axis=1) reverses the column order and takes the cumulative sum along each row, which gives the number of
    non-zero values seen so far when scanning from the right
  • eq(0) is True only where that cumulative sum is still zero, i.e. for the zeros after the last non-zero value, and sum(axis=1)
    counts them per row
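To make the steps above easier to follow, here is a small sketch that splits the one-liner into named intermediates on the same data. The variable names (nonzero, running, trailing) are chosen here for illustration and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'jan': [1, 0, 0], 'feb': [7, 8, 0], 'marc': [0, 7, 0], 'april': [0, 0, 1]},
    index=['One', 'two', 'three'],
)

# Step 1: True where the value is non-zero
nonzero = df.ne(0)

# Step 2: reverse the columns (april, marc, feb, jan) and count the
# non-zero values seen so far while scanning each row from the right
running = nonzero.iloc[:, ::-1].cumsum(axis=1)

# Step 3: the running count is still 0 only on the trailing zeros
trailing = running.eq(0)

# Step 4: count the trailing zeros per row
num_zeros = trailing.sum(axis=1)

print(running)
print(num_zeros)
```

For row One the reversed values are 0, 0, 7, 1, so the running count is 0, 0, 1, 2 and two positions are still at zero, which matches the expected result of 2.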

Prints:

DataFrame:
       jan  feb  marc  april
One      1    7     0      0
two      0    8     7      0
three    0    0     0      1

Result:
       num
One      2
two      1
three    0

Answer 2

Score: 3

Similar logic to that of @BrJ but more straightforward in my opinion.

Using a reversed cummin to set to False all True preceding a False, then sum:

out = (df.loc[:,::-1].eq(0)
         .cummin(axis=1).sum(axis=1)
         .to_frame('num')
       )

Output:

       num
One      2
two      1
three    0

Intermediates:

# boolean mask (0s are True)
         jan    feb   marc  april
One    False  False   True   True
two     True  False  False   True
three   True   True   True  False

# after reversed cummin (and reversed again for clarity)
# all True that preceded a False are now False
         jan    feb   marc  april
One    False  False   True   True
two    False  False  False   True
three  False  False  False  False
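One case the question does not show is a row that contains only zeros. The sketch below adds a hypothetical all-zero row named 'four' (an assumption for illustration, not part of the original data) and runs both the cumsum expression from the first answer and the cummin expression above, so the two can be compared.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'jan': [1, 0, 0, 0],
        'feb': [7, 8, 0, 0],
        'marc': [0, 7, 0, 0],
        'april': [0, 0, 1, 0],
    },
    index=['One', 'two', 'three', 'four'],  # 'four' is a hypothetical all-zero row
)

# Reversed cumulative sum of the non-zero mask, then count positions still at 0
by_cumsum = df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

# Reversed cumulative minimum of the zero mask, then count the remaining True values
by_cummin = df.loc[:, ::-1].eq(0).cummin(axis=1).sum(axis=1)

print(by_cumsum.equals(by_cummin))
print(by_cummin)
```

With either expression the all-zero row is counted as 4, i.e. every column counts as a trailing zero. If such rows should be reported differently, that would need to be handled separately.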

huangapple
  • Posted on May 28, 2023, 02:19:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76348393.html