Count number of zeros after last non-zero value per row
Question
I have the following df:
index | jan | feb | marc | april |
---|---|---|---|---|
One | 1 | 7 | 0 | 0 |
two | 0 | 8 | 7 | 0 |
three | 0 | 0 | 0 | 1 |
I'd like to get the number of zeros after the last non-zero value in each row, so the output should look like this:
index | num |
---|---|
One | 2 |
two | 1 |
three | 0 |
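For reference, the quantity being asked for can be spelled out with a plain loop: walk each row from the right and count zeros until the first non-zero value. This is a minimal sketch built on the sample data above; the helper name `trailing_zeros` is hypothetical and not part of either answer. It is slower than the vectorized answers, but it is handy for checking them. Note that, under this definition, a row of all zeros counts every column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {'jan': [1, 0, 0], 'feb': [7, 8, 0], 'marc': [0, 7, 0], 'april': [0, 0, 1]},
    index=['One', 'two', 'three'],
)

def trailing_zeros(values):
    """Count zeros after the last non-zero value (hypothetical helper)."""
    count = 0
    for v in reversed(list(values)):
        if v != 0:
            break  # reached the last non-zero value, stop counting
        count += 1
    return count

# Apply the helper to each row (axis=1 passes one row at a time)
result = df.apply(trailing_zeros, axis=1).to_frame('num')
print(result)
```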
Answer 1
Score: 4
```python
import pandas as pd

# DataFrame from the question
data = {
    'jan': [1, 0, 0],
    'feb': [7, 8, 0],
    'marc': [0, 7, 0],
    'april': [0, 0, 1]
}
df = pd.DataFrame(data, index=['One', 'two', 'three'])

# Count the zeros after the last non-zero value in each row
num_zeros = df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

# Result DataFrame
result = pd.DataFrame({'num': num_zeros}, index=df.index)

print("DataFrame:")
print(df)
print("\nResult:")
print(result)
```
Calculating the number of zeros (`df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)`):
- `df.ne(0)` marks every non-zero value as `True`.
- `iloc[:, ::-1]` reverses the column order, so `cumsum(axis=1)` counts how many non-zero values have been seen so far while reading each row from right to left. That count stays at 0 exactly over the trailing zeros.
- `eq(0)` flags the positions where the cumulative sum is still zero, and `sum(axis=1)` counts them per row.
Prints:
```
DataFrame:
       jan  feb  marc  april
One      1    7     0      0
two      0    8     7      0
three    0    0     0      1

Result:
       num
One      2
two      1
three    0
```
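The chain in this answer can be unpacked one stage at a time to see what each step produces. This is a minimal sketch on the question's sample data; the intermediate variable names (`nonzero`, `reversed_mask`, `seen`, `trailing`) are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'jan': [1, 0, 0], 'feb': [7, 8, 0], 'marc': [0, 7, 0], 'april': [0, 0, 1]},
    index=['One', 'two', 'three'],
)

nonzero = df.ne(0)                     # True where the value is non-zero
reversed_mask = nonzero.iloc[:, ::-1]  # columns now run april -> jan
seen = reversed_mask.cumsum(axis=1)    # non-zero values seen so far, reading right to left
trailing = seen.eq(0)                  # True only before the first non-zero from the right
num_zeros = trailing.sum(axis=1)       # number of trailing zeros per row

print(seen)
print(num_zeros)
```

For row `One`, reading `april, marc, feb, jan` gives running counts of 0, 0, 1, 2, so exactly two positions are still zero.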
Answer 2
Score: 3
Similar logic to that of @BrJ, but more straightforward in my opinion. Use a reversed `cummin` to set to `False` every `True` that precedes a `False`, then `sum`:
```python
# Reverse the columns, flag zeros, and keep True only until the first non-zero
out = (df.loc[:, ::-1].eq(0)
         .cummin(axis=1).sum(axis=1)
         .to_frame('num')
      )
```
Output:
```
       num
One      2
two      1
three    0
```
Intermediates:
```
# boolean mask (0s are True)
         jan    feb   marc  april
One    False  False   True   True
two     True  False  False   True
three   True   True   True  False

# after reversed cummin (and reversed again for clarity)
# all True that preceded a False are now False
         jan    feb   marc  april
One    False  False   True   True
two    False  False  False   True
three  False  False  False  False
```
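Both answers return the same counts on the sample. One way to check that they also agree on edge cases is to add a row of all zeros (every value is then a trailing zero, so the count equals the number of columns) and a row with an interior zero that must not be counted. This is a sketch; the extra rows `four` and `five` are invented for illustration and are not from the question.

```python
import pandas as pd

# Sample data plus two hypothetical edge-case rows ('four', 'five')
df = pd.DataFrame(
    {'jan': [1, 0, 0, 0, 3], 'feb': [7, 8, 0, 0, 0],
     'marc': [0, 7, 0, 0, 5], 'april': [0, 0, 1, 0, 0]},
    index=['One', 'two', 'three', 'four', 'five'],
)

# Answer 1: reversed running count of non-zero values
via_cumsum = df.ne(0).iloc[:, ::-1].cumsum(axis=1).eq(0).sum(axis=1)

# Answer 2: reversed running minimum of the zero mask
via_cummin = df.loc[:, ::-1].eq(0).cummin(axis=1).sum(axis=1)

print(pd.DataFrame({'cumsum': via_cumsum, 'cummin': via_cummin}))
```

Row `four` yields 4 with both methods, and row `five` yields 1 because the zero in `feb` sits before the last non-zero value.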