How to iterate through each probability column of a dataframe and find the row number where the probability first drops below 50% in Python
Question
I have a dataframe that is the output of predict_survival_function() from a Cox proportional hazards model. It gives the survival probability for every month (the row index), with one column per customer. For each customer, I would like to get the month number at which the probability first drops below 0.50. Below is a screenshot of the table.
The output I am looking for is something like the following.
Suppose that for the first customer, No. 4 (the first column in the dataframe), the probability drops below 0.50 at row number 55. Then the output should be
And similarly for all the other columns in the dataframe.
Any help is appreciated.
Answer 1
Score: 2
Suppose the following dataframe:
>>> df
          4         5         7
0  0.974789  0.976546  0.913151
1  0.918408  0.815823  0.909577
2  0.748928  0.801727  0.856562
3  0.691171  0.791815  0.794988
4  0.442441  0.669530  0.750395  # Customer 4, first value below 0.5 -> row 4
5  0.378585  0.568831  0.561721
6  0.285419  0.287814  0.521966  # Customer 5, first value below 0.5 -> row 6
7  0.240335  0.216207  0.176980  # Customer 7, first value below 0.5 -> row 7
8  0.191656  0.095793  0.118300
9  0.183290  0.087297  0.035063
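In case it helps to run the snippets in this answer directly, here is a minimal sketch that rebuilds the frame above. The values are copied from the display; the only assumption is that the column labels are integers and the row index is the default 0..9.

```python
import pandas as pd

# Values copied from the frame shown above.
# Columns are customer numbers; the row index is the month.
df = pd.DataFrame(
    {
        4: [0.974789, 0.918408, 0.748928, 0.691171, 0.442441,
            0.378585, 0.285419, 0.240335, 0.191656, 0.183290],
        5: [0.976546, 0.815823, 0.801727, 0.791815, 0.669530,
            0.568831, 0.287814, 0.216207, 0.095793, 0.087297],
        7: [0.913151, 0.909577, 0.856562, 0.794988, 0.750395,
            0.561721, 0.521966, 0.176980, 0.118300, 0.035063],
    }
)
print(df)
```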
IIUC, you can use idxmax on the boolean mask; for each column it returns the row label of the first True:
>>> (df <= 0.5).idxmax()
4    4
5    6
7    7
dtype: int64
Full output:
out = ((df <= 0.5).idxmax().rename_axis('Customer Number')
       .rename('Row Number').reset_index())
print(out)
# Output
   Customer Number  Row Number
0                4           4
1                5           6
2                7           7
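One caveat worth checking: if a customer's probability never reaches 0.5 within the predicted horizon, the mask for that column is all False and idxmax still returns the first row label, which could be misread as "month 0". The sketch below is one possible way to guard against that, using a small made-up frame (the values and the variable names mask and first_hit are illustrative assumptions, not from the question) in which customer 7 never crosses the threshold.

```python
import pandas as pd

# Hypothetical survival curves: rows are months, columns are customer numbers.
df = pd.DataFrame(
    {
        4: [0.97, 0.75, 0.44, 0.38],
        5: [0.98, 0.82, 0.67, 0.29],
        7: [0.99, 0.95, 0.90, 0.85],  # never drops to 0.5
    }
)

mask = df <= 0.5
# Row label of the first True in each column.
first_hit = mask.idxmax()
# For an all-False column idxmax falls back to the first row label,
# so replace those entries with NaN.
first_hit = first_hit.where(mask.any())

out = (first_hit.rename_axis('Customer Number')
       .rename('Row Number').reset_index())
print(out)
```

Customers with no crossing end up with NaN in the Row Number column, so the column becomes float; whether NaN, a sentinel value, or dropping those customers is the right choice depends on how the result is used downstream.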