How to iterate through each probability column of a dataframe and find the row number where the probability goes below 50% in Python

Question

I have a dataframe that is the output of predict_survival_function() from a Cox proportional hazards model. It gives each customer's probability of survival for every month: the index (row number) is the month, and each column is a customer. I would like to get, for each customer, the month number at which the probability goes below 0.50. Below is a screenshot of the table.

The output I am looking for is something like the following.
Suppose that for the first customer, No. 4 (the first column in the dataframe), the probability drops below 0.50 at row number 55. Then the output should give row number 55 for customer No. 4.

And similarly for all the other columns in the dataframe.
Any help is appreciated.

Answer 1

Score: 2

Suppose the following dataframe:

>>> df
          4         5         7
0  0.974789  0.976546  0.913151
1  0.918408  0.815823  0.909577
2  0.748928  0.801727  0.856562
3  0.691171  0.791815  0.794988
4  0.442441  0.669530  0.750395  # Customer 4, first value below 0.5 -> row 4
5  0.378585  0.568831  0.561721
6  0.285419  0.287814  0.521966  # Customer 5, first value below 0.5 -> row 6
7  0.240335  0.216207  0.176980  # Customer 7, first value below 0.5 -> row 7
8  0.191656  0.095793  0.118300
9  0.183290  0.087297  0.035063

IIUC, you can use idxmax on the boolean mask. For each column, it returns the index label of the first True value:

>>> (df <= 0.5).idxmax()
4    4
5    6
7    7
dtype: int64
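
One caveat worth checking: if a customer's probability never reaches 0.5, the mask for that column is all False and idxmax still returns the first index label, which could be mistaken for a real crossing. A minimal sketch of one way to guard against that, assuming a small made-up table (the values and customer 9 are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical survival table: rows are months, columns are customer IDs.
df = pd.DataFrame(
    {
        4: [0.97, 0.69, 0.44, 0.24],
        5: [0.98, 0.79, 0.57, 0.22],
        9: [0.99, 0.95, 0.80, 0.61],  # hypothetical customer that never drops to 0.5
    }
)

mask = df <= 0.5

# idxmax gives the first True per column; where() blanks out the columns
# that contain no True at all, so they show up as NaN instead of row 0.
first_below = mask.idxmax().where(mask.any())
print(first_below)
```

Because of the NaN, the result becomes a float Series; whether NaN is the right marker for "never crossed" depends on how the result is used downstream.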

Full output:

out = ((df <= 0.5).idxmax().rename_axis('Customer Number')
                  .rename('Row Number').reset_index())
print(out)

# Output
   Customer Number  Row Number
0                4           4
1                5           6
2                7           7
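
Since the question asks about iterating through each column, here is a loop-based sketch that should give the same table on the sample data. The helper name first_row_below and its threshold parameter are made up for illustration. It uses a strict < comparison to match "goes below 0.50", whereas the one-liner above uses <=, so the two can differ only when a value is exactly 0.5.

```python
import pandas as pd

# Sample data copied from the dataframe shown in this answer.
df = pd.DataFrame(
    {
        4: [0.974789, 0.918408, 0.748928, 0.691171, 0.442441,
            0.378585, 0.285419, 0.240335, 0.191656, 0.183290],
        5: [0.976546, 0.815823, 0.801727, 0.791815, 0.669530,
            0.568831, 0.287814, 0.216207, 0.095793, 0.087297],
        7: [0.913151, 0.909577, 0.856562, 0.794988, 0.750395,
            0.561721, 0.521966, 0.176980, 0.118300, 0.035063],
    }
)


def first_row_below(frame, threshold=0.5):
    """Hypothetical helper: first index label per column with a value below threshold."""
    rows = []
    for customer, probs in frame.items():
        below = probs[probs < threshold]
        # None marks a column that never falls below the threshold.
        row = below.index[0] if not below.empty else None
        rows.append({"Customer Number": customer, "Row Number": row})
    return pd.DataFrame(rows)


out = first_row_below(df)
print(out)
```

The vectorized idxmax version is usually the better choice on wide frames; the loop is mainly easier to adapt if each customer needs extra per-column logic.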

huangapple
  • Posted on April 4, 2023, 17:55:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75927992.html