How to iterate through each probability column of a dataframe and find the row number where the probability drops below 50% in Python

Question

I have a dataframe that is the output of `predict_survival_function()` from a Cox proportional hazards model. It gives each customer's survival probability for every month: the index (row number) is the month, and each column is a customer. For each customer, I would like to get the month at which the probability first drops below 0.50. Below is a screenshot of the table.

The output I am looking for is something like the following. Suppose that for the first customer, No. 4 (the first column in the dataframe), the probability drops below 0.50 at row number 55. Then the output should be:

And similarly for all the other columns in the dataframe.
Any help is appreciated.

Answer 1

Score: 2

Suppose the following dataframe:

```
>>> df
          4         5         7
0  0.974789  0.976546  0.913151
1  0.918408  0.815823  0.909577
2  0.748928  0.801727  0.856562
3  0.691171  0.791815  0.794988
4  0.442441  0.669530  0.750395  # Customer 4, first value below 0.5 -> row 4
5  0.378585  0.568831  0.561721
6  0.285419  0.287814  0.521966  # Customer 5, first value below 0.5 -> row 6
7  0.240335  0.216207  0.176980  # Customer 7, first value below 0.5 -> row 7
8  0.191656  0.095793  0.118300
9  0.183290  0.087297  0.035063
```
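To try the answer locally, the example frame above can be rebuilt directly. The values are copied from the printout; the integer column labels 4, 5 and 7 stand in for the customer numbers, and the 0 to 9 index stands in for the months.

```python
import pandas as pd

# Survival probabilities copied from the printout above:
# rows are months (index 0-9), columns are customer numbers.
df = pd.DataFrame(
    {
        4: [0.974789, 0.918408, 0.748928, 0.691171, 0.442441,
            0.378585, 0.285419, 0.240335, 0.191656, 0.183290],
        5: [0.976546, 0.815823, 0.801727, 0.791815, 0.669530,
            0.568831, 0.287814, 0.216207, 0.095793, 0.087297],
        7: [0.913151, 0.909577, 0.856562, 0.794988, 0.750395,
            0.561721, 0.521966, 0.176980, 0.118300, 0.035063],
    }
)
print(df)
```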

If I understand correctly, you can use `idxmax`:

```
>>> (df <= 0.5).idxmax()
4    4
5    6
7    7
dtype: int64
```
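Why this works: `df <= 0.5` is a boolean frame, and `idxmax` returns the index label of the first occurrence of each column's maximum. Since `True` is greater than `False`, that is the first row at or below 0.5. One caveat: if a customer never drops to 0.5, the column is all `False`, and `idxmax` then returns the first label (0), which looks like a real answer. A minimal sketch that blanks out those columns with `Series.where`, using a small hypothetical frame in which customer `9` stays above 0.5 throughout:

```python
import pandas as pd

# Hypothetical frame: customer 4 crosses 0.5 at row 2, customer 9 never does.
df = pd.DataFrame({4: [0.9, 0.7, 0.45, 0.3], 9: [0.95, 0.9, 0.85, 0.8]})

mask = df <= 0.5
first_row = mask.idxmax()                # customer 9 would wrongly get 0 here
first_row = first_row.where(mask.any())  # NaN where the threshold is never reached
print(first_row)
```

Because of the `NaN`, the result becomes a float Series, so customer 4 shows `2.0` instead of `2`.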

Full output, as a table with one row per customer:

```python
out = ((df <= 0.5).idxmax().rename_axis('Customer Number')
                           .rename('Row Number').reset_index())
print(out)
# Output
#    Customer Number  Row Number
# 0                4           4
# 1                5           6
# 2                7           7
```
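If an explicit loop over the columns reads more naturally (it is what the question's title asks for), the sketch below does the same search column by column. It uses the strict `< 0.5` from the question rather than `<= 0.5`, and it records both the index label (the month) and the 0-based positional row, which differ whenever the index is not a plain 0-based range, for example when it holds time values. The helper name `first_month_below` and its `threshold` parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def first_month_below(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """For each column, find the first row whose value is strictly below threshold."""
    records = []
    for customer, probs in df.items():  # iterate over (column label, Series) pairs
        below = (probs < threshold).to_numpy()
        if below.any():
            pos = int(below.argmax())  # position of the first True
            records.append((customer, probs.index[pos], pos))
        else:
            records.append((customer, np.nan, np.nan))  # never drops below threshold
    return pd.DataFrame(records, columns=["Customer Number", "Month", "Row Number"])


# Hypothetical example: the index is a month timeline starting at 1, not 0.
df = pd.DataFrame(
    {4: [0.9, 0.6, 0.4, 0.2], 5: [0.95, 0.9, 0.7, 0.45], 7: [0.99, 0.9, 0.8, 0.7]},
    index=[1.0, 2.0, 3.0, 4.0],
)
print(first_month_below(df))
```

The vectorized `idxmax` version is usually faster on wide frames; the loop mainly helps when each column needs extra per-customer logic.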

huangapple
  • Published on 2023-04-04 17:55:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75927992.html