如何在Pandas中满足特定条件时添加连续数字

huangapple go评论73阅读模式
英文:

How to add the sequential numbers if certain condition met in Pandas

问题

需要根据特定条件增加顺序号,否则需要保持上一个数字。

原始数据集:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 1 35
1 C 1 1 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 1 89
3 F 0 1 19

所需数据集:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 2 35
1 C 1 3 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 2 89
3 F 0 2 19
英文:

Need to increase the sequential numbers if certain condition is met or else need to hold the previous number.

Original_dataset:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 1 35
1 C 1 1 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 1 89
3 F 0 1 19

Required_Dataset:

ID Name Status Cluster Gap
1 A 0 1 15
1 B 1 2 35
1 C 1 3 03
2 B 0 1 26
2 C 0 1 16
3 A 1 1 65
3 C 0 2 89
3 F 0 2 19

Conditions:

  1. For first occurrence of ID, the cluster should be 1.
  2. If status = 1 or Gap > 28, then cluster needs to increase by 1 based on patient ID (see row 1-C and 2-B --- as ID changes, the Cluster remains 1 as it is the first occurrence of the particular ID).
  3. If the condition is not satisfied, it needs to hold the previous cluster number. (Can refer the final row).

The code which I have tried is:

Original_dataset.loc[((new_df4['gap'] > 28) | (Original_dataset['status'] == 1)),'Cluster'] = Original_dataset['Cluster'] + 1

答案1

得分: 2

首先,如果Gap大于28,或者Status1,或者在DataFrame.loc中的ID的第一个重复值之前,将Cluster1设置为1。然后使用lambda函数结合GroupBy.cumsumGroupBy.ffill

m = df['Gap'].gt(28)
m1 = df['Status'].eq(1)
m2 = ~df['ID'].duplicated()

df.loc[m | m1 | m2, 'Cluster1'] = 1

f = lambda x: x.cumsum().ffill(downcast='int')
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print(df)
   ID Name  Status  Cluster  Gap  Cluster1
0   1    A       0        1   15         1
1   1    B       1        1   35         2
2   1    C       1        1    3         3
3   2    B       0        1   26         1
4   2    C       0        1   16         1
5   3    A       1        1   65         1
6   3    C       0        1   89         2
7   3    F       0        1   19         2
英文:

First set 1 if greater GAP like 28 or if Status is 1 or first duplicated value of ID in DataFrame.loc, then use lambda function with GroupBy.cumsum and GroupBy.ffill

m = df['Gap'].gt(28)
m1 = df['Status'].eq(1)
m2 = ~df['ID'].duplicated()

df.loc[m | m1 | m2, 'Cluster1'] = 1

f = lambda x: x.cumsum().ffill(downcast='int')
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print (df)
   ID Name  Status  Cluster  Gap  Cluster1
0   1    A       0        1   15         1
1   1    B       1        1   35         2
2   1    C       1        1    3         3
3   2    B       0        1   26         1
4   2    C       0        1   16         1
5   3    A       1        1   65         1
6   3    C       0        1   89         2
7   3    F       0        1   19         2

答案2

得分: 0

您可以根据3个条件使用掩码,并使用简单的[`groupby.cumsum`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumsum.html):

是否 Gap≥28 或者 Status==1 或者第一个 ID?

m = df['Gap'].ge(28) | df['Status'].eq(1) | ~df['ID'].duplicated()

然后按组递增计数

df['Cluster'] = m.groupby(df['ID']).cumsum()

输出:

ID 名称 状态 集群 差距
0 1 A 0 1 15
1 1 B 1 2 35
2 1 C 1 3 3
3 2 B 0 1 26
4 2 C 0 1 16
5 3 A 1 1 65
6 3 C 0 2 89
7 3 F 0 2 19


<details>
<summary>英文:</summary>

You can use a mask based on 3 conditions, and a simple [`groupby.cumsum`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumsum.html):

is Gap≥28 OR Status==1 OR first ID ?

m = df['Gap'].ge(28) | df['Status'].eq(1) | ~df['ID'].duplicated()

then increment count per group

df['Cluster'] = m.groupby(df['ID']).cumsum()

Output:

ID Name Status Cluster Gap
0 1 A 0 1 15
1 1 B 1 2 35
2 1 C 1 3 3
3 2 B 0 1 26
4 2 C 0 1 16
5 3 A 1 1 65
6 3 C 0 2 89
7 3 F 0 2 19


</details>



huangapple
  • 本文由 发表于 2023年3月8日 14:45:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/75670064.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定