March 8, 2023 14:45:49

How to add sequential numbers when a certain condition is met in pandas

Question

I need to increment a sequential number when a certain condition is met; otherwise the previous number should be carried forward.

Original_dataset:

ID	Name	Status	Cluster	Gap
1	A	0	1	15
1	B	1	1	35
1	C	1	1	03
2	B	0	1	26
2	C	0	1	16
3	A	1	1	65
3	C	0	1	89
3	F	0	1	19

Required_Dataset:

ID	Name	Status	Cluster	Gap
1	A	0	1	15
1	B	1	2	35
1	C	1	3	03
2	B	0	1	26
2	C	0	1	16
3	A	1	1	65
3	C	0	2	89
3	F	0	2	19

Conditions:

For the first occurrence of an ID, the cluster should be 1.
If Status = 1 or Gap > 28, the cluster increases by 1 within that ID (see rows 1-C and 2-B: when the ID changes, the cluster goes back to 1 because it is the first occurrence of that ID).
If the condition is not satisfied, the row keeps the previous cluster number (see the final row).
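As a rough cross-check of these rules, here is a minimal row-by-row sketch in Python. The helper name `clusters_by_loop` is hypothetical, and the sketch assumes the rows of each ID are contiguous, as they are in the sample data.

```python
import pandas as pd

# Sample data copied from the question (Cluster is what we want to compute).
df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3, 3, 3],
    "Name":   ["A", "B", "C", "B", "C", "A", "C", "F"],
    "Status": [0, 1, 1, 0, 0, 1, 0, 0],
    "Gap":    [15, 35, 3, 26, 16, 65, 89, 19],
})


def clusters_by_loop(frame):
    """Hypothetical reference helper: apply the three rules one row at a time."""
    result = []
    previous_id = None
    current = 0
    for row in frame.itertuples(index=False):
        if row.ID != previous_id:
            # First row of a new ID: the cluster restarts at 1.
            current = 1
        elif row.Status == 1 or row.Gap > 28:
            # Condition met within the same ID: move to the next cluster.
            current += 1
        # Otherwise keep the previous cluster number unchanged.
        result.append(current)
        previous_id = row.ID
    return result


df["Cluster"] = clusters_by_loop(df)
print(df)
```

A loop like this is slow on large frames, but it is handy for checking a vectorized solution against the required dataset.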

The code I have tried is:

Original_dataset.loc[((new_df4['gap'] > 28) | (Original_dataset['status'] == 1)),'Cluster'] = Original_dataset['Cluster'] + 1
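To see why a single `.loc` assignment like this cannot produce a running count, the sketch below replays the same idea on the sample data. It assumes one frame with the column names `Status` and `Gap` from the table, rather than the `new_df4` / lowercase names in the attempt.

```python
import pandas as pd

# Sample data from the question, with Cluster initialised to 1 everywhere.
df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 3, 3, 3],
    "Status":  [0, 1, 1, 0, 0, 1, 0, 0],
    "Cluster": [1, 1, 1, 1, 1, 1, 1, 1],
    "Gap":     [15, 35, 3, 26, 16, 65, 89, 19],
})

condition = (df["Gap"] > 28) | (df["Status"] == 1)

attempt = df.copy()
# Each matching row gets its *own* old value plus 1; nothing accumulates
# from one row to the next, and the ID boundaries are ignored.
attempt.loc[condition, "Cluster"] = attempt["Cluster"] + 1
print(attempt["Cluster"].tolist())  # [1, 2, 2, 1, 1, 2, 2, 1]
```

Every flagged row ends up at 2 and every other row stays at 1, so the result never reaches 3 for row 1-C and does not hold the previous value for row 3-F.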

Answer 1

Score: 2

First, use DataFrame.loc to set a helper column Cluster1 to 1 where Gap is greater than 28, or Status is 1, or the row is the first occurrence of its ID. Then apply a lambda through GroupBy.transform that combines cumsum and ffill within each ID:

# flag the rows that start a new cluster
m = df['Gap'].gt(28)           # Gap above the threshold
m1 = df['Status'].eq(1)        # Status equal to 1
m2 = ~df['ID'].duplicated()    # first row of each ID
df.loc[m | m1 | m2, 'Cluster1'] = 1
# running count per ID; ffill carries the last count over unflagged rows
f = lambda x: x.cumsum().ffill().astype(int)
df['Cluster1'] = df.groupby('ID')['Cluster1'].transform(f)
print(df)
   ID Name  Status  Cluster  Gap  Cluster1
0   1    A       0        1   15         1
1   1    B       1        1   35         2
2   1    C       1        1    3         3
3   2    B       0        1   26         1
4   2    C       0        1   16         1
5   3    A       1        1   65         1
6   3    C       0        1   89         2
7   3    F       0        1   19         2
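To make the cumsum-then-ffill step easier to follow, the sketch below keeps each intermediate result in its own variable. The names `flag`, `marker`, `running` and `filled` are illustrative only and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3, 3, 3],
    "Name":   ["A", "B", "C", "B", "C", "A", "C", "F"],
    "Status": [0, 1, 1, 0, 0, 1, 0, 0],
    "Gap":    [15, 35, 3, 26, 16, 65, 89, 19],
})

# Rows that should start a new cluster (or are the first row of their ID).
flag = df["Gap"].gt(28) | df["Status"].eq(1) | ~df["ID"].duplicated()

# 1.0 on flagged rows, NaN everywhere else.
marker = pd.Series(1.0, index=df.index).where(flag)

# Per-ID running total; NaN rows are skipped and stay NaN.
running = marker.groupby(df["ID"]).cumsum()

# Carry the last count forward within each ID, then convert back to integers.
# This is safe because the first row of every ID is always flagged.
filled = running.groupby(df["ID"]).ffill().astype(int)

print(pd.DataFrame({"marker": marker, "running": running, "Cluster1": filled}))
```

The forward fill is only needed because unflagged rows are NaN; counting a boolean mask directly avoids that step.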

Answer 2

Score: 0

You can build a mask from the three conditions and apply a simple [`groupby.cumsum`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumsum.html):

# is Gap > 28 OR Status == 1 OR the first row of its ID?
m = df['Gap'].gt(28) | df['Status'].eq(1) | ~df['ID'].duplicated()

# then increment the count per group
df['Cluster'] = m.groupby(df['ID']).cumsum()

Output:

   ID Name  Status  Cluster  Gap
0   1    A       0        1   15
1   1    B       1        2   35
2   1    C       1        3    3
3   2    B       0        1   26
4   2    C       0        1   16
5   3    A       1        1   65
6   3    C       0        2   89
7   3    F       0        2   19
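For reuse, the mask approach can be wrapped in a small function. This is only a sketch: the name `assign_clusters` and the `gap_threshold` parameter are assumptions for illustration, and the comparison is strict to match the question's "Gap > 28" wording.

```python
import pandas as pd


def assign_clusters(frame, gap_threshold=28):
    """Hypothetical helper: per-ID running count of rows that start a cluster."""
    flag = (
        frame["Gap"].gt(gap_threshold)   # strictly greater than the threshold
        | frame["Status"].eq(1)
        | ~frame["ID"].duplicated()      # first row of each ID
    )
    # Cast to int so the cumulative sum is an explicit integer count.
    return flag.astype(int).groupby(frame["ID"]).cumsum()


df = pd.DataFrame({
    "ID":     [1, 1, 1, 2, 2, 3, 3, 3],
    "Name":   ["A", "B", "C", "B", "C", "A", "C", "F"],
    "Status": [0, 1, 1, 0, 0, 1, 0, 0],
    "Gap":    [15, 35, 3, 26, 16, 65, 89, 19],
})
df["Cluster"] = assign_clusters(df)
print(df)

# Boundary case: a Gap of exactly 28 does not start a new cluster.
edge = pd.DataFrame({"ID": [9, 9], "Status": [0, 0], "Gap": [10, 28]})
edge_clusters = assign_clusters(edge)
print(edge_clusters.tolist())  # [1, 1]
```

Because `duplicated()` looks at the whole column, the first-row check does not rely on the rows of an ID being adjacent, although the running count still follows the row order of the frame.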
