February 8, 2023, 19:39:45

How to add a new column in a pandas DataFrame if the string or object value of column 1 is repeated in three consecutive rows

Question

Say I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307', 'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340','P569','P987','P569']})

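I need to add a column named y. If the value in ID is the same for three consecutive rows, put "Yes" in column y for those rows; otherwise put "No".

Here is what I have tried: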

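# create a rolling window of size 3
rolling = df['ID'].rolling(3)
# apply a custom function to the rolling window to check if all values are the same
df['y'] = rolling.apply(lambda x: 'Yes' if all(x == x[0]) else 'No')

However, the above code raises the following error: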

DataError: No numeric types to aggregate

The final desired output would be:

       ID    y
0   p1305  Yes
1   p1305  Yes
2   p1305  Yes
3   p1307  Yes
4   p1307  Yes
5   p1307  Yes
6   p1301  Yes
7   p1301  Yes
8   p1301  Yes
9   p1340  Yes
10  p1340  Yes
11  p1340  Yes

Any suggestions or help are much appreciated! Thanks.

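Answer 1

Score: 1

Rolling windows only aggregate numeric data, so you need to trick the method and convert the IDs to numbers first, for example using factorize (or a Categorical):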

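df['y'] = (
    # factorize maps each distinct ID to an integer code that rolling can process
    pd.Series(pd.factorize(df['ID'])[0], index=df.index)
    # a window is True when every later value equals its first value
    .rolling(3, min_periods=1).apply(lambda s: s.iloc[1:].eq(s.iloc[0]).all())
    .astype(bool)
)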

Output:

       ID      y
0   p1305   True
1   p1305   True
2   p1305   True
3   p1307  False
4   p1307  False
5   p1307   True
6   p1301  False
7   p1301  False
8   p1301   True
9   p1340  False
10  p1340  False
11  p1340   True
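
The Categorical route mentioned above could look roughly like the sketch below. It assumes the same df as in the question and swaps pd.factorize for the category codes; the helper name window_is_constant is made up for illustration. Note that with min_periods=1 the first two rows are checked against shorter windows, which is why they can come out True.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307',
                          'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340',
                          'P569', 'P987', 'P569']})

# Integer code per distinct ID, taken from a Categorical instead of pd.factorize.
codes = df['ID'].astype('category').cat.codes.astype('int64')


def window_is_constant(window):
    # Hypothetical helper: True when every value in the window equals the first one.
    return window.iloc[1:].eq(window.iloc[0]).all()


df['y'] = (
    codes.rolling(3, min_periods=1)
         .apply(window_is_constant)
         .astype(bool)
)
print(df)
```

As with the factorize version, only the last row of each run of three is flagged once the window is full.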

Another approach, if you want True in all the rows of the group, would be to use:

group = df['ID'].ne(df['ID'].shift()).cumsum()
df['y'] = df.groupby(group)['ID'].transform('size').eq(3) # or .ge(3)

Output:

       ID     y
0   p1305  True
1   p1305  True
2   p1305  True
3   p1307  True
4   p1307  True
5   p1307  True
6   p1301  True
7   p1301  True
8   p1301  True
9   p1340  True
10  p1340  True
11  p1340  True
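
Since the question asks for "Yes"/"No" labels and not booleans, the group-size approach can be extended as in the following sketch. It assumes the same df as in the question; the names run_id and run_length are chosen here for illustration, and numpy.where is just one way to map the boolean mask to strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307',
                          'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340',
                          'P569', 'P987', 'P569']})

# Give each run of consecutive identical IDs its own integer label:
# a new run starts wherever the ID differs from the previous row.
run_id = df['ID'].ne(df['ID'].shift()).cumsum()

# Length of the run that each row belongs to.
run_length = df.groupby(run_id)['ID'].transform('size')

# "Yes" for rows in a run of exactly three, "No" otherwise.
# Use .ge(3) instead of .eq(3) to also accept longer runs.
df['y'] = np.where(run_length.eq(3), 'Yes', 'No')
print(df)
```

With this data the first twelve rows get "Yes" and the last three (P569, P987, P569) get "No", because those IDs never repeat on consecutive rows.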
