How to add a new column to a pandas DataFrame when the string (object) value in a column is repeated in three consecutive rows

Question

Say I have a DataFrame like this:

    import pandas as pd
    df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307', 'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340', 'P569', 'P987', 'P569']})

I need to add a column y: if the value in ID is the same for three consecutive rows, y should be Yes for those rows; otherwise it should be No.

Here is what I have tried:

    # create a rolling window of size 3
    rolling = df['ID'].rolling(3)
    # apply a custom function to the rolling window to check if all values are the same
    df['y'] = rolling.apply(lambda x: 'Yes' if all(x == x[0]) else 'No')

However, the above code throws the following error:

    DataError: No numeric types to aggregate

The final desired output would be:

           ID    y
    0   p1305  Yes
    1   p1305  Yes
    2   p1305  Yes
    3   p1307  Yes
    4   p1307  Yes
    5   p1307  Yes
    6   p1301  Yes
    7   p1301  Yes
    8   p1301  Yes
    9   p1340  Yes
    10  p1340  Yes
    11  p1340  Yes

Any suggestions or help would be much appreciated. Thanks!

Answer 1

Score: 1

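You need to trick the method by converting the values to numbers first, for example with factorize (or a Categorical), since rolling refuses to aggregate non-numeric data:

    df['y'] = (
        # factorize maps each distinct ID to an integer code
        pd.Series(pd.factorize(df['ID'])[0], index=df.index)
          .rolling(3, min_periods=1)
          # True when every value in the window equals the window's first value
          .apply(lambda s: s.iloc[1:].eq(s.iloc[0]).all())
          .astype(bool)
    )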

Output (rows 0 to 11; rows 0 and 1 are True only because min_periods=1 lets the first two windows hold fewer than three values):

           ID      y
    0   p1305   True
    1   p1305   True
    2   p1305   True
    3   p1307  False
    4   p1307  False
    5   p1307   True
    6   p1301  False
    7   p1301  False
    8   p1301   True
    9   p1340  False
    10  p1340  False
    11  p1340   True
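
For comparison, here is a minimal sketch of the same "three in a row" check written without rolling at all, by comparing each value with the two values before it. It is not part of the original answer; the name `ends_triple` is made up for illustration. Unlike the rolling version with min_periods=1, it leaves rows 0 and 1 False, because they have fewer than two predecessors.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307',
                          'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340',
                          'P569', 'P987', 'P569']})

s = df['ID']

# True where the current value equals both of the two previous values,
# i.e. on the last row of any three identical consecutive values.
# (ends_triple is a hypothetical name used only in this sketch.)
ends_triple = s.eq(s.shift(1)) & s.eq(s.shift(2))

df['y'] = ends_triple
print(df)
```

Because shift works on any dtype, this avoids the numeric conversion step, at the cost of hard-coding the window length of three.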

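Another approach, if you want True on every row of a qualifying group, is to label each run of consecutive identical values and test the run's size:

    # a new group starts wherever the value differs from the previous row
    group = df['ID'].ne(df['ID'].shift()).cumsum()
    df['y'] = df.groupby(group)['ID'].transform('size').eq(3)  # or .ge(3)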

Output (rows 0 to 11; rows 12 to 14 are False, since each of them forms a run of length one):

           ID     y
    0   p1305  True
    1   p1305  True
    2   p1305  True
    3   p1307  True
    4   p1307  True
    5   p1307  True
    6   p1301  True
    7   p1301  True
    8   p1301  True
    9   p1340  True
    10  p1340  True
    11  p1340  True
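
The question asked for Yes/No strings rather than booleans. The following is a small sketch of how the run-length approach could be extended to produce them. The intermediate names `run_id` and `run_len` are introduced here for illustration, and using `numpy.where` for the mapping is one option among several, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['p1305', 'p1305', 'p1305', 'p1307', 'p1307', 'p1307',
                          'p1301', 'p1301', 'p1301', 'p1340', 'p1340', 'p1340',
                          'P569', 'P987', 'P569']})

# A new run starts wherever a value differs from the one before it;
# the cumulative sum of those start flags gives each run an integer id.
run_id = df['ID'].ne(df['ID'].shift()).cumsum()

# Length of the run that each row belongs to.
run_len = df.groupby(run_id)['ID'].transform('size')

# 'Yes' for rows in a run of at least three identical values, else 'No'.
df['y'] = np.where(run_len.ge(3), 'Yes', 'No')
print(df)
```

With this data the first twelve rows get Yes and the last three get No. Using .ge(3) rather than .eq(3) also marks runs longer than three; which one is right depends on how such runs should be treated.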

huangapple
  • Published on February 8, 2023, 19:39:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75385262.html