Subset the first and last consecutive values from a pandas DataFrame column – Python

Question

I want to subset a df by returning the first and last value of each run of consecutive values in a pandas column. drop_duplicates won't work because it doesn't account for consecutive groupings. I'm using .shift() (below), but this only returns the first value of each consecutive run, whereas I want both the first and the last.

import pandas as pd

df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]
                   })

df1 = df[df['Item'].ne(df['Item'].shift())]

print(df1)
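For reference, here is a minimal sketch (same sample data) that prints the boolean mask built by the snippet above. Each value is compared with the row before it, so the mask is True only where a new run begins (row 0 is compared with NaN and is always True). The filter therefore keeps the first row of each run and never the last one.

```python
import pandas as pd

df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]})

# shift() moves every value down one row; row 0 gets NaN, which never equals 'A'
run_start = df['Item'].ne(df['Item'].shift())
print(run_start.tolist())
# [True, False, False, True, False, False, False, True, False]

# Only the run starts survive the filter
starts_only = df[run_start]
print(starts_only.index.tolist())  # [0, 3, 7]
```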

Intended output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148

Answer 1

Score: 3

You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:

df1 = df[(df['Item'].ne(df['Item'].shift())) | 
         (df['Item'].ne(df['Item'].shift(-1)))]

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
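A self-contained sketch of the same two-mask idea, wrapped in a helper. The name first_and_last_of_runs is hypothetical, not a pandas function. shift(-1) puts NaN in the last row, so the final row always counts as a run end. A run of length one is both a start and an end, but because the masks are combined with |, that row is returned once rather than twice.

```python
import pandas as pd

df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]})


def first_and_last_of_runs(frame, col):
    """Hypothetical helper: keep the first and last row of every run of equal values in `col`."""
    s = frame[col]
    is_start = s.ne(s.shift())    # differs from the previous row (row 0 compares with NaN)
    is_end = s.ne(s.shift(-1))    # differs from the next row (last row compares with NaN)
    return frame[is_start | is_end]


result = first_and_last_of_runs(df, 'Item')
print(result.index.tolist())  # [0, 2, 3, 6, 7, 8]

# Rows 0 and 3 form runs of length one; each appears exactly once
single = pd.DataFrame({"Item": ['A', 'B', 'B', 'A']})
print(first_and_last_of_runs(single, 'Item').index.tolist())  # [0, 1, 2, 3]
```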

Answer 2

Score: 2

Here is an option using groupby and nth:

df.groupby(df['Item'].ne(df['Item'].shift()).cumsum(), as_index=False).nth([0, -1])

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
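The key passed to groupby above is a run id: the "new run starts here" mask is cumulatively summed, so each consecutive block gets its own integer. This small sketch prints that key to show why grouping by it, rather than by Item itself, keeps the two separate 'A' runs apart. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]})

# Each True marks the start of a run; cumsum turns those into run ids 1, 2, 3, ...
run_id = df['Item'].ne(df['Item'].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# Grouping by 'Item' merges both 'A' runs; grouping by run_id keeps them separate
print(df.groupby('Item').ngroups, run_id.nunique())  # 2 3

# nth([0, -1]) picks the first and last row inside each run
result = df.groupby(run_id, as_index=False).nth([0, -1])
print(result.index.tolist())
```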

Answer 3

Score: 1

Try:

df['group'] = (df['Item'] != df['Item'].shift()).cumsum()
cols = ['group', 'Item']
df[~df.duplicated(cols, keep='last') | ~df.duplicated(cols, keep='first')]

Output:

  Item  Val1  Val2  group
0    A   -20   150      1
2    A   -20   150      1
3    B   -20   148      2
6    B   -20   151      2
7    A   -23   150      3
8    A   -22   148      3
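A variant sketch of the same duplicated-based approach that leaves df unchanged: the run id is added to a temporary copy via assign, and the resulting mask is applied to the original frame. Because each run id maps to exactly one Item value, checking duplicates on 'group' alone gives the same rows as cols = ['group', 'Item']. The names tmp, is_first and is_last are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]})

# Build the run id on a temporary copy so df itself gets no extra column
tmp = df.assign(group=df['Item'].ne(df['Item'].shift()).cumsum())

# keep='last' flags every row except the last of its run, so the negation marks run ends;
# keep='first' likewise lets the negation mark run starts
is_last = ~tmp.duplicated('group', keep='last')
is_first = ~tmp.duplicated('group', keep='first')

result = df[is_first | is_last]
print(result.columns.tolist())  # ['Item', 'Val1', 'Val2']
print(result.index.tolist())    # [0, 2, 3, 6, 7, 8]
```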

huangapple
  • Published on May 25, 2023 at 09:36:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76328354.html