Subset first and last consecutive value from a pandas DataFrame column (Python)

Question

I want to subset a df by returning the first and last rows of each run of consecutive values in a pandas column. drop_duplicates won't work because it doesn't account for consecutive groupings. I'm using .shift() (below), but this only returns the first row of each consecutive run, whereas I want both the first and the last.

    import pandas as pd

    df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                       "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                       "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]
                       })

    # Keep rows whose Item differs from the previous row's Item
    df1 = df[df['Item'].ne(df['Item'].shift())]
    print(df1)

Intended output:

      Item  Val1  Val2
    0    A   -20   150
    2    A   -20   150
    3    B   -20   148
    6    B   -20   151
    7    A   -23   150
    8    A   -22   148

Answer 1

Score: 3

You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:

    df1 = df[(df['Item'].ne(df['Item'].shift())) |
             (df['Item'].ne(df['Item'].shift(-1)))]

Output:

      Item  Val1  Val2
    0    A   -20   150
    2    A   -20   150
    3    B   -20   148
    6    B   -20   151
    7    A   -23   150
    8    A   -22   148
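To make the two comparisons easier to follow, here is a minimal self-contained sketch of the same idea with the mask split into two named pieces. The names is_start and is_end are illustrative only and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# True where a row differs from the row above it (start of a run).
# The first row is compared with NaN, so it always counts as a start.
is_start = df['Item'].ne(df['Item'].shift())

# True where a row differs from the row below it (end of a run).
# The last row is compared with NaN, so it always counts as an end.
is_end = df['Item'].ne(df['Item'].shift(-1))

# A row is kept if it starts or ends a run.
df1 = df[is_start | is_end]
print(df1)
```

Because the two masks are combined with |, a run of length one (a row that is both a start and an end) should appear once rather than twice.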

Answer 2

Score: 2

Here is an option using groupby and nth:

    df.groupby(df['Item'].ne(df['Item'].shift()).cumsum(), as_index=False).nth([0, -1])

Output:

      Item  Val1  Val2
    0    A   -20   150
    2    A   -20   150
    3    B   -20   148
    6    B   -20   151
    7    A   -23   150
    8    A   -22   148
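The one-liner packs two steps together: labelling each consecutive run, then picking rows by position within each run. The sketch below separates them so the intermediate run labels can be inspected. The name run_id and the rename call are assumptions added for illustration; they are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# Step 1: mark the start of every run, then cumulatively sum the marks so
# that each run gets its own integer label.
# Renamed (illustrative choice) so the key does not share a name with a column.
run_id = df['Item'].ne(df['Item'].shift()).cumsum().rename('run_id')
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# Step 2: within each run, take the row at position 0 and the row at position -1.
result = df.groupby(run_id, as_index=False).nth([0, -1])
print(result)
```

Grouping by Item alone would merge the two separate runs of 'A', which is why the cumulative run label is used as the key instead.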

Answer 3

Score: 1

Try:

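    # Label each run of consecutive identical items
    df['group'] = (df['Item'] != df['Item'].shift()).cumsum()
    cols = ['group', 'Item']
    # Keep rows that are the first or the last occurrence within their run
    df[~df.duplicated(cols, keep='last') | ~df.duplicated(cols, keep='first')]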

Output:

      Item  Val1  Val2  group
    0    A   -20   150      1
    2    A   -20   150      1
    3    B   -20   148      2
    6    B   -20   151      2
    7    A   -23   150      3
    8    A   -22   148      3
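Note that the snippet above adds a group column to df itself. If the original frame should stay unchanged, one possible variant (a sketch, not the answerer's code) builds the helper column on a copy with assign. The names work, first_of_run and last_of_run are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# assign returns a new DataFrame, so df keeps its original columns.
work = df.assign(group=(df['Item'] != df['Item'].shift()).cumsum())
cols = ['group', 'Item']

# duplicated(keep='first') is False only for the first row of each run,
# and duplicated(keep='last') is False only for the last row of each run.
first_of_run = ~work.duplicated(cols, keep='first')
last_of_run = ~work.duplicated(cols, keep='last')

result = work[first_of_run | last_of_run]
print(result)
```

Dropping the helper column afterwards with result.drop(columns='group') would give the same columns as the intended output in the question.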

huangapple
  • Posted on May 25, 2023 at 09:36:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76328354.html