Subset first and last consecutive values from a pandas DataFrame column - Python
Question
I want to subset a df by returning the first and last rows of each consecutive run of values in a pandas column. drop_duplicates won't work because it doesn't account for consecutive groupings. I'm using .shift() (below), but this only returns the first row of each run, whereas I want both the first and the last.
import pandas as pd
df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]
                   })
df1 = df[df['Item'].ne(df['Item'].shift())]
print(df1)
Intended output:
  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
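To make the problem concrete, here is a minimal sketch of what the two approaches mentioned in the question return on this data. The names `deduped` and `starts` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# drop_duplicates compares values across the whole column,
# so the second run of 'A' (rows 7-8) disappears entirely.
deduped = df.drop_duplicates(subset="Item")
print(deduped.index.tolist())  # [0, 3]

# Comparing each row with the previous one keeps only the
# first row of every consecutive run.
starts = df[df["Item"].ne(df["Item"].shift())]
print(starts.index.tolist())   # [0, 3, 7]
```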
Answer 1
Score: 3
You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:
df1 = df[(df['Item'].ne(df['Item'].shift())) |
         (df['Item'].ne(df['Item'].shift(-1)))]
Output:
  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
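As a rough illustration of why the two comparisons are combined, the sketch below builds each mask separately on the `Item` values alone. The names `is_start` and `is_end` are hypothetical and not part of the answer.

```python
import pandas as pd

item = pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'], name="Item")

# True where the value differs from the previous row: start of a run.
is_start = item.ne(item.shift())
# True where the value differs from the next row: end of a run.
is_end = item.ne(item.shift(-1))

print(item[is_start].index.tolist())           # [0, 3, 7]
print(item[is_end].index.tolist())             # [2, 6, 8]
print(item[is_start | is_end].index.tolist())  # [0, 2, 3, 6, 7, 8]
```

A run of length one satisfies both masks, and because the masks are combined with `|` that row still appears only once in the result.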
Answer 2
Score: 2
Here is an option using groupby and nth:
df.groupby(df['Item'].ne(df['Item'].shift()).cumsum(), as_index=False).nth([0, -1])
Output:
  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
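The grouping key in that one-liner may be easier to follow when pulled out on its own. The sketch below, using the hypothetical names `run_id` and `sizes`, shows how the cumulative sum of the "differs from previous row" flags labels each consecutive run.

```python
import pandas as pd

item = pd.Series(['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'], name="Item")

# Each True marks the start of a new run; the running total of those
# flags gives every row in the same run the same integer label.
run_id = item.ne(item.shift()).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# Grouping on run_id therefore treats the two 'A' runs as separate groups.
sizes = item.groupby(run_id).size().tolist()
print(sizes)            # [3, 4, 2]
```

With this key, `nth([0, -1])` picks the first and last row of each run rather than of each distinct `Item` value.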
Answer 3
Score: 1
Try:
df['group'] = (df['Item'] != df['Item'].shift()).cumsum()
cols = ['group', 'Item']
df[~df.duplicated(cols, keep='last') | ~df.duplicated(cols, keep='first')]
Output:
  Item  Val1  Val2  group
0    A   -20   150      1
2    A   -20   150      1
3    B   -20   148      2
6    B   -20   151      2
7    A   -23   150      3
8    A   -22   148      3
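Note that the snippet above leaves a helper `group` column in `df`. If that is unwanted, one possible variant is to keep the run labels in a separate Series and call `Series.duplicated` on it. This is only a sketch of the same idea, and the names `run_id`, `keep` and `result` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# Label each consecutive run without adding a column to df.
run_id = df["Item"].ne(df["Item"].shift()).cumsum()

# A row is kept if it is the first or the last occurrence of its run label.
keep = ~run_id.duplicated(keep="first") | ~run_id.duplicated(keep="last")
result = df[keep]

print(result.index.tolist())  # [0, 2, 3, 6, 7, 8]
print(list(result.columns))   # ['Item', 'Val1', 'Val2']
```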