May 25, 2023 09:36:45

subset first and last consecutive value from pandas df col - python

Question

I want to subset a df by returning the first and last rows of each consecutive run of values in a pandas column. drop_duplicates won't work because it doesn't account for consecutive groupings. I'm using .shift() (below), but this only returns the first row of each consecutive run, whereas I want both the first and the last.

import pandas as pd
df = pd.DataFrame({"Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
                   "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
                   "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148]
                   })
df1 = df[df['Item'].ne(df['Item'].shift())]
print(df1)

Intended output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148

Answer 1

Score: 3

You need to compare against both the forward and backward shifted values so that you can find the start and finish of each group:

df1 = df[(df['Item'].ne(df['Item'].shift())) |
         (df['Item'].ne(df['Item'].shift(-1)))]

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
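
To see why both comparisons are needed, it can help to look at each mask on its own. The following is a minimal sketch that rebuilds the question's DataFrame; the names is_start and is_end are introduced here for illustration and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# True where a row differs from the row above it, i.e. a run starts here.
is_start = df['Item'].ne(df['Item'].shift())
# True where a row differs from the row below it, i.e. a run ends here.
is_end = df['Item'].ne(df['Item'].shift(-1))

print(df.index[is_start].tolist())  # [0, 3, 7]
print(df.index[is_end].tolist())    # [2, 6, 8]

# Keep a row if it starts or ends a run.
df1 = df[is_start | is_end]
print(df1.index.tolist())           # [0, 2, 3, 6, 7, 8]
```

A run of length one satisfies both masks but still appears only once in the result, because boolean indexing selects each row at most once.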

Answer 2

Score: 2

Here is an option using groupby and nth:

df.groupby(df['Item'].ne(df['Item'].shift()).cumsum(), as_index=False).nth([0, -1])

Output:

  Item  Val1  Val2
0    A   -20   150
2    A   -20   150
3    B   -20   148
6    B   -20   151
7    A   -23   150
8    A   -22   148
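
The grouping key in this answer is a run counter: every time Item changes, the cumulative sum goes up by one, so each consecutive run gets its own id. The sketch below prints that key and then, as a variation rather than the answer's own method, selects the first and last row of each run with cumcount instead of nth. The names run_id, from_front and from_back are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# Each change in Item increments the counter, giving one id per consecutive run.
run_id = df['Item'].ne(df['Item'].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# Position of each row inside its run, counted from the front and from the back.
from_front = df.groupby(run_id).cumcount()
from_back = df.groupby(run_id).cumcount(ascending=False)

# A row is kept if it is the first or the last row of its run.
result = df[(from_front == 0) | (from_back == 0)]
print(result.index.tolist())  # [0, 2, 3, 6, 7, 8]
```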

Answer 3

Score: 1

Try:

df['group'] = (df['Item'] != df['Item'].shift()).cumsum()
cols = ['group', 'Item']
df[~df.duplicated(cols, keep='last') | ~df.duplicated(cols, keep='first')]

Output:

  Item  Val1  Val2  group
0    A   -20   150      1
2    A   -20   150      1
3    B   -20   148      2
6    B   -20   151      2
7    A   -23   150      3
8    A   -22   148      3
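
The snippet above adds a group column to df, which then shows up in the output. If that is not wanted, one possible variation is to keep the run counter in a separate Series and call duplicated on it directly. This is a sketch under that assumption, not the answer's original code; run_id, first_of_run and last_of_run are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A'],
    "Val1": [-20, -21, -20, -20, -20, -21, -20, -23, -22],
    "Val2": [150, 151, 150, 148, 149, 150, 151, 150, 148],
})

# Run counter kept outside the DataFrame so df itself is not modified.
run_id = (df['Item'] != df['Item'].shift()).cumsum()

# A row is the first (last) of its run if no earlier (later) row shares its run id.
first_of_run = ~run_id.duplicated(keep='first')
last_of_run = ~run_id.duplicated(keep='last')

result = df[first_of_run | last_of_run]
print(result.index.tolist())    # [0, 2, 3, 6, 7, 8]
print(result.columns.tolist())  # ['Item', 'Val1', 'Val2']
```

Item is constant within a run, so the run id alone is enough to identify duplicates here; Item does not need to be part of the key.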
