June 9, 2023 14:10:49

Delete rows from bottom, from every group/id of a dataframe

Question

I have a dataset as such:

# Load the required libraries
import pandas as pd

# Create dataset
data = {'id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
               2, 2, 2, 2, 2,
               3, 3, 3, 3, 3, 3,
               4, 4, 4, 4, 4, 4, 4, 4,
               5, 5, 5, 5, 5, 5, 5, 5, 5],
        'cycle': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
                  1, 2, 3, 4, 5,
                  1, 2, 3, 4, 5, 6,
                  1, 2, 3, 4, 5, 6, 7, 8,
                  1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Salary': [7, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                   4, 5, 6, 7, 8,
                   8, 9, 10, 11, 12, 13,
                   8, 1, 2, 3, 4, 5, 6, 7,
                   7, 7, 9, 10, 11, 12, 13, 14, 15],
        'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No',
                     'Yes', 'No', 'Yes', 'No', 'Yes',
                     'No', 'Yes', 'Yes', 'No', 'No', 'Yes',
                     'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes',
                     'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
        'Days': [123, 128, 66, 66, 120, 141, 52, 96, 120, 141, 52,
                 96, 120, 128, 66, 120,
                 15, 123, 128, 66, 120, 141,
                 141, 128, 66, 123, 128, 66, 120, 141,
                 123, 128, 66, 123, 128, 66, 120, 141, 52]
        }

# Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)

The dataframe looks as such:

Here, every id has different cycles as per the 'cycle' column. For example,

id-1 has maximum 11 cycles.

id-2 has maximum 5 cycles.

id-3 has maximum 6 cycles.

id-4 has maximum 8 cycles.

id-5 has maximum 9 cycles.
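
The per-id maxima listed above can be checked with a `groupby`. This is only a minimal sketch: it rebuilds just the `id` and `cycle` columns from the group sizes stated in the question, and the names `sizes` and `max_cycles` are introduced here for illustration.

```python
import pandas as pd

# Rows per id, as listed above (illustrative reconstruction of two columns only).
sizes = {1: 11, 2: 5, 3: 6, 4: 8, 5: 9}
df = pd.DataFrame(
    [(i, c) for i, n in sizes.items() for c in range(1, n + 1)],
    columns=["id", "cycle"],
)

# Largest cycle number recorded for each id.
max_cycles = df.groupby("id")["cycle"].max()
print(max_cycles)
```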

Now, for every id, I wish to delete rows from the bottom.

For example,

For id-1, delete last four rows.

For id-2, delete last two rows.

For id-3, delete last three rows.

For id-4, delete last five rows.

For id-5, delete last six rows.

The dataframe then looks as such:

Can somebody please let me know how to achieve this in Python?

Answer 1

Score: 1

Create a dictionary that specifies how many rows to delete for each `id`. Then count each row's position from the end of its group with [`GroupBy.cumcount`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html) and `ascending=False`, compare that counter with the `id` column mapped through the dictionary by [`Series.map`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html), and filter with [boolean indexing](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing):

    d = {1:4,2:2,3:3,4:5,5:6}

    df = df[df.groupby('id').cumcount(ascending=False).ge(df['id'].map(d))]
    print (df)
        id  cycle  Salary Children  Days
    0    1      1       7       No   123
    1    1      2       7      Yes   128
    2    1      3       7      Yes    66
    3    1      4       8      Yes    66
    4    1      5       9      Yes   120
    5    1      6      10       No   141
    6    1      7      11       No    52
    11   2      1       4      Yes    96
    12   2      2       5       No   120
    13   2      3       6      Yes   128
    16   3      1       8       No    15
    17   3      2       9      Yes   123
    18   3      3      10      Yes   128
    22   4      1       8      Yes   141
    23   4      2       1      Yes   128
    24   4      3       2      Yes    66
    30   5      1       7       No   123
    31   5      2       7      Yes   128
    32   5      3       9       No    66
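
One thing worth noting about the one-liner above: an `id` that is missing from the dictionary maps to `NaN`, the comparison is then `False`, and every row of that `id` is dropped. A possible sketch that keeps such groups whole is shown below. The helper name `drop_last_rows` and its parameters are hypothetical, not part of pandas or of the original answer, and the small frame is made up for the demonstration.

```python
import pandas as pd

def drop_last_rows(df, key, n_by_group):
    """Hypothetical helper: drop the last n rows of each group.

    Groups that do not appear in n_by_group are kept whole.
    """
    # Position of each row counted from the end of its group (last row -> 0).
    from_end = df.groupby(key).cumcount(ascending=False)
    # Rows to drop for each row's group; unmapped keys become NaN, so treat them as 0.
    n_drop = df[key].map(n_by_group).fillna(0)
    # Keep a row only if at least n_drop rows of its group come after it.
    return df[from_end.ge(n_drop)]

# Small illustrative frame (not the question's data).
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "cycle": [1, 2, 3, 1, 2, 1]})
out = drop_last_rows(df, "id", {1: 2, 2: 1})
print(out)
```

Here id 3 has no entry in the dictionary, so its single row survives; the original index labels are kept, as in the answer's output.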

Answer 2

Score: 1

@jezrael's method is a neat one; here is my attempt.
It simply extracts the sub-frame for each id, trims the trailing rows, and recombines the pieces.
It looks verbose, but it follows a clear pattern, and you can turn it into a function for more general use.

With this approach, the index in the leftmost column is renumbered in ascending order.

df1 = df[df['id'] == 1].iloc[:-4]
df2 = df[df['id'] == 2].iloc[:-2]
df3 = df[df['id'] == 3].iloc[:-3]
df4 = df[df['id'] == 4].iloc[:-5]
df5 = df[df['id'] == 5].iloc[:-6]
df = pd.concat([df1, df2, df3, df4, df5])
# Rebuild the frame from a dict of lists so the index restarts at 0
data = df.to_dict('list')
df = pd.DataFrame(data)
print("df = \n", df)

df = 
     id  cycle  Salary Children  Days
0    1      1       7       No   123
1    1      2       7      Yes   128
2    1      3       7      Yes    66
3    1      4       8      Yes    66
4    1      5       9      Yes   120
5    1      6      10       No   141
6    1      7      11       No    52
7    2      1       4      Yes    96
8    2      2       5       No   120
9    2      3       6      Yes   128
10   3      1       8       No    15
11   3      2       9      Yes   123
12   3      3      10      Yes   128
13   4      1       8      Yes   141
14   4      2       1      Yes   128
15   4      3       2      Yes    66
16   5      1       7       No   123
17   5      2       7      Yes   128
18   5      3       9       No    66
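
The slice-and-concatenate pattern above could be generalised along these lines. This is a rough sketch rather than the answer author's code: the function name `trim_groups` and its parameters are hypothetical, and the small frame is made up for the demonstration. It swaps the dict round-trip for `pd.concat(..., ignore_index=True)`, which also renumbers the index.

```python
import pandas as pd

def trim_groups(df, key, n_by_group):
    """Hypothetical helper: slice each group with iloc, then concatenate."""
    pieces = []
    for group_id, group in df.groupby(key, sort=False):
        n = n_by_group.get(group_id, 0)
        # iloc[:-0] would return an empty frame, so handle n == 0 explicitly.
        pieces.append(group.iloc[:-n] if n > 0 else group)
    # ignore_index=True renumbers the rows 0..N-1.
    return pd.concat(pieces, ignore_index=True)

# Small illustrative frame (not the question's data).
df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "cycle": [1, 2, 3, 1, 2, 1]})
out = trim_groups(df, "id", {1: 2, 2: 1})
print(out)
```

Looping over groups is easier to read than the boolean mask in the first answer, though it will likely be slower when there are many groups.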
