英文:
Delete rows from bottom, from every group/id of a dataframe
问题
以下是您提供的代码的翻译部分:
#加载所需的库
import pandas as pd
#创建数据集
data = {'id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5],
'cycle': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
1, 2, 3, 4, 5,
1, 2, 3, 4, 5, 6,
1, 2, 3, 4, 5, 6, 7, 8,
1, 2, 3, 4, 5, 6, 7, 8, 9],
'Salary': [7, 7, 7, 8, 9, 10, 11, 12, 13, 14, 15,
4, 5, 6, 7, 8,
8, 9, 10, 11, 12, 13,
8, 1, 2, 3, 4, 5, 6, 7,
7, 7, 9, 10, 11, 12, 13, 14, 15],
'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No',
'Yes', 'No', 'Yes', 'No', 'Yes',
'No', 'Yes', 'Yes', 'No', 'No', 'Yes',
'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes',
'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
'Days': [123, 128, 66, 66, 120, 141, 52, 96, 120, 141, 52,
96, 120, 128, 66, 120,
15, 123, 128, 66, 120, 141,
141, 128, 66, 123, 128, 66, 120, 141,
123, 128, 66, 123, 128, 66, 120, 141, 52]
}
#转换为数据框
df = pd.DataFrame(data)
print("df = \n", df)
请注意,这是您提供的代码的翻译部分,用于创建数据框。如果您需要进一步的翻译或有其他问题,请告诉我。
英文:
I have a dataset as such:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'id': [1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1,
2, 2,2,2,2,
3, 3, 3, 3, 3, 3,
4, 4,4,4,4,4,4,4,
5, 5, 5, 5, 5,5, 5, 5,5],
'cycle': [1,2, 3, 4, 5,6,7,8,9,10,11,
1,2, 3,4,5,
1,2, 3, 4, 5,6,
1,2,3,4,5,6,7,8,
1,2, 3, 4, 5,6,7,8,9,],
'Salary': [7, 7, 7,8,9,10,11,12,13,14,15,
4, 5,6,7,8,
8,9,10,11,12,13,
8,1,2,3,4,5,6,7,
7, 7,9,10,11,12,13,14,15,],
'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes', 'No',
'Yes', 'No', 'Yes', 'No', 'Yes',
'No','Yes', 'Yes', 'No','No', 'Yes',
'Yes','Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes',
'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No',],
'Days': [123, 128, 66, 66, 120, 141, 52,96, 120, 141, 52,
96, 120,128, 66, 120,
15,123, 128, 66, 120, 141,
141,128, 66, 123, 128, 66, 120,141,
123, 128, 66, 123, 128, 66, 120, 141, 52,],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
The dataframe looks as such:
Here, every id has different cycles as per the 'cycle' column. For example,
id-1 has maximum 11 cycles.
id-2 has maximum 5 cycles.
id-3 has maximum 6 cycles.
id-4 has maximum 8 cycles.
id-5 has maximum 9 cycles.
Now, for every id, I wish to delete rows from the bottom.
For example,
For id-1, delete last four rows.
For id-2, delete last two rows.
For id-3, delete last three rows.
For id-4, delete last five rows.
For id-5, delete last six rows.
The dataframe then looks as such:
Can somebody please let me know how do I achieve this task in Python?
答案1
得分: 1
创建一个字典来指定删除行的数量,并使用[`GroupBy.cumcount`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.cumcount.html)按照降序比较通过[`Series.map`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html)映射的`id`列,然后可以通过[`布尔索引`](http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)进行过滤:
d = {1:4,2:2,3:3,4:5,5:6}
df = df[df.groupby('id').cumcount(ascending=False).ge(df['id'].map(d))]
print (df)
id cycle Salary Children Days
0 1 1 7 No 123
1 1 2 7 Yes 128
2 1 3 7 Yes 66
3 1 4 8 Yes 66
4 1 5 9 Yes 120
5 1 6 10 No 141
6 1 7 11 No 52
11 2 1 4 Yes 96
12 2 2 5 No 120
13 2 3 6 Yes 128
16 3 1 8 No 15
17 3 2 9 Yes 123
18 3 3 10 Yes 128
22 4 1 8 Yes 141
23 4 2 1 Yes 128
24 4 3 2 Yes 66
30 5 1 7 No 123
31 5 2 7 Yes 128
32 5 3 9 No 66
英文:
Create dictionary for specify number of deleted rows and compare by counter from back by GroupBy.cumcount
with ascending=False
mapped id
column by Series.map
, so possible filter by boolean indexing
:
d = {1:4,2:2,3:3,4:5,5:6}
df = df[df.groupby('id').cumcount(ascending=False).ge(df['id'].map(d))]
print (df)
id cycle Salary Children Days
0 1 1 7 No 123
1 1 2 7 Yes 128
2 1 3 7 Yes 66
3 1 4 8 Yes 66
4 1 5 9 Yes 120
5 1 6 10 No 141
6 1 7 11 No 52
11 2 1 4 Yes 96
12 2 2 5 No 120
13 2 3 6 Yes 128
16 3 1 8 No 15
17 3 2 9 Yes 123
18 3 3 10 Yes 128
22 4 1 8 Yes 141
23 4 2 1 Yes 128
24 4 3 2 Yes 66
30 5 1 7 No 123
31 5 2 7 Yes 128
32 5 3 9 No 66
答案2
得分: 1
@jezrael 的方法是一个很好的捕捉,我将尝试以下方式。
它简单地刮擦每个子矩阵,进行推导,然后重新组合它们。
看起来有点啰嗦,但它遵循一个清晰的模式,您可以将其制作成一个更通用的函数。
使用这个函数,最左列的升序索引将被重新排序。
df1 = df[df['id'] == 1].iloc[:-4]
df2 = df[df['id'] == 2].iloc[:-2]
df3 = df[df['id'] == 3].iloc[:-3]
df4 = df[df['id'] == 4].iloc[:-5]
df5 = df[df['id'] == 5].iloc[:-6]
df = pd.concat([df1, df2, df3, df4, df5])
data = df.to_dict('list')
df = pd.DataFrame(data)
print("df = \n", df)
df =
id cycle Salary Children Days
0 1 1 7 No 123
1 1 2 7 Yes 128
2 1 3 7 Yes 66
3 1 4 8 Yes 66
4 1 5 9 Yes 120
5 1 6 10 No 141
6 1 7 11 No 52
7 2 1 4 Yes 96
8 2 2 5 No 120
9 2 3 6 Yes 128
10 3 1 8 No 15
11 3 2 9 Yes 123
12 3 3 10 Yes 128
13 4 1 8 Yes 141
14 4 2 1 Yes 128
15 4 3 2 Yes 66
16 5 1 7 No 123
17 5 2 7 Yes 128
18 5 3 9 No 66
英文:
@jezrael 's method is a great catch, I'll give my try as below.
It simply scratches each sub-matrix, makes the deduction, and recombines them.
It seems wordy but it follows a clear pattern and you can make it a function for more generic use.
By using this function, the ascending index at the very left column will be reordered.
df1 = df[df['id'] == 1].iloc[:-4]
df2 = df[df['id'] == 2].iloc[:-2]
df3 = df[df['id'] == 3].iloc[:-3]
df4 = df[df['id'] == 4].iloc[:-5]
df5 = df[df['id'] == 5].iloc[:-6]
df = pd.concat([df1, df2, df3, df4, df5])
data = df.to_dict('list')
df = pd.DataFrame(data)
print("df = \n", df)
df =
id cycle Salary Children Days
0 1 1 7 No 123
1 1 2 7 Yes 128
2 1 3 7 Yes 66
3 1 4 8 Yes 66
4 1 5 9 Yes 120
5 1 6 10 No 141
6 1 7 11 No 52
7 2 1 4 Yes 96
8 2 2 5 No 120
9 2 3 6 Yes 128
10 3 1 8 No 15
11 3 2 9 Yes 123
12 3 3 10 Yes 128
13 4 1 8 Yes 141
14 4 2 1 Yes 128
15 4 3 2 Yes 66
16 5 1 7 No 123
17 5 2 7 Yes 128
18 5 3 9 No 66
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论