Removing rows of groups using a condition based on another row
Question
I have a dataframe of movies that looks like this:
Index  Movie                   Rotten Tomato Score  Director           Ranking
1      Batman Rises            87                   Christopher Nolan  1
2      Interstellar            73                   Christopher Nolan  2
3      Legally Blonde          71                   Robert Luketic     1
4      The Ugly Truth          14                   Robert Luketic     2
5      (500) Days of Summer    85                   Marc Webb          1
6      The Amazing Spider-Man  71                   Marc Webb          2
7      Wide Awake              45                   M N Shyamalan      1
8      The Last Airbender      5                    M N Shyamalan      2
I have added a Ranking column that orders each director's movies by Rotten Tomato Score, so the ranking is computed within each Director group.
What I am trying to do is use the Ranking column to drop the lower-ranked movies whenever the director's top-ranked movie scores above a certain threshold (50%). For example, for Marc Webb the only film I want to show is (500) Days of Summer, but I want to keep both movies for M N Shyamalan, whose top-ranked film scores below 50. The ideal table would look like this:
Index  Movie                   Rotten Tomato Score  Director           Ranking
1      Batman Rises            87                   Christopher Nolan  1
3      Legally Blonde          71                   Robert Luketic     1
5      (500) Days of Summer    85                   Marc Webb          1
7      Wide Awake              45                   M N Shyamalan      1
8      The Last Airbender      5                    M N Shyamalan      2
I have attempted:
movie_names = movie_names.groupby('Movie').filter(lambda g: (g.score <= 0.5).any())
However that removed both of M N Shyamalan's movies.
Would anyone know how to do this? Any help would be greatly appreciated!
Answer 1
Score: 2
Code
# Is the movie's score > 50?
m = df['Rotten Tomato Score'] > 50
# Does the director have at least one movie with a score > 50?
cond1 = m.groupby(df['Director']).transform('any')
# Flag every row except each director's highest-scored movie
cond2 = df.sort_values('Rotten Tomato Score').duplicated('Director', keep='last')
# Drop rows where both cond1 and cond2 hold (the two masks align on the index)
df[~(cond1 & cond2)]
Result
   Index  Movie                 Rotten Tomato Score  Director           Ranking
0  1      Batman Rises          87                   Christopher Nolan  1
2  3      Legally Blonde        71                   Robert Luketic     1
4  5      (500) Days of Summer  85                   Marc Webb          1
6  7      Wide Awake            45                   M N Shyamalan      1
7  8      The Last Airbender    5                    M N Shyamalan      2
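Since the question asks about using the Ranking column, a variation on the approach above is to filter on it directly: keep a row when it is the director's top-ranked film, or when the director's best score does not exceed the threshold. The following is a minimal sketch, assuming the sample data from the question, that Ranking == 1 always marks the highest-scored film per director, and a hypothetical `threshold` variable set to 50.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Index': [1, 2, 3, 4, 5, 6, 7, 8],
    'Movie': ['Batman Rises', 'Interstellar', 'Legally Blonde', 'The Ugly Truth',
              '(500) Days of Summer', 'The Amazing Spider-Man', 'Wide Awake',
              'The Last Airbender'],
    'Rotten Tomato Score': [87, 73, 71, 14, 85, 71, 45, 5],
    'Director': ['Christopher Nolan', 'Christopher Nolan', 'Robert Luketic',
                 'Robert Luketic', 'Marc Webb', 'Marc Webb', 'M N Shyamalan',
                 'M N Shyamalan'],
    'Ranking': [1, 2, 1, 2, 1, 2, 1, 2],
})

threshold = 50  # assumed cut-off, taken from the 50% mentioned in the question

# Best score per director, broadcast back onto every row of that director
top_score = df.groupby('Director')['Rotten Tomato Score'].transform('max')

# Keep the top-ranked film, or every film when the best score is at or below the threshold
result = df[(df['Ranking'] == 1) | (top_score <= threshold)]
print(result)
```

Because this is a plain boolean mask, the surviving rows keep their original order and index labels.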
Answer 2
Score: 1
import pandas as pd

df = pd.DataFrame({
    'Index': [1, 2, 3, 4, 5, 6, 7, 8],
    'Movie': ["Batman Rises", "Interstellar", "Legally Blonde", "The Ugly Truth",
              "(500) Days of Summer", "The Amazing Spider-Man", "Wide Awake", "The Last Airbender"],
    'Rotten Tomato Score': [87, 73, 71, 14, 85, 71, 45, 5],
    'Director': ["Christopher Nolan", "Christopher Nolan", "Robert Luketic", "Robert Luketic",
                 "Marc Webb", "Marc Webb", "M N Shyamalan", "M N Shyamalan"],
    'Ranking': [1, 2, 1, 2, 1, 2, 1, 2],
})

r = (df.groupby('Director')
       .apply(lambda g: (g.sort_values('Rotten Tomato Score')
                          .drop_duplicates(subset='Director', keep='last'))
                        if g['Rotten Tomato Score'].max() > 50 else g)
       .reset_index(drop=True)
)
print(r)
Result
   Index  Movie                 Rotten Tomato Score  Director           Ranking
0  1      Batman Rises          87                   Christopher Nolan  1
1  7      Wide Awake            45                   M N Shyamalan      1
2  8      The Last Airbender    5                    M N Shyamalan      2
3  5      (500) Days of Summer  85                   Marc Webb          1
4  3      Legally Blonde        71                   Robert Luketic     1
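The output above comes back ordered by director name because groupby sorts its keys by default, and reset_index(drop=True) discards the original row labels. If the original order matters, one possible alternative is sketched below. It assumes the same sample data, a unique index, and a hypothetical `threshold` variable. It also avoids groupby.apply, whose use on the grouping columns (relied on above via subset='Director') was deprecated in pandas 2.2.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Index': [1, 2, 3, 4, 5, 6, 7, 8],
    'Movie': ['Batman Rises', 'Interstellar', 'Legally Blonde', 'The Ugly Truth',
              '(500) Days of Summer', 'The Amazing Spider-Man', 'Wide Awake',
              'The Last Airbender'],
    'Rotten Tomato Score': [87, 73, 71, 14, 85, 71, 45, 5],
    'Director': ['Christopher Nolan', 'Christopher Nolan', 'Robert Luketic',
                 'Robert Luketic', 'Marc Webb', 'Marc Webb', 'M N Shyamalan',
                 'M N Shyamalan'],
    'Ranking': [1, 2, 1, 2, 1, 2, 1, 2],
})

threshold = 50  # assumed cut-off, taken from the 50% mentioned in the question

scores = df.groupby('Director')['Rotten Tomato Score']

# True for every row whose director has at least one film above the threshold
has_hit = scores.transform('max') > threshold

# Index labels of each director's highest-scored film (the first one on ties)
best_idx = scores.idxmax()
is_best = pd.Series(df.index.isin(best_idx), index=df.index)

# Keep every row for directors without a hit, otherwise only the best film
r = df[~has_hit | is_best]
print(r)
```

Unlike the drop_duplicates version, this keeps the original index labels, so the rows can still be traced back to the input frame.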