2023年4月7日 01:46:27go评论79阅读模式

英文:

removing rows of groups using a condition based on another row

问题

我有一个电影数据表格，看起来像这样：

    索引   电影                      烂番茄评分   导演                  排名
    1    蝙蝠侠崛起               87                克里斯托弗·诺兰    1
    2    星际穿越                  73                克里斯托弗·诺兰    2
    3    金发法宝                  71                罗伯特·卢克蒂克     1
    4    丑陋的真相               14                罗伯特·卢克蒂克     2
    5    (500)天与夏              85                马克·韦布           1
    6    超凡蜘蛛侠               71                马克·韦布           2
    7    醒来                       45                M·N·夏马兰        1
    8    最后的气宗                 5                   M·N·夏马兰        2

我已经添加了一个排名列，显示了基于烂番茄评分的排名顺序。这是按照电影的导演分组的。

我想要做的是，使用排名列，如果顶级电影的评分超过某个点（50%），则删除排名较低的电影。例如，对于马克·韦布，我只想显示他的电影(500)天与夏，但是我希望显示M·N·夏马兰的两部电影。理想的表格应该是这样的：

    索引   电影                      烂番茄评分   导演                  排名
    1    蝙蝠侠崛起               87                克里斯托弗·诺兰    1
    3    金发法宝                  71                罗伯特·卢克蒂克     1
    5    (500)天与夏              85                马克·韦布           1
    7    醒来                       45                M·N·夏马兰        1
    8    最后的气宗                 5                   M·N·夏马兰        2

我尝试过：

    movie_names = movie_names.groupby('Movie').filter(lambda g: (g.score <= 0.5).any())

但这样会删除M·N·夏马兰的两部电影。有人知道如何做到这一点吗？任何帮助将不胜感激！

英文:

I have a dataframe of movies that looks like this:

Index  Movie                   Rotten Tomato Score  Director             Ranking
1      Batman Rises            87                   Christopher Nolan    1
2      Interstellar            73                   Christopher Nolan    2
3      Legally Blonde          71                   Robert Luketic       1
4      The Ugly Truth          14                   Robert Luketic       2
5      (500) Days of Summer    85                   Marc Webb            1
6      The Amazing Spider-Man  71                   Marc Webb            2
7      Wide Awake              45                   M N Shyamalan        1
8      The Last Airbender       5                   M N Shyamalan        2

I have made a ranking column which shows the order of ranking based off of the rotten tomato percentage. This is grouped by Director of the movie.

What I am trying to do is to use the ranking column, to drop the lower ranked movies if the top ranked movie is scored above a certain point (50%). For example, For Marc Webb, the only film I want showing for him is (500) Days of Summer, however I would want to show both the movies for M N Shyamalan. The ideal table would look like:

Index  Movie                 Rotten Tomato Score  Director             Ranking  
    1  Batman Rises          87                   Christopher Nolan    1      
    3  Legally Blonde        71                   Robert Luketic       1
    5  (500) Days of Summer  85                   Marc Webb            1
    7  Wide Awake            45                   M N Shyamalan        1
    8  The Last Airbender    5                    M N Shyamalan        2

I have attempted:

movie_names = movie_names.groupby(&#39;Movie&#39;).filter(lambda g: (g.score &lt;= 0.5).any())

However that removed both of M N Shyamalan's movies.

Would anyone know how to do this? Any help would be greatly appreciated!

答案1

得分: 2

代码

# 电影评分是否大于50？
m = df['Rotten Tomato Score'] > 50

# 导演是否至少有一部电影的评分大于50？
cond1 = m.groupby(df['Director']).transform('any')

# 标记重复行并保留评分最高的电影
cond2 = df.sort_values('Rotten Tomato Score').duplicated('Director', keep='last')

# 当满足cond1和cond2时删除行
df[~(cond1 & cond2)]

结果

       Index                 Movie  Rotten Tomato Score           Director  Ranking
    0      1          Batman Rises                   87  Christopher Nolan        1
    2      3        Legally Blonde                   71     Robert Luketic        1
    4      5  (500) Days of Summer                   85          Marc Webb        1
    6      7            Wide Awake                   45      M N Shyamalan        1
    7      8    The Last Airbender                    5      M N Shyamalan        2

英文:

Code

# is movie score &gt; 50?
m = df[&#39;Rotten Tomato Score&#39;] &gt; 50

# Does director has at least one movie with score &gt; 50?
cond1 = m.groupby(df[&#39;Director&#39;]).transform(&#39;any&#39;)

# flag the duplicate rows and keep the movie with highest score
cond2 = df.sort_values(&#39;Rotten Tomato Score&#39;).duplicated(&#39;Director&#39;, keep=&#39;last&#39;)

# Drop rows when cond1 and cond2 is met
df[~(cond1 &amp; cond2)]

Result

   Index                 Movie  Rotten Tomato Score           Director  Ranking
0      1          Batman Rises                   87  Christopher Nolan        1
2      3        Legally Blonde                   71     Robert Luketic        1
4      5  (500) Days of Summer                   85          Marc Webb        1
6      7            Wide Awake                   45      M N Shyamalan        1
7      8    The Last Airbender                    5      M N Shyamalan        2

答案2

得分: 1

import pandas as pd

df=pd.DataFrame({'Index':[1, 2, 3, 4, 5, 6, 7, 8],
                 'Movie':['Batman Rises', 'Interstellar', 'Legally Blonde', 'The Ugly Truth', 
                          '(500) Days of Summer', 'The Amazing Spider-Man', 'Wide Awake', 'The Last Airbender'],
                 'Rotten Tomato Score':[87, 73, 71, 14, 85, 71, 45, 5],
                 'Director':['Christopher Nolan', 'Christopher Nolan', 'Robert Luketic', 'Robert Luketic', 
                             'Marc Webb', 'Marc Webb', 'M N Shyamalan', 'M N Shyamalan'],
                 'Ranking':[1, 2, 1, 2, 1, 2, 1, 2],
                 })
    
r = (df.groupby('Director')
       .apply(lambda g: (g.sort_values('Rotten Tomato Score')
                          .drop_duplicates(subset='Director', keep='last')) \
                          if g['Rotten Tomato Score'].max() > 50 else g
                         )
       .reset_index(drop=True)
       )

print(r)

英文:

import pandas as pd

df=pd.DataFrame({&#39;Index&#39;:[1, 2, 3, 4, 5, 6, 7, 8],
                 &#39;Movie&#39;:[&quot;Batman Rises&quot;, &quot;Interstellar&quot;, &quot;Legally Blonde&quot;, &quot;The Ugly Truth&quot;, 
                          &quot;(500) Days of Summer&quot;, &quot;The Amazing Spider-Man&quot;, &quot;Wide Awake&quot;, &quot;The Last Airbender&quot;],
                 &#39;Rotten Tomato Score&#39;:[87, 73, 71, 14, 85, 71, 45, 5],
                 &#39;Director&#39;:[&quot;Christopher Nolan&quot;, &quot;Christopher Nolan&quot;, &quot;Robert Luketic&quot;, &quot;Robert Luketic&quot;, 
                             &quot;Marc Webb&quot;, &quot;Marc Webb&quot;, &quot;M N Shyamalan&quot;, &quot;M N Shyamalan&quot;],
                 &#39;Ranking&#39;:[1, 2, 1, 2, 1, 2, 1, 2],
                 })
    
r = (df.groupby(&#39;Director&#39;)
       .apply(lambda g: (g.sort_values(&#39;Rotten Tomato Score&#39;)
                          .drop_duplicates(subset=&#39;Director&#39;, keep=&#39;last&#39;)) \
                          if g[&#39;Rotten Tomato Score&#39;].max() &gt; 50 else g
                         )
       .reset_index(drop=True)
       )

print(r)

Result

   Index                 Movie  Rotten Tomato Score           Director  Ranking
0      1          Batman Rises                   87  Christopher Nolan        1
1      7            Wide Awake                   45      M N Shyamalan        1
2      8    The Last Airbender                    5      M N Shyamalan        2
3      5  (500) Days of Summer                   85          Marc Webb        1
4      3        Legally Blonde                   71     Robert Luketic        1

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用基于另一行的条件来删除分组的行

问题

答案1

代码

结果

Code

Result

答案2

Django的Group By聚合在添加额外字段之前是有效的

Golang测试中的固定装置

有没有用于身份验证用户删除的Python函数？

Django REST Framework JSON API show an empty object of relationships link when use relations.HyperlinkedRelatedField from rest_framework_json_api

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论