April 13, 2023, 19:08:05

How to understand pandas .apply(axis='columns')?

Question

Below is an answer code I received from Kaggle Pandas course.

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings_2 = reviews.apply(stars, axis='columns')

The question goes like this:

> We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

> Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

> Create a series star_ratings with the number of stars corresponding to each review in the dataset.

The dataset looks like this:
[Table not preserved in the source]

My question is about this line:
star_ratings_2 = reviews.apply(stars, axis='columns')
Why axis='columns' instead of axis='rows'? Since the stars() function has to process the country and points columns of a row, shouldn't we pass a row to the stars() function?

I just didn't expect the correct answer to be axis='columns'. I've asked around, including ChatGPT, but haven't found a good answer. ChatGPT even thinks that I am right and that axis='rows' should be correct.

Answer 1

Score: 1

The terminology is arguably misleading, but the apply documentation is clear:

> axis: {0 or ‘index’, 1 or ‘columns’}, default 0
>
> > Axis along which the function is applied:
> >
> > 0 or ‘index’: apply function to each column.
> >
> > 1 or ‘columns’: apply function to each row.
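
To see what the function actually receives in each case, here is a minimal sketch. The two-row `df` frame and the `describe` helper are made up for illustration and are not part of the Kaggle dataset.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real reviews data.
df = pd.DataFrame({"country": ["Canada", "Italy"], "points": [82, 96]})

def describe(series):
    # Report the labels of the Series that apply() hands to the function.
    return ", ".join(str(label) for label in series.index)

# axis=0 / 'index': the function is called once per column,
# so each Series it receives is labelled by the row index (0, 1).
per_column = df.apply(describe, axis="index")

# axis=1 / 'columns': the function is called once per row,
# so each Series it receives is labelled by the column names.
per_row = df.apply(describe, axis="columns")

print(per_column)
print(per_row)
```

With axis='columns', each call gets a Series whose labels are the column names, which is why `row.country` and `row.points` work inside stars().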

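You can draw a parallel with aggregation functions: df.sum(axis=1) takes each row and aggregates it into a single value. The same applies here: apply with axis=1/axis='columns' takes each row and passes it to the function. In both cases, the axis argument names the direction the operation moves along, and moving along the columns means walking across one row.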
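
The parallel between sum and apply can be sketched on toy data. The `scores` and `reviews` frames below are small hypothetical stand-ins, not the real Kaggle dataset; the stars() function is the one from the question.

```python
import pandas as pd

# Hypothetical numeric frame to show how sum() treats the axis argument.
scores = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

col_totals = scores.sum(axis=0)  # one value per column
row_totals = scores.sum(axis=1)  # one value per row

# Hypothetical stand-in for the wine reviews data.
reviews = pd.DataFrame({
    "country": ["Canada", "Italy", "France", "Spain"],
    "points": [82, 96, 88, 80],
})

def stars(row):
    # row is a Series holding one review, labelled by column name.
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

# Like sum(axis=1), this produces one result per row.
star_ratings = reviews.apply(stars, axis='columns')
print(star_ratings)
```

Both `row_totals` and `star_ratings` end up with one entry per row of the input frame, which is the behaviour the answer describes.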
