How to understand pandas .apply(axis='columns')?

Question

Below is the solution code I received from the Kaggle pandas course.

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings_2 = reviews.apply(stars, axis='columns')  

The question goes like this:

> We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

> Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

> Create a series star_ratings with the number of stars corresponding to each review in the dataset.

The dataset looks like this:
[image of the dataset table]

My question is about this line:
star_ratings_2 = reviews.apply(stars, axis='columns')
Why axis='columns' instead of axis='rows'? Since the stars() function has to process the country and points columns of a row, shouldn't we pass a row to the stars() function?

I just didn't expect the correct answer to be axis='columns'. I've asked around, including ChatGPT, but haven't found a good answer. ChatGPT even thinks I am right and that axis='rows' should be correct.

Answer 1

Score: 1

The terminology may be misleading, but the apply documentation is pretty clear:

> axis: {0 or ‘index’, 1 or ‘columns’}, default 0
>
> > Axis along which the function is applied:
> >
> > 0 or ‘index’: apply function to each column.
> >
> > 1 or ‘columns’: apply function to each row.

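You can draw a parallel with aggregation functions: df.sum(axis=1) takes each row and aggregates it into a single value. It is the same here: apply with axis=1 (or axis='columns') passes each row to the function and collects one result per row. In other words, the axis names the direction the function's input spans: with axis='columns', each call sees the values across the columns, which is exactly one row.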
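To make this concrete, here is a small sketch you can run. The tiny `reviews` frame, the `scores` frame, and the `describe` helper are made up for illustration and are not the Kaggle dataset; `stars` is copied from the question. It shows which labels the function receives along each axis.

```python
import pandas as pd

# Hypothetical stand-in for the course's reviews table (not the real data).
reviews = pd.DataFrame({
    "country": ["Canada", "Italy", "US", "France"],
    "points": [82, 96, 87, 80],
})

def stars(row):
    # Same logic as in the question: Canada always gets 3 stars.
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

def describe(s):
    # Illustrative helper: report the labels of the Series that apply() passes in.
    return ", ".join(map(str, s.index))

# axis='columns' (axis=1): one call per row; each Series is indexed by column labels.
per_row = reviews.apply(describe, axis="columns")

# axis='index' (axis=0): one call per column; each Series is indexed by row labels.
per_col = reviews.apply(describe, axis="index")

star_ratings = reviews.apply(stars, axis="columns")

# The same convention holds for aggregations such as sum().
scores = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
row_totals = scores.sum(axis="columns")  # one total per row
col_totals = scores.sum(axis="index")    # one total per column

print(per_row)
print(per_col)
print(star_ratings)
```

Every entry of `per_row` lists `country, points`, which is why `row.country` and `row.points` work inside `stars`. With axis='index', each call would receive a whole column instead, so those attribute lookups would fail.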

huangapple
  • Published on April 13, 2023, 19:08:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76004711.html