How to understand pandas .apply(axis='columns')?
Question
Below is the answer code from the Kaggle Pandas course:
    # `row` is one row of reviews: a Series indexed by column name
    def stars(row):
        if row.country == 'Canada':
            return 3  # Canadian wines always get 3 stars
        elif row.points >= 95:
            return 3
        elif row.points >= 85:
            return 2
        else:
            return 1

    star_ratings_2 = reviews.apply(stars, axis='columns')
The exercise reads:
> We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.
> Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.
> Create a series star_ratings with the number of stars corresponding to each review in the dataset.
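To see the Kaggle answer run end to end, here is a minimal sketch on a tiny stand-in for reviews. The four rows are invented for illustration (the real dataset has many more columns), and each row exercises one rule of the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle `reviews` DataFrame: only the two
# columns that stars() reads are included, and the values are invented.
reviews = pd.DataFrame({
    "country": ["Canada", "Italy", "US", "France"],
    "points": [82, 96, 88, 80],
})

def stars(row):
    # `row` is a Series indexed by column name, so row.country works
    if row.country == "Canada":
        return 3  # Canadian wines always get 3 stars
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

# One call per row; the result is a Series aligned with reviews' index.
star_ratings = reviews.apply(stars, axis="columns")
print(star_ratings.tolist())  # [3, 3, 2, 1]
```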
The dataset looks like this:
[Image: sample rows of the reviews dataset]
My question is:
    star_ratings_2 = reviews.apply(stars, axis='columns')
Why axis='columns' instead of axis='rows'? Since the stars() function has to process the country and points columns of a row, shouldn't we pass a row to stars()?
I didn't expect the correct answer to be axis='columns'. I've asked around, including ChatGPT, but haven't found a good explanation. ChatGPT even agreed with me that axis='rows' should be correct.
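A minimal sketch, on invented toy data, of what happens with the axis the question expects. axis=0 (spelled 'index' in the docs, and also accepted by pandas as 'rows') passes each column to stars(), so the first call looks up .country on the country column itself, whose index is 0, 1, ...; that lookup raises AttributeError. axis=1 passes each row instead.

```python
import pandas as pd

# Hypothetical toy data with the same two columns the exercise uses.
reviews = pd.DataFrame({"country": ["Canada", "Italy"], "points": [82, 96]})

def stars(row):
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

# axis=0 hands stars() one COLUMN at a time. That Series is indexed by row
# labels (0, 1), not by column names, so `row.country` does not exist.
try:
    reviews.apply(stars, axis=0)
    axis0_error = None
except AttributeError as exc:
    axis0_error = exc
print("axis=0 raised:", repr(axis0_error))

# axis=1 hands stars() one ROW at a time, indexed by column name.
by_row = reviews.apply(stars, axis=1)
print(by_row.tolist())  # [3, 3]
```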
Answer 1
Score: 1
The terminology is perhaps misleading, but the apply documentation is clear:
> axis: {0 or ‘index’, 1 or ‘columns’}, default 0
>
> > Axis along which the function is applied:
> >
> > 0 or ‘index’: apply function to each column.
> >
> > 1 or ‘columns’: apply function to each row.
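A small sketch that makes the documentation concrete: the lambda just returns the .name of the Series it receives, which reveals whether each call got a column or a row. The frame df and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical 2x2 frame with non-default row labels so rows and columns
# are easy to tell apart.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["r1", "r2"])

# axis=0 / 'index': called once per COLUMN; each Series is named after
# its column label.
per_column = df.apply(lambda s: s.name, axis="index")

# axis=1 / 'columns': called once per ROW; each Series is named after its
# row label and is indexed by the column labels.
per_row = df.apply(lambda s: s.name, axis="columns")

print(per_column.tolist())  # ['a', 'b']
print(per_row.tolist())     # ['r1', 'r2']
```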
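You can draw a parallel with aggregation functions: df.sum(axis=1) takes each row and aggregates it into a single value. The same holds here: apply with axis=1 / axis='columns' takes each row and applies the function to it. One way to read the name is that the function sweeps along the columns axis, so each call spans all the columns, which is exactly one row.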
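The sum parallel can be checked directly. In this sketch df is a hypothetical numeric frame; the built-in reduction and an equivalent apply call both collapse each row to one value.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Both reduce each ROW to one value, so both return a Series indexed
# like df's rows.
row_sums_builtin = df.sum(axis=1)
row_sums_apply = df.apply(lambda row: row.sum(), axis="columns")

print(row_sums_builtin.tolist())  # [11, 22, 33]
print(row_sums_apply.tolist())    # [11, 22, 33]

# axis=0 collapses each COLUMN instead.
col_sums = df.sum(axis=0)
print(col_sums.tolist())          # [6, 60]
```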