Python pandas `sort_values` not working properly.

Question

When I try to sort a DataFrame by a column value and print it with the head() function, it shows duplicated rows instead of the desired result:

regions = country_features['world_region']
happines = []
counts = []
reg = []

for region in regions:
    hap = country_features.loc[country_features['world_region'] == region, 'happiness_score'].mean()
    count = len(country_features[country_features['world_region'] == region])
    happines.append(hap)
    counts.append(count)
    reg.append(region)

region_happines = pd.DataFrame({'region':reg,
                                'happiness_score' : happines,
                                'country_count':counts})

region_happines
region_happines.happiness_score = pd.to_numeric(region_happines.happiness_score)
sorted = region_happines.sort_values(by='happiness_score', ascending=False)

sorted.head(5)

I want to sort the DataFrame by a column value, and I expected each region to appear once, in sorted order.

Answer 1

Score: -1

The first part of your solution can be simplified with groupby:

print (country_features)
  world_region  happiness_score
0         reg1                5
1         reg1                1
2         reg2               10
3         reg2                1
4         reg2                3

region_happines = (country_features.groupby('world_region',as_index=False)
                                   .agg(happiness_score= ('happiness_score','mean'),
                                        country_count= ('happiness_score','size'))
                                   .rename(columns={'world_region':'region'}))
print (region_happines)
  region  happiness_score  country_count
0   reg1         3.000000              2
1   reg2         4.666667              3

Because the happiness_score column already holds the per-group averages, it is numeric and does not need to be converted with pd.to_numeric. You can sort it directly:

out = region_happines.sort_values(by='happiness_score', ascending=False)
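
To illustrate where the duplicated rows in the question likely come from, here is a minimal, self-contained sketch. The sample data is made up and only reuses the column names from the question; `regions_per_row`, `regions_unique`, and `sorted_df` are names chosen for this example. The loop in the question iterates over the whole `world_region` column, so each region is appended once per country; `groupby` (or iterating over `.unique()`) visits each region only once.

```python
import pandas as pd

# Hypothetical sample data, reusing the column names from the question.
country_features = pd.DataFrame({
    "world_region": ["reg1", "reg1", "reg2", "reg2", "reg2"],
    "happiness_score": [5, 1, 10, 1, 3],
})

# Iterating over the raw column visits every row, so a loop over it
# appends each region once per country (the source of the repeated rows).
regions_per_row = list(country_features["world_region"])

# Iterating over the distinct values would visit each region only once.
regions_unique = list(country_features["world_region"].unique())

# groupby aggregates once per region, so no manual loop is needed.
region_happines = (
    country_features.groupby("world_region", as_index=False)
    .agg(
        happiness_score=("happiness_score", "mean"),
        country_count=("happiness_score", "size"),
    )
    .rename(columns={"world_region": "region"})
)

# The mean is already a float column, so pd.to_numeric is not required.
sorted_df = region_happines.sort_values(by="happiness_score", ascending=False)

print(len(regions_per_row), len(regions_unique))
print(sorted_df.head(5))
```

The result is named `sorted_df` rather than `sorted` so that it does not shadow Python's built-in `sorted` function.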

Posted by huangapple on March 8, 2023, 18:30:51
When reposting, please keep the link to this article: https://go.coder-hub.com/75671880.html