Describe data when it is stored as value_counts

Question

I have a DataFrame with two columns, value and count. For example, a row with (value, count) = (2, 1000) means that the value 2 occurs 1000 times. I want to compute the min, max, median, and percentiles so that the results are the same as what df.describe() would return on the ungrouped data.

Thank you.

I could not find anything that does this.

Answer 1

Score: 0

The generic way would be to restore the original data, then compute the statistics:

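import pandas as pd

# aggregated data: each row says how many times `value` occurs
df = pd.DataFrame({'value': [1, 2, 3], 'count': [5, 1, 4]})

# repeat each row `count` times to restore the raw data, then compute statistics
out = df.loc[df.index.repeat(df['count']), 'value'].describe()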

Of course, you can do better depending on exactly which statistics you need: min and max are unchanged by the aggregation; mean and std can be computed with numpy.average or statsmodels.stats.weightstats.DescrStatsW through their weights parameter, and so on. Check which statistics you actually need and decide whether they can be computed without unaggregating the data.

Output (the contents of out):

count    10.000000
mean      1.900000
std       0.994429
min       1.000000
25%       1.000000
50%       1.500000
75%       3.000000
max       3.000000
Name: value, dtype: float64
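
The weighted route mentioned in the answer can be sketched as follows. This is only a sketch under assumptions: the helper name describe_counts and its parameters are hypothetical, the counts are assumed to be non-negative integers summing to at least 2, and the percentiles use the linear interpolation that describe() applies by default. It relies on numpy only, not on statsmodels.

```python
import numpy as np
import pandas as pd


def describe_counts(df, value_col="value", count_col="count",
                    percentiles=(0.25, 0.5, 0.75)):
    """Hypothetical helper: describe()-style statistics from (value, count) rows."""
    d = df.sort_values(value_col)
    values = d[value_col].to_numpy(dtype=float)
    counts = d[count_col].to_numpy(dtype=np.int64)
    n = int(counts.sum())

    # Weighted mean and sample standard deviation (ddof=1, as in describe()).
    mean = np.average(values, weights=counts)
    std = np.sqrt(np.sum(counts * (values - mean) ** 2) / (n - 1))

    # cum[i] is the number of observations with a value <= values[i].
    cum = np.cumsum(counts)

    def value_at(k):
        # Value at 0-based position k of the (never materialised) sorted raw data.
        return values[np.searchsorted(cum, k, side="right")]

    def quantile(q):
        # Linear interpolation between the two neighbouring order statistics.
        h = (n - 1) * q
        lo, hi = int(np.floor(h)), int(np.ceil(h))
        return value_at(lo) + (h - lo) * (value_at(hi) - value_at(lo))

    stats = {"count": float(n), "mean": mean, "std": std, "min": value_at(0)}
    for q in percentiles:
        stats[f"{q:.0%}"] = quantile(q)
    stats["max"] = value_at(n - 1)
    return pd.Series(stats, name=value_col)


df = pd.DataFrame({'value': [1, 2, 3], 'count': [5, 1, 4]})
print(describe_counts(df))
```

For the example data this should reproduce the figures in the output listed above, without building the repeated rows. That matters mainly when the counts are large, since index.repeat allocates one row per observation.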

huangapple
  • Posted on July 10, 2023 at 21:50:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654424.html