July 10, 2023, 21:50:43

describe() when the data is stored as value counts

Question

I have a DataFrame with two columns, value and count. A row with (value, count) = (2, 1000) means that the value 2 occurs 1000 times. I want to compute the min, max, median, and percentiles so that the results match what df.describe() would return on the original, ungrouped data.

Thank you.

I could not find anything that does this.

Answer 1

Score: 0


The generic way would be to restore the original data, then compute the statistics:

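import pandas as pd

# aggregated data: each row holds a distinct value and how many times it occurs
df = pd.DataFrame({'value': [1, 2, 3], 'count': [5, 1, 4]})
# repeat each index label `count` times to restore the original rows, then compute statistics
out = df.loc[df.index.repeat(df['count']), 'value'].describe()
print(out)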

Of course, you can do better depending on exactly which statistics you need: min and max are unchanged by the aggregation; the mean and std can be computed with numpy.average or statsmodels.stats.weightstats.DescrStatsW, using their weights parameter, and so on. Check which statistics you need and decide whether each one can be computed without unaggregating.

Output:

count    10.000000
mean      1.900000
std       0.994429
min       1.000000
25%       1.000000
50%       1.500000
75%       3.000000
max       3.000000
Name: value, dtype: float64
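
The remark above about avoiding the unaggregation step could be sketched roughly as follows. This is only an illustration, not part of the original answer: describe_from_counts is a hypothetical helper name, and it assumes the counts are non-negative integers with at least one positive count. The mean uses numpy.average with weights, the std is the sample standard deviation (ddof=1), and percentiles use linear interpolation between neighbouring order statistics, which is the behaviour describe() appears to follow by default.

```python
import numpy as np
import pandas as pd


def describe_from_counts(values, counts, percentiles=(0.25, 0.5, 0.75)):
    """Hypothetical helper: describe()-style statistics from (value, count) pairs."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(counts, dtype=np.int64)
    # Drop zero-count rows and sort by value so cumulative counts act as ranks.
    keep = w > 0
    v, w = v[keep], w[keep]
    order = np.argsort(v)
    v, w = v[order], w[order]

    n = int(w.sum())
    mean = np.average(v, weights=w)
    # Sample standard deviation (ddof=1), computed from weighted squared deviations.
    std = np.sqrt(np.sum(w * (v - mean) ** 2) / (n - 1)) if n > 1 else np.nan

    # cum[i] is the number of expanded observations with value <= v[i], so the
    # k-th smallest observation (0-based) is v[searchsorted(cum, k, side="right")].
    cum = np.cumsum(w)

    def kth(k):
        return v[np.searchsorted(cum, k, side="right")]

    stats = {"count": float(n), "mean": mean, "std": std, "min": v[0]}
    for q in percentiles:
        # Linear interpolation between the two neighbouring order statistics.
        pos = q * (n - 1)
        lo = int(np.floor(pos))
        hi = min(lo + 1, n - 1)
        stats[f"{q:.0%}"] = kth(lo) + (kth(hi) - kth(lo)) * (pos - lo)
    stats["max"] = v[-1]
    return pd.Series(stats, name="value")


df = pd.DataFrame({"value": [1, 2, 3], "count": [5, 1, 4]})
print(describe_from_counts(df["value"], df["count"]))
```

For the sample data this should reproduce the numbers shown in the output above, while only ever touching one row per distinct value. That matters mostly when the counts are large enough that repeating the rows would be slow or would not fit in memory.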
