How can I get a random sample from a DataFrame that preserves the distribution of a variable? (Python)


Question

Context: I have a large DataFrame that looks similar to this, but with about 200k rows:

name      country   id
neymar    brazil    1234
ronaldo   portugal  5678
benzema   france    9012
t. silva  brazil    3456

I want to take a random sample of 100 rows from this DataFrame, but I need to ensure the sample includes a few rows from each country. How could I do this? Thanks in advance! So far I have:

df.sample(100, random_state=20)

Answer 1

Score: 1

To preserve the distribution by country, you can use sklearn.utils.resample with stratify=df.country. The sample then keeps roughly the same share of each country as the full DataFrame.

For example:

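from sklearn.utils import resample

# Draw 100 rows (the size asked for in the question) without replacement,
# keeping each country's share close to its share in df.
sample = resample(df, n_samples=100, replace=False, stratify=df.country, random_state=123)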
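To see the effect of stratify, here is a minimal self-contained sketch. The DataFrame is synthetic stand-in data (the player names, ids, and country shares are made up for illustration, not taken from the question); it compares each country's share in the full data with its share in the sample.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Synthetic stand-in for the large DataFrame (assumed data, for illustration only).
countries = np.repeat(["brazil", "portugal", "france"], [5000, 3000, 2000])
df = pd.DataFrame({
    "name": [f"player_{i}" for i in range(len(countries))],
    "country": countries,
    "id": np.arange(len(countries)),
})

# Stratified sample of 100 rows without replacement.
sample = resample(
    df, n_samples=100, replace=False, stratify=df["country"], random_state=123
)

# Compare country shares in the full data and in the sample.
comparison = pd.DataFrame({
    "full": df["country"].value_counts(normalize=True),
    "sample": sample["country"].value_counts(normalize=True),
})
print(comparison)
```

The two columns should be close to each other, since the rows are allocated to countries in proportion to their size in df.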

For more details, see https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html
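Note that proportional stratification can leave a very rare country with only one row, or none, in a sample of 100. If the goal is literally "a few from each country", one possible alternative is a pandas-only sketch like the one below. The helper sample_with_minimum and its parameters are hypothetical names introduced here, not part of pandas or scikit-learn, and the data is again synthetic. It first takes a fixed minimum from every country, then fills the remaining slots at random.

```python
import numpy as np
import pandas as pd


def sample_with_minimum(df, by, n_total, min_per_group, seed=None):
    """Draw n_total rows with at least min_per_group rows from every group.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    # Shuffle once, then keep the first min_per_group rows of each group.
    # head() returns the whole group when it has fewer rows than that.
    shuffled = df.sample(frac=1, random_state=seed)
    base = shuffled.groupby(by).head(min_per_group)
    if len(base) > n_total:
        raise ValueError("n_total is too small for the requested per-group minimum")
    # Fill the remaining slots at random from the rows not chosen yet.
    rest = df.drop(base.index)
    extra = rest.sample(n_total - len(base), random_state=seed)
    return pd.concat([base, extra])


# Synthetic data with one rare country (assumed, for illustration only).
countries = np.repeat(["brazil", "portugal", "france"], [7000, 2900, 100])
df = pd.DataFrame({
    "name": [f"player_{i}" for i in range(len(countries))],
    "country": countries,
    "id": np.arange(len(countries)),
})

sample = sample_with_minimum(df, "country", n_total=100, min_per_group=5, seed=20)
print(sample["country"].value_counts())
```

The trade-off is that rare countries end up over-represented relative to their true share, so this fits when coverage matters more than matching the original distribution.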

Posted on March 4, 2023. Source: https://go.coder-hub.com/75631784.html