How can I get a random sample from a DataFrame but have it contain a distribution of a variable? (Python)
Question
Context:
I have a large DataFrame that looks similar to this, but with 200k rows:
name | country | id |
---|---|---|
neymar | brazil | 1234 |
ronaldo | portugal | 5678 |
benzema | france | 9012 |
t. silva | brazil | 3456 |
I want to take a random sample of 100 rows from this DataFrame but ensure that it includes a few rows from each country. How could I do this? Thanks in advance!
df.sample(100, random_state = 20)
Answer 1
Score: 1
To preserve the distribution by country, you could use sklearn.utils.resample with stratify=df.country.
For example:
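from sklearn.utils import resample
# Stratified sample of 100 rows without replacement; country proportions approximately match df
sample = resample(df, n_samples=100, replace=False, stratify=df.country, random_state=123)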
See https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html for more details.
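Stratifying preserves each country's share of the rows, but a proportional allocation can give a very small country zero rows (a country with 0.1% of 200k rows expects only 0.1 rows in a sample of 100). If the hard requirement is "a few from each country", one option is to reserve a minimum per country and fill the remaining slots at random. The following is a minimal pandas-only sketch of that idea, not the answer's method; the function name sample_with_min_per_group and its min_per_group parameter are hypothetical, and it assumes the DataFrame has a unique index.

```python
import pandas as pd


def sample_with_min_per_group(df, col, n, min_per_group=2, random_state=None):
    """Hypothetical helper: sample n rows without replacement so that every
    value of `col` appears at least min(min_per_group, group size) times.
    The remaining slots are filled uniformly at random from unpicked rows.

    Assumes df has a unique index (rows are told apart by index label).
    """
    if n > len(df):
        raise ValueError("n is larger than the number of rows")
    # Shuffle once; every later step takes rows in this random order.
    shuffled = df.sample(frac=1, random_state=random_state)
    # Guaranteed part: the first min_per_group shuffled rows of each group
    # (the whole group if it has fewer rows than that).
    base = shuffled.groupby(col, sort=False).head(min_per_group)
    if len(base) > n:
        raise ValueError("n is too small to take min_per_group rows from every group")
    # Random fill: the next rows of the shuffled order that are not in `base`.
    rest = shuffled[~shuffled.index.isin(base.index)].head(n - len(base))
    return pd.concat([base, rest])


# Toy data shaped like the question's DataFrame (values are made up).
df = pd.DataFrame({
    "name": [f"player{i}" for i in range(200)],
    "country": ["brazil"] * 150 + ["portugal"] * 30 + ["france"] * 18 + ["italy"] * 2,
    "id": range(200),
})
sample = sample_with_min_per_group(df, "country", n=20, min_per_group=3, random_state=20)
print(sample["country"].value_counts())
```

For the question's data this would be called as sample_with_min_per_group(df, "country", n=100, min_per_group=3, random_state=20). The trade-off is that the result is no longer exactly proportional: small countries are over-represented by design, while the randomly filled remainder still roughly follows the overall distribution.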