How can I get a random sample from a dataframe that preserves the distribution of a variable? (Python)
Question
Context:
I have a large dataframe that looks similar to this, but with 200k rows:
| name | country | id | 
|---|---|---|
| neymar | brazil | 1234 | 
| ronaldo | portugal | 5678 | 
| benzema | france | 9012 | 
| t. silva | brazil | 3456 | 
I want to take a random sample of 100 rows from this dataframe, but make sure the sample includes a few rows from each country. How could I do this? Thanks in advance!
df.sample(100, random_state = 20)
Answer 1
Score: 1
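To preserve the distribution by country, you could use sklearn.utils.resample with stratify=df.country. Keep in mind that stratifying keeps each country's share of the sample close to its share of the full dataframe, so a very rare country can still end up with no rows in a small sample.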
For example:
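from sklearn.utils import resample
# Draw 100 rows without replacement, keeping each country's share close to its share in df
sample = resample(df, n_samples=100, replace=False, stratify=df.country, random_state=123)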
More details: https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html
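If the goal is specifically to guarantee a few rows from every country (rather than only matching proportions), one possible pandas-only approach is to reserve a minimum number of rows per country first and then fill the rest of the sample at random. The sketch below is just an illustration under some assumptions: the helper name `sample_with_min_per_group` and its parameters are made up for this example, the demo dataframe is synthetic, and the dataframe index is assumed to be unique.

```python
import pandas as pd


def sample_with_min_per_group(df, group_col, n_total, min_per_group, random_state=None):
    """Hypothetical helper: draw n_total rows with at least min_per_group
    rows from every group (or the whole group, if it is smaller)."""
    # Shuffle once, then keep the first min_per_group rows of each group.
    shuffled = df.sample(frac=1, random_state=random_state)
    guaranteed = shuffled.groupby(group_col).head(min_per_group)

    n_remaining = n_total - len(guaranteed)
    if n_remaining < 0:
        raise ValueError("n_total is too small for the requested minimum per group")

    # Fill the rest uniformly at random from rows not already picked.
    # Assumes df has a unique index, so drop() removes exactly those rows.
    rest = df.drop(guaranteed.index)
    extra = rest.sample(n=min(n_remaining, len(rest)), random_state=random_state)
    return pd.concat([guaranteed, extra])


# Synthetic stand-in for the real 200k-row dataframe (assumption).
df = pd.DataFrame({
    "name": [f"player_{i}" for i in range(1000)],
    "country": ["brazil"] * 700 + ["portugal"] * 250 + ["france"] * 50,
    "id": list(range(1000)),
})

sample = sample_with_min_per_group(df, "country", n_total=100, min_per_group=3, random_state=20)
counts = sample["country"].value_counts()
print(counts)
```

Because the filler rows are drawn uniformly from what is left, the sample still leans toward the larger countries, while every country gets at least the reserved minimum.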