Duplicating every row in dataframe N times where N is random
Question
import pandas as pd
import numpy as np
# Your original DataFrame
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['tim', 'jim', 'john', 'bill']})
# Duplicate each row randomly 0-5 times
df = df.loc[df.index.repeat(np.random.randint(0, 6, len(df)))]
# Resetting index to maintain a clean DataFrame
df.reset_index(drop=True, inplace=True)
I have a data set like so:
Input:
id name
1 tim
2 jim
3 john
4 bill
I want to duplicate each row in my data set a random number of times, anywhere from 0 to 5.
So my final data set might look something like this:
Output:
id name
1 tim
1 tim
2 jim
3 john
3 john
3 john
3 john
4 bill
4 bill
4 bill
How can I do this in Python with pandas?
Answer 1
Score: 1
You can generate a random integer array of repeat counts with numpy.random.randint
and pass it to Index.repeat to build the index used for selection.
The upper bound of randint is exclusive, hence the 5+1:
out = df.loc[df.index.repeat(np.random.randint(0, 5+1, size=len(df)))]
Example:
id name
0 1 tim
0 1 tim
0 1 tim
0 1 tim
0 1 tim
1 2 jim
1 2 jim
2 3 john
3 4 bill
3 4 bill
3 4 bill
3 4 bill
Output with np.random.seed(42):
id name
0 1 tim
0 1 tim
1 2 jim
1 2 jim
1 2 jim
1 2 jim
1 2 jim
2 3 john
2 3 john
2 3 john
2 3 john
3 4 bill
3 4 bill
3 4 bill
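For reproducible results without touching NumPy's global seed, the same idea can be wrapped in a small helper. This is only a sketch: the function name repeat_rows_randomly and its max_repeats and seed parameters are made up for illustration, and it assumes the DataFrame has a unique index (with duplicate labels, .loc would return extra rows).

```python
import numpy as np
import pandas as pd


def repeat_rows_randomly(df, max_repeats=5, seed=None):
    """Repeat each row of df between 0 and max_repeats times (inclusive).

    Hypothetical helper; assumes df has a unique index.
    Returns the expanded DataFrame and the per-row repeat counts.
    """
    rng = np.random.default_rng(seed)
    # endpoint=True makes max_repeats itself a possible draw.
    counts = rng.integers(0, max_repeats, size=len(df), endpoint=True)
    # Index.repeat requires integer counts; .loc then selects each
    # label as many times as it appears in the repeated index.
    out = df.loc[df.index.repeat(counts)].reset_index(drop=True)
    return out, counts


df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['tim', 'jim', 'john', 'bill']})

out, counts = repeat_rows_randomly(df, max_repeats=5, seed=42)
print(counts)  # how many copies of each original row were kept
print(out)
```

Returning the counts alongside the result makes it easy to check that each row appears the expected number of times. A row that draws 0 is dropped from the output entirely.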