Question

import pandas as pd
import numpy as np

# Your original DataFrame
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['tim', 'jim', 'john', 'bill']})

# Duplicate each row randomly 0-5 times
df = df.loc[df.index.repeat(np.random.randint(0, 6, len(df)))]

# Resetting index to maintain a clean DataFrame
df.reset_index(drop=True, inplace=True)

I have a data set like so:

Input:

id name
1  tim
2  jim
3  john
4  bill

I want to duplicate each row in my data set a random number of times, anywhere from 0 to 5.

So my final data set might look something like this:

Output:

id name
1  tim
1  tim
2  jim
3  john
3  john
3  john
3  john
4  bill
4  bill
4  bill

How can I do this in Python with pandas?

Answer 1

Score: 1

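You can generate a random array of repeat counts with numpy.random.uniform and use it to repeat your index for indexing. The counts are floats here; Index.repeat casts them to integers (truncating), so each row ends up repeated 0 to 5 times: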

out = df.loc[df.index.repeat(np.random.uniform(0, 5+1, size=len(df)))]

Example:

   id  name
0   1   tim
0   1   tim
0   1   tim
0   1   tim
0   1   tim
1   2   jim
1   2   jim
2   3  john
3   4  bill
3   4  bill
3   4  bill
3   4  bill

Output with np.random.seed(42):

   id  name
0   1   tim
0   1   tim
1   2   jim
1   2   jim
1   2   jim
1   2   jim
1   2   jim
2   3  john
2   3  john
2   3  john
2   3  john
3   4  bill
3   4  bill
3   4  bill
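
For a variant that draws integer counts directly and resets the index, here is a minimal sketch. The helper name repeat_rows_randomly and its max_repeats and seed parameters are made up for illustration (they are not from the answer above), and it swaps np.random.uniform for NumPy's Generator.integers:

```python
import numpy as np
import pandas as pd


def repeat_rows_randomly(df, max_repeats=5, seed=None):
    """Repeat each row of df between 0 and max_repeats times (inclusive).

    Hypothetical helper for illustration; returns the new frame and the
    per-row repeat counts that were drawn.
    """
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound by default, hence max_repeats + 1
    counts = rng.integers(0, max_repeats + 1, size=len(df))
    # Repeat row positions (not labels) so a non-unique index is not a problem
    positions = np.repeat(np.arange(len(df)), counts)
    out = df.iloc[positions].reset_index(drop=True)
    return out, counts


df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'name': ['tim', 'jim', 'john', 'bill']})

out, counts = repeat_rows_randomly(df, max_repeats=5, seed=0)
print(counts)  # how many copies of each original row were kept
print(out)
```

Passing a seed makes the draw reproducible; leave it as None to get different counts on each call. A row whose count is 0 is simply dropped from the result.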
