Question

I have created a pandas dataframe using this code:

import pandas as pd
import numpy as np

ds = {'col1':[1,1,1,1,1,0,0,0]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)
   col1
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0

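I need to create a new column, called `col2`, subject to these conditions:

1. When `col1 = 1`, we have 5 records. For 3 of those records, `col2` must equal 2, and for the remaining 2 records `col2` must equal 3. The positions of the 2's and 3's are random.

2. When `col1 = 0`, we have 3 records. For 2 of those records, `col2` must equal 5, and for the remaining record `col2` must equal 6. The positions of the 5's and the 6 are random.

The resulting dataframe would look as follows (the positions of the values in `col2` are random, so your result may place them on different rows, but the count of each value in `col2` must meet the conditions above):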

[![enter image description here][1]][1]

Does anyone know how to do this in Python?

  [1]: https://i.stack.imgur.com/F11xy.png

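Answer 1

Score: 2

Better to use a custom function for flexibility. The `mapper` dictionary maps each `col1` value to the `col2` values it should receive and how many rows get each one: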

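import numpy as np
import pandas as pd

# For each col1 value: {col2 value: number of rows that receive it}
mapper = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}

def repeat(g):
    d = mapper[g.name]  # g.name is the col1 value of this group
    a = np.repeat(list(d), list(d.values()))  # e.g. [2, 2, 2, 3, 3]
    np.random.shuffle(a)  # shuffle in place
    return pd.Series(a, g.index)  # align with the group's original rows

df['col2'] = df.groupby('col1', group_keys=False).apply(repeat)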

Example output:

   col1  col2
0     1     2
1     1     3
2     1     2
3     1     3
4     1     2
5     0     5
6     0     6
7     0     5
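
If reproducible results or an explicit sanity check are useful, the same idea can be sketched with a seeded NumPy `Generator` and a plain loop over the groups. This is only an illustrative variant under stated assumptions: the helper name `assign_shuffled` and its `seed` parameter are hypothetical and not part of the answer above, and the check that the counts add up to the group size is an added safeguard.

```python
import numpy as np
import pandas as pd


def assign_shuffled(df, key, mapper, seed=None):
    """Hypothetical helper: build a column whose values follow the
    per-group counts in `mapper`, placed in random order within each group."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable shuffles
    pieces = []
    # .groups maps each group value to the index labels of its rows
    for value, idx in df.groupby(key).groups.items():
        counts = mapper[value]
        if sum(counts.values()) != len(idx):
            raise ValueError(
                f"counts for {key}={value} sum to {sum(counts.values())}, "
                f"but the group has {len(idx)} rows"
            )
        # Repeat each target value by its count, e.g. [2, 2, 2, 3, 3]
        values = np.repeat(list(counts.keys()), list(counts.values()))
        # permutation() returns a shuffled copy; attach it to the group's rows
        pieces.append(pd.Series(rng.permutation(values), index=idx))
    # Put the pieces back into the original row order
    return pd.concat(pieces).reindex(df.index)


df = pd.DataFrame({"col1": [1, 1, 1, 1, 1, 0, 0, 0]})
mapper = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}
df["col2"] = assign_shuffled(df, "col1", mapper, seed=0)

# Count how often each col2 value appears within each col1 group
print(df.groupby("col1")["col2"].value_counts())
```

Passing the same `seed` gives the same placement on every run, while `seed=None` draws a fresh random order each time. The per-group counts printed at the end should match the numbers in `mapper` regardless of the seed.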
