How to generate a pandas dataframe column with pre-determined values

Question

I have created a pandas dataframe using this code:

    import pandas as pd
    import numpy as np

    ds = {'col1':[1,1,1,1,1,0,0,0]}

    df = pd.DataFrame(data=ds)

The dataframe looks like this:

    print(df)
       col1
    0     1
    1     1
    2     1
    3     1
    4     1
    5     0
    6     0
    7     0

I need to create a new column, called col2, subject to these conditions:

  1. when col1 = 1, then we have 5 records. For 3 of those records, col2 must be equal to 2 and for the remaining 2 records col2 must be equal to 3. The location of the 2's and 3's is random.

  2. when col1 = 0, then we have 3 records. For 2 of those records, col2 must be equal to 5 and for the remaining record col2 must be equal to 6. The location of the 5's and 6 is random.

The resulting dataframe would look as follows (the location of the values in col2 is random, so when you try to solve this you might get different record locations, but the number of times each value appears in col2 should meet the conditions specified above):

[![enter image description here](https://i.stack.imgur.com/F11xy.png)](https://i.stack.imgur.com/F11xy.png)

Does anyone know how to do this in python?

Answer 1

Score: 2


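It is better to use a custom function for flexibility:

    import numpy as np
    import pandas as pd

    # For each col1 value, map every col2 value to the number of rows it should fill
    mapper = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}

    def repeat(g):
        d = mapper[g.name]                        # counts for this group's col1 value
        a = np.repeat(list(d), list(d.values()))  # e.g. [2, 2, 2, 3, 3] for col1 == 1
        np.random.shuffle(a)                      # randomize the positions in place
        return pd.Series(a, g.index)              # align with the group's original rows

    df['col2'] = df.groupby('col1', group_keys=False).apply(repeat)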

Example output:

       col1  col2
    0     1     2
    1     1     3
    2     1     2
    3     1     3
    4     1     2
    5     0     5
    6     0     6
    7     0     5
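
As a variation on the answer above, here is a minimal sketch that makes the shuffle reproducible and checks that the requested counts fit each group. It swaps `np.random.shuffle` for a seeded `np.random.default_rng` generator and loops over the group positions instead of using `groupby(...).apply`. The names `COUNTS`, `assign_col2`, and `seed` are illustrative and do not come from the original answer; the sketch assumes that the counts given for each `col1` value add up to the number of rows in that group.

```python
import numpy as np
import pandas as pd

# Required counts per col1 value, taken from the question:
# col1 == 1 -> three 2s and two 3s; col1 == 0 -> two 5s and one 6.
COUNTS = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}


def assign_col2(df, counts, seed=None):
    """Return a copy of df with a randomly ordered col2 that honours `counts`.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)  # passing a seed makes runs repeatable
    out = df.copy()
    values = np.empty(len(out), dtype=int)

    # .indices maps each col1 value to the integer positions of its rows
    for key, positions in out.groupby("col1").indices.items():
        spec = counts[key]
        pool = np.repeat(list(spec), list(spec.values()))
        if len(pool) != len(positions):
            raise ValueError(
                f"counts for col1={key} sum to {len(pool)}, "
                f"but the group has {len(positions)} rows"
            )
        values[positions] = rng.permutation(pool)  # shuffled copy of the pool

    out["col2"] = values
    return out


df = pd.DataFrame({"col1": [1, 1, 1, 1, 1, 0, 0, 0]})
result = assign_col2(df, COUNTS, seed=0)
print(result)
# Count how often each col2 value appears within each col1 group
print(pd.crosstab(result["col1"], result["col2"]))
```

Calling the function twice with the same `seed` should give the same ordering, while `seed=None` gives a fresh ordering on each run. The explicit length check is a design choice: it reports a mismatch between the counts and the group size directly, rather than leaving it to surface later as a less specific error.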

huangapple
  • Posted on March 7, 2023, 17:42:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660242.html