How to generate a pandas dataframe column with pre-determined values
Question
I have created a pandas dataframe using this code:

    import pandas as pd
    import numpy as np

    ds = {'col1': [1, 1, 1, 1, 1, 0, 0, 0]}
    df = pd.DataFrame(data=ds)

The dataframe looks like this:

    print(df)

       col1
    0     1
    1     1
    2     1
    3     1
    4     1
    5     0
    6     0
    7     0

I need to create a new column, called `col2`, subject to these conditions:

- When `col1 = 1`, we have 5 records. For 3 of those records, `col2` must be equal to 2, and for the remaining 2 records `col2` must be equal to 3. The location of the 2's and 3's is random.
- When `col1 = 0`, we have 3 records. For 2 of those records, `col2` must be equal to 5, and for the remaining record `col2` must be equal to 6. The location of the 5's and the 6 is random.

The resulting dataframe would look as follows (obviously the location of the values in `col2` is random, so when you try to solve this you might get different record locations, but the proportion of the values in `col2` should meet the conditions specified above):

[![enter image description here][1]][1]

Does anyone know how to do this in Python?

[1]: https://i.stack.imgur.com/F11xy.png
Answer 1
Score: 2
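It is better to use a custom function for flexibility:

    import numpy as np
    import pandas as pd

    # For each col1 value: {col2 value: number of rows that receive it}
    mapper = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}

    def repeat(g):
        d = mapper[g.name]  # g.name is the col1 value of this group
        # Repeat each col2 value by its count, then shuffle in place
        a = np.repeat(list(d), list(d.values()))
        np.random.shuffle(a)
        return pd.Series(a, g.index)

    df['col2'] = df.groupby('col1', group_keys=False).apply(repeat)
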
Example output:

       col1  col2
    0     1     2
    1     1     3
    2     1     2
    3     1     3
    4     1     2
    5     0     5
    6     0     6
    7     0     5
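
If you want the result to be reproducible and easy to verify, a variant along the following lines might help. It is only a sketch: the helper name `assign_counts`, its `key` and `seed` parameters, and the seed value 0 are illustrative assumptions, not part of the answer above. It swaps `np.random.shuffle` for a seeded `np.random.default_rng` generator and fills the column group by group with `.loc` instead of `groupby.apply`.

```python
import numpy as np
import pandas as pd

# For each col1 value: {col2 value: number of rows that receive it}
mapper = {1: {2: 3, 3: 2}, 0: {5: 2, 6: 1}}

def assign_counts(df, mapper, key="col1", seed=None):
    """Hypothetical helper: build col2 values with fixed counts per group."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    # Rows whose key is not in mapper keep the placeholder value 0.
    out = pd.Series(0, index=df.index, dtype="int64")
    for value, counts in mapper.items():
        idx = df.index[df[key] == value]
        vals = np.repeat(list(counts), list(counts.values()))
        if len(vals) != len(idx):
            raise ValueError(
                f"counts for {key}={value} sum to {len(vals)}, "
                f"but the group has {len(idx)} rows"
            )
        # permutation returns a shuffled copy of vals
        out.loc[idx] = rng.permutation(vals)
    return out

df = pd.DataFrame({"col1": [1, 1, 1, 1, 1, 0, 0, 0]})
df["col2"] = assign_counts(df, mapper, seed=0)

# Count each (col1, col2) pair to confirm the required proportions.
print(df.groupby("col1")["col2"].value_counts())
```

The explicit length check raises an error if the counts in `mapper` do not add up to the size of a group, which is a case the shorter version does not guard against.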