How to randomly sample a value within a given segment?
Question
I want to create a new column "sample_group_B" which randomly samples a purchase price value from group B within the same segment of group A. How do I do this in pandas?
segment | purchase price | group
High | 100 | A
High | 105 | A
High | 103 | B
High | 104 | B
Low | 10 | A
Low | 9 | B
Low | 50 | B
Low | 55 | B
I want to create a new column that randomly samples the purchase price of group B within the respective segment such as:
segment | purchase price | group | sample_group_B
High | 100 | A | sample a value from (103 or 104)
High | 105 | A | sample a value from (103 or 104)
Low | 10 | A | sample a value from (9 or 50 or 55)
I tried using np.random, but it returned a bunch of NaNs.
Answer 1
Score: 0
Annotated code
from random import choice
# filter the A, B groups (copy A so a new column can be added to it safely)
A = df.query("group == 'A'").copy()
B = df.query("group == 'B'")
# Create a mapping (a Series indexed by segment) that lists
# all group B purchase prices for a given segment
d = B.groupby('segment')['purchase price'].agg(list)
# Map each segment in A to a random choice from that segment's list
A['sample_B'] = A['segment'].map(lambda s: choice(d[s]))
Result
segment purchase price group sample_B
0 High 100 A 103
1 High 105 A 104
4 Low 10 A 9
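For anyone who wants to run the idea above end to end, here is a minimal self-contained sketch. It rebuilds the question's table by hand, and it swaps `random.choice` for a seeded NumPy generator so the output is repeatable. The names `rng` and `b_prices` and the seed value are illustrative choices, not part of the original answer. It also assumes every segment present in group A has at least one group B row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
    "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
    "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

# Seeded generator (the seed 0 is arbitrary, used only for repeatability)
rng = np.random.default_rng(0)

A = df[df["group"] == "A"].copy()
B = df[df["group"] == "B"]

# Plain dict: segment -> array of group B purchase prices
b_prices = {
    seg: prices.to_numpy()
    for seg, prices in B.groupby("segment")["purchase price"]
}

# One independent draw per group A row, taken from its own segment.
# Assumption: every segment in A also appears in B (otherwise KeyError).
A["sample_group_B"] = [rng.choice(b_prices[seg]) for seg in A["segment"]]

print(A)
```

Because each row gets its own draw, two group A rows in the same segment may receive the same group B price.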
Answer 2
Score: 0
Steps:
- split the DataFrame into two (group A and group B)
- join A to B on segment (a self join)
- sample one joined row per group A row
code:
import pandas as pd
# prepare sample data
d = [["High", 100, "A"],
     ["High", 105, "A"],
     ["High", 103, "B"],
     ["High", 104, "B"],
     ["Low", 10, "A"],
     ["Low", 9, "B"],
     ["Low", 50, "B"],
     ["Low", 55, "B"]]
df = pd.DataFrame(d, columns=['segment', 'price', 'group'])
# split into two parts
a = df.query("group == 'A'")
b = df.query("group == 'B'")
# join each A row to every B row in the same segment
ab = a.join(b.set_index('segment'), on='segment', lsuffix='_a', rsuffix='_b')
# sample one joined row per (segment, price_a) group
ab.groupby(['segment', 'price_a']).sample(n=1)
Result:
segment price_a group_a price_b group_b
0 High 100 A 104 B
1 High 105 A 103 B
4 Low 10 A 9 B
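One thing to watch with the join approach: grouping by `['segment', 'price_a']` would lump together two group A rows that share the same segment and price, so only one of them would get a sample. A possible variant, sketched below, groups by the original row label instead. It uses `merge` rather than `join`, and the `random_state` value and the `sample_group_B` column name are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [["High", 100, "A"], ["High", 105, "A"], ["High", 103, "B"],
     ["High", 104, "B"], ["Low", 10, "A"], ["Low", 9, "B"],
     ["Low", 50, "B"], ["Low", 55, "B"]],
    columns=["segment", "price", "group"],
)

# Keep the original row label of each group A row in an "index" column
a = df[df["group"] == "A"].reset_index()
b = df.loc[df["group"] == "B", ["segment", "price"]]

# Pair every A row with every B row of the same segment
pairs = a.merge(b, on="segment", suffixes=("_a", "_b"))

# Draw one pairing per original A row (random_state is arbitrary, for repeatability)
sampled = pairs.groupby("index").sample(n=1, random_state=0)

result = (
    sampled.set_index("index")
    .rename(columns={"price_b": "sample_group_B"})
    .sort_index()
)
print(result)
```

`DataFrameGroupBy.sample` requires pandas 1.1 or later.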