How to randomly sample a value within a given segment?

Question

I want to create a new column "sample_group_B" which randomly samples a purchase price value from group B within the same segment of group A. How do I do this in pandas?

segment | purchase price | group
High    | 100            | A
High    | 105            | A
High    | 103            | B
High    | 104            | B
Low     | 10             | A
Low     | 9              | B
Low     | 50             | B
Low     | 55             | B

The new column should hold a randomly sampled purchase price from group B within the same segment, for example:

segment | purchase price | group | sample_group_B
High    | 100            | A     | sample a value from (103 or 104)
High    | 105            | A     | sample a value from (103 or 104)
Low     | 10             | A     | sample a value from (9 or 50 or 55)

I tried np.random() but it returned a bunch of NaNs.
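
The question does not show the attempted code, so the actual cause of the NaN values is unknown. One common way to end up with NaN in this situation (an assumption, not a diagnosis) is index misalignment: the filtered group A rows keep their original labels (0, 1, 4), while a freshly built Series of random draws is labelled 0, 1, 2, and pandas aligns on labels when assigning. A minimal sketch of that effect, using the column names from the question and an arbitrary seed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
    "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
    "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

A = df[df["group"] == "A"].copy()          # index labels are 0, 1, 4
draws = rng.integers(0, 100, size=len(A))  # three placeholder random values

# Assigning a Series aligns on index labels: the new Series is labelled
# 0, 1, 2, so the row labelled 4 finds no match and becomes NaN.
A["misaligned"] = pd.Series(draws)

# Assigning the raw array is positional, so every row gets a value.
A["aligned"] = draws
```

If this is what happened, assigning a plain array (or a Series built with `index=A.index`) avoids the NaN values.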

Answer 1

Score: 0

Annotated code

from random import choice

# Filter the A and B groups; copy A so that adding a column
# does not trigger a SettingWithCopyWarning
A = df.query("group == 'A'").copy()
B = df.query("group == 'B'")

# Build a Series indexed by segment that lists
# all group B purchase prices for that segment
d = B.groupby('segment')['purchase price'].agg(list)

# For each segment in A, pick a random price from the matching list in d
A['sample_B'] = A['segment'].map(lambda s: choice(d[s]))

Result (the sampled values vary from run to run)

  segment  purchase price group  sample_B
0    High             100     A       103
1    High             105     A       104
4     Low              10     A         9
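
The mapping approach above can also be written as a self-contained, reproducible sketch. The version below is a variation, not the answerer's exact code: it swaps `random.choice` for a seeded NumPy generator, names the new column `sample_group_B` as in the question, and uses a hypothetical helper `sample_from_segment` that returns NaN when group B has no rows for a segment. The seed value is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
    "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
    "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

A = df[df["group"] == "A"].copy()
B = df[df["group"] == "B"]

# Series indexed by segment; each value is the list of group B prices
pools = B.groupby("segment")["purchase price"].agg(list)

def sample_from_segment(segment):
    """Hypothetical helper: draw one group B price for the given segment."""
    if segment not in pools.index:
        return np.nan  # no group B rows in this segment
    return rng.choice(pools[segment])

# One independent draw per group A row
A["sample_group_B"] = A["segment"].map(sample_from_segment)
```

Each group A row gets its own draw, so two rows in the same segment may or may not receive the same value.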

Answer 2

Score: 0

Steps

  1. Split the data into two DataFrames (group A and group B)
  2. Self-join the two parts on segment
  3. Sample one row within each group

Code:

import pandas as pd

# Prepare sample data
d = [["High", 100, "A"],
     ["High", 105, "A"],
     ["High", 103, "B"],
     ["High", 104, "B"],
     ["Low", 10, "A"],
     ["Low", 9, "B"],
     ["Low", 50, "B"],
     ["Low", 55, "B"]]
df = pd.DataFrame(d, columns=['segment', 'price', 'group'])

# Split into group A and group B
a = df.query("group == 'A'")
b = df.query("group == 'B'")

# Join every A row to all B rows in the same segment
ab = a.join(b.set_index('segment'), on='segment', lsuffix='_a', rsuffix='_b')

# Keep one random B row per (segment, price_a) group
ab.groupby(['segment', 'price_a']).sample(n=1)

Result:

  segment  price_a group_a  price_b group_b
0    High      100       A      104       B
1    High      105       A      103       B
4     Low       10       A        9       B
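
One caveat with grouping on segment and price_a: if two group A rows share the same segment and price, they fall into one group and only one sampled row comes back. A possible variation, sketched below under that assumption, groups on the original row label of each A row instead. The column name `row_a`, the use of `merge` instead of `join`, and the `random_state` value are choices made for this illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["High", 100, "A"], ["High", 105, "A"],
     ["High", 103, "B"], ["High", 104, "B"],
     ["Low", 10, "A"], ["Low", 9, "B"],
     ["Low", 50, "B"], ["Low", 55, "B"]],
    columns=["segment", "price", "group"],
)

# Keep each A row's original index label in a column named row_a
a = df[df["group"] == "A"].rename_axis("row_a").reset_index()
b = df[df["group"] == "B"]

# Pair every A row with every B row in the same segment
pairs = a.merge(b, on="segment", suffixes=("_a", "_b"))

# One random B row per original A row; random_state is arbitrary
sampled = pairs.groupby("row_a").sample(n=1, random_state=0)

# Attach the sampled price back to the group A rows by index label
result = df[df["group"] == "A"].join(
    sampled.set_index("row_a")["price_b"].rename("sample_group_B")
)
```

The default inner merge drops any A row whose segment has no B rows; passing `how="left"` to `merge` would keep such rows with a missing `price_b`.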

huangapple
  • Posted on February 8, 2023 at 13:44:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75381775.html