February 8, 2023, 13:44:59

How to randomly sample a value within a given segment?

Question

I want to create a new column "sample_group_B" which randomly samples a purchase price value from group B within the same segment of group A. How do I do this in pandas?

segment | purchase price | group
High    | 100            | A
High    | 105            | A
High    | 103            | B
High    | 104            | B
Low     | 10             | A
Low     | 9              | B
Low     | 50             | B
Low     | 55             | B

I want to create a new column that randomly samples the purchase price of group B within the respective segment such as:

segment | purchase price | group | sample_group_B
High    | 100            | A     | sample a value from (103 or 104)
High    | 105            | A     | sample a value from (103 or 104)
Low     | 10             | A     | sample a value from (9 or 50 or 55)

I tried np.random() but it returned a bunch of NaN values.

Answer 1

Score: 0

Annotated code

from random import choice

# Filter the A and B groups (copy A so a new column can be added safely)
A = df.query("group == 'A'").copy()
B = df.query("group == 'B'")

# Build a segment -> list of purchase prices lookup
# (a Series indexed by segment) from the B rows
d = B.groupby('segment')['purchase price'].agg(list)

# For each row of A, pick one random B price from the same segment
A['sample_B'] = A['segment'].map(lambda s: choice(d[s]))

Result

  segment  purchase price group  sample_B
0    High             100     A       103
1    High             105     A       104
4     Low              10     A         9
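
The lookup-and-map idea above can also be written as a self-contained, repeatable sketch. This is only one possible variant, not the answerer's exact code: the seed value, the helper name `sample_b`, and the choice to return NaN for a segment with no group B rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
        "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
        "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
    }
)

# Seeded generator so repeated runs give the same draws (seed value is arbitrary)
rng = np.random.default_rng(seed=0)

# Candidate group B prices per segment, as a plain dict of lists
pools = (
    df[df["group"] == "B"]
    .groupby("segment")["purchase price"]
    .agg(list)
    .to_dict()
)


def sample_b(segment):
    """Return one random B price for the segment, or NaN if it has none (hypothetical helper)."""
    pool = pools.get(segment)
    return rng.choice(pool) if pool else np.nan


# Copy the A rows so the new column is added to an independent frame
A = df[df["group"] == "A"].copy()
A["sample_group_B"] = A["segment"].map(sample_b)
print(A)
```

Each row of A gets its own independent draw, so two A rows in the same segment may or may not receive the same B price.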

Answer 2

Score: 0

Steps

Split df into two DataFrames, one per group
Join the A rows to the B rows on segment
Sample one joined row per A row

Code:

import pandas as pd

# Prepare sample data
d = [["High", 100, "A"],
     ["High", 105, "A"],
     ["High", 103, "B"],
     ["High", 104, "B"],
     ["Low", 10, "A"],
     ["Low", 9, "B"],
     ["Low", 50, "B"],
     ["Low", 55, "B"]]
df = pd.DataFrame(d, columns=['segment', 'price', 'group'])

# Split into the two groups
a = df.query("group == 'A'")
b = df.query("group == 'B'")

# Join every A row to all B rows in the same segment
ab = a.join(b.set_index('segment'), on='segment', lsuffix='_a', rsuffix='_b')

# Keep one randomly chosen joined row per (segment, price_a) pair
ab.groupby(['segment', 'price_a']).sample(n=1)

Result:

  segment  price_a group_a  price_b group_b
0    High      100       A      104       B
1    High      105       A      103       B
4     Low       10       A        9       B
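
A variant of the join-and-sample steps that produces the requested sample_group_B column directly might look like the sketch below. It differs from the code above in two ways that are assumptions on my part rather than part of the original answer: it groups by the original row label instead of (segment, price_a), so two A rows that share a segment and price would each still get a draw, and it fixes `random_state=0` only to make the output repeatable.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame(
    [["High", 100, "A"], ["High", 105, "A"], ["High", 103, "B"], ["High", 104, "B"],
     ["Low", 10, "A"], ["Low", 9, "B"], ["Low", 50, "B"], ["Low", 55, "B"]],
    columns=["segment", "price", "group"],
)

# A rows, with the original row label preserved in an "index" column
a = df[df["group"] == "A"].reset_index()

# B prices per segment, renamed to the target column
b = (
    df.loc[df["group"] == "B", ["segment", "price"]]
    .rename(columns={"price": "sample_group_B"})
)

# One joined row per (A row, B price) pair within the same segment;
# a left merge keeps A rows whose segment has no B rows (value becomes NaN)
pairs = a.merge(b, on="segment", how="left")

# Draw one joined row for each original A row
picked = pairs.groupby("index").sample(n=1, random_state=0)

# Restore the original row labels
result = picked.set_index("index").rename_axis(None)
print(result)
```

The intermediate join grows with the number of A rows times the number of B rows per segment, so for large segments the lookup-and-map approach from Answer 1 is likely to use less memory.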
