Randomly pick items or values grouped by date
Question
How can I randomly pick 3 values of var1 within each 'DATE' group, sum them, and repeat this for several simulations?
df =

    DATE        var1
    2023-01-31  1
    2023-01-31  2
    2023-01-31  3
    2023-01-31  4
    2023-01-31  5
    2023-02-28  6
    2023-02-28  7
    2023-02-28  8
    2023-02-28  9
    2023-02-28  10

    Simulation 1
    2023-01-31 = (1+3+5) = 9
    2023-02-28 = (6+7+10) = 23

    Simulation 2
    2023-01-31 = (1+2+5) = 8
    2023-02-28 = (9+7+10) = 26

    Simulation n ...
Let's say we run 10 simulations, for instance.
Answer 1
Score: 2
You can use groupby.agg with sample:
out = df.groupby('DATE').agg(lambda g: g.sample(n=3).sum())
Example output:
            var1
DATE            
2023-01-31     8
2023-02-28    27
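For a reproducible variant, here is a minimal sketch that assumes the sample data from the question. The helper name `sample_sum` and its `seed` parameter are illustrative and not part of the original answer. It swaps the lambda for `DataFrameGroupBy.sample` (available in pandas 1.1 and later), which draws `n` rows per group without replacement.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

def sample_sum(frame, n=3, seed=None):
    """Pick n rows per DATE at random and sum var1 within each date.

    `seed` is a hypothetical convenience parameter; passing the same
    integer gives the same draw on every call.
    """
    picked = frame.groupby("DATE").sample(n=n, random_state=seed)
    return picked.groupby("DATE")["var1"].sum()

out = sample_sum(df, n=3, seed=0)
print(out)
```

With `seed=None` each call produces a fresh draw, which is what the simulations need.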
If you want to repeat the process, use a loop:
N = 10
for i in range(N):
    print(f'simulation {i+1}')
    print(df.groupby('DATE').agg(lambda g: g.sample(n=3).sum()))
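To collect the repeated samples into a DataFrame (this example restricts the data to a single date with query):
import pandas as pd

N = 10
query = 'DATE == "2023-01-31"'
# one entry per simulation; the dict keys become the 'simulation' index level
out = pd.concat({i+1: df.query(query).groupby('DATE').agg(lambda g: g.sample(n=3).sum())
                 for i in range(N)
                 }, names=['simulation'])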
Example output:
                       var1
simulation DATE            
1          2023-01-31     8
2          2023-01-31    10
3          2023-01-31    12
4          2023-01-31     8
5          2023-01-31     9
6          2023-01-31    10
7          2023-01-31    11
8          2023-01-31    12
9          2023-01-31    10
10         2023-01-31     6
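The question asks for every date in each simulation, so the same idea can be sketched without the `query` filter. This is one possible layout rather than the answer's own method: one row per simulation and one column per date. The name `sims` and the seed 42 are assumptions, and passing a NumPy `Generator` as `random_state` needs pandas 1.4 or later.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

N = 10
# Shared generator: its state advances on every draw, so the
# simulations differ from each other but the whole run is repeatable.
rng = np.random.default_rng(42)

sims = pd.DataFrame({
    i + 1: df.groupby("DATE")["var1"].agg(
        lambda s: s.sample(n=3, random_state=rng).sum()
    )
    for i in range(N)
}).T  # transpose: simulations become rows, dates become columns
sims.index.name = "simulation"

print(sims)
```

Summary statistics across simulations then follow directly, for example `sims.mean()` or `sims.describe()`.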