Generate Pandas intercept product for interactions in multiple regression

Question

I'm trying to build the interaction of two factors stored in two dataframes, as is commonly done in multiple linear regression to look for multi-level interactions. The goal is a table holding the product of every column of one dataframe with every column of the other (every pair, one column from each):

Edit: I originally said "intersect"; I meant "interaction".

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1],
                    "A2": [1, 2, 3, 4],
                    "A3": [3, 3, 3, 3]})
print(df1)
#    A1  A2  A3
# 0   1   1   3
# 1   1   2   3
# 2   1   3   3
# 3   1   4   3

df2 = pd.DataFrame({"B1": [1, 2, 3, 4],
                    "B2": [10, 10, 10, 10]})
print(df2)
#    B1  B2
# 0   1  10
# 1   2  10
# 2   3  10
# 3   4  10
```

My desired output of `df1 * df2` would be:

```
   A1*B1  A1*B2  A2*B1  A2*B2  A3*B1  A3*B2
0      1     10      1     10      3     30
1      2     10      4     20      6     30
2      3     10      9     30      9     30
3      4     10     16     40     12     30
```

I feel like I'm really missing something about dot products: every way I try them, `df1 @ df2` just gives me "matrices are not aligned".

`df1.T @ df2` runs, but gives an outcome I don't want:

```python
print(df1.T @ df2)
#     B1   B2
# A1  10   40
# A2  30  100
# A3  30  120
```
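A minimal sketch of why neither matrix product gives the interaction table, rebuilding `df1` and `df2` so it runs on its own. `@` on DataFrames sums over a shared axis, and pandas requires that axis's labels to match: `df1 @ df2` pairs `df1.columns` (A1..A3) with `df2.index` (0..3), which differ, hence the error. `df1.T @ df2` contracts over the four rows instead, so each cell is the column sum of one interaction column rather than the column itself. The names `matmul_error` and `gram` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# df1 @ df2 tries to match df1.columns (A1, A2, A3) against df2.index (0..3).
try:
    _ = df1 @ df2
    matmul_error = None
except ValueError as exc:
    matmul_error = str(exc)  # pandas reports "matrices are not aligned"
print(matmul_error)

# df1.T @ df2 sums over the shared row index, so each cell collapses
# one interaction column (df1[a] * df2[b]) down to a single number.
gram = df1.T @ df2
for a in df1.columns:
    for b in df2.columns:
        assert gram.loc[a, b] == (df1[a] * df2[b]).sum()
print(gram)
```

In other words, the desired table is what you get *before* the summation that a dot product performs.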

I've tried a few list comprehension methods, but I can't seem to make the magic happen.
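A comprehension can produce exactly the flat `A1*B1`-style layout shown above. This is a sketch, assuming both frames share the same row index: pandas aligns `df1[a] * df2[b]` on the index, so mismatched indexes would produce NaNs. The `"*"` naming scheme simply copies the desired output.

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# One new column per (a, b) pair, in df1-major order: A1*B1, A1*B2, A2*B1, ...
# Series multiplication aligns on the row index, so both frames should share it.
interactions = pd.DataFrame(
    {f"{a}*{b}": df1[a] * df2[b] for a in df1.columns for b in df2.columns}
)
print(interactions)
```

This loops in Python over column pairs, which is fine for a handful of factors; the vectorized NumPy answers below avoid that loop.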

Answer 1

Score: 2

IIUC, you can do:

```python
import numpy as np
import pandas as pd

# df1 and df2 as defined in the question
out = pd.DataFrame(
    np.hstack(np.multiply(df1.values[None, :, :].T, df2.values)),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns]),
)
print(out)
```

Prints:

```
  A1     A2     A3
  B1  B2 B1  B2 B1  B2
0  1  10  1  10  3  30
1  2  10  4  20  6  30
2  3  10  9  30  9  30
3  4  10 16  40 12  30
```
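To see what the broadcasting in this answer does, here is the same expression split into steps with the intermediate shapes noted (df1/df2 rebuilt so it runs standalone). `df1.values[None, :, :].T` has shape (n_A, n_rows, 1); multiplying it by df2's (n_rows, n_B) array broadcasts to (n_A, n_rows, n_B), and `np.hstack` lays the n_A blocks side by side. The final step, flattening the MultiIndex into `A1*B1`-style names, is an optional extra and not part of the answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

a = df1.values[None, :, :].T   # (1, 4, 3) transposed -> (3, 4, 1)
b = df2.values                 # (4, 2)
# Broadcasts to (3, 4, 2): blocks[j, i, k] = df1.values[i, j] * df2.values[i, k]
blocks = np.multiply(a, b)
wide = np.hstack(blocks)       # three (4, 2) blocks side by side -> (4, 6)

out = pd.DataFrame(wide, columns=pd.MultiIndex.from_product([df1.columns, df2.columns]))

# Optional: flatten ("A1", "B1") -> "A1*B1" to match the layout asked for in the question.
out.columns = ["*".join(pair) for pair in out.columns]
print(out)
```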

Answer 2

Score: 2

You can use `np.einsum`, which generalizes the matrix product:

```python
import numpy as np
import pandas as pd

# df1 and df2 as defined in the question.
# The subscripts 'ij,ik->ijk' mean: multiply the (i, j)-th element of df1
# by the (i, k)-th element of df2 and store the product at position (i, j, k).
# We then just need to merge the last two dimensions into one.
output = pd.DataFrame(
    np.einsum('ij,ik->ijk', df1, df2).reshape(df1.shape[0], -1),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns])
)
print(output)
```

Output:

```
  A1     A2     A3
  B1  B2 B1  B2 B1  B2
0  1  10  1  10  3  30
1  2  10  4  20  6  30
2  3  10  9  30  9  30
3  4  10 16  40 12  30
```
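The `'ij,ik->ijk'` subscripts describe a per-row outer product: for each row i, every column j of df1 times every column k of df2, with no summation because i, j and k all appear in the output. The sketch below checks this against explicit broadcasting (`x[:, :, None] * y[:, None, :]`, substituted here for comparison, not taken from the answer). The C-order reshape keeps k varying fastest, which is why the column order lines up with `MultiIndex.from_product([df1.columns, df2.columns])`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

x, y = df1.to_numpy(), df2.to_numpy()

# einsum: result[i, j, k] = x[i, j] * y[i, k]; every index survives, so nothing is summed
via_einsum = np.einsum("ij,ik->ijk", x, y)

# The same per-row outer product via broadcasting: (4, 3, 1) * (4, 1, 2) -> (4, 3, 2)
via_broadcast = x[:, :, None] * y[:, None, :]
assert np.array_equal(via_einsum, via_broadcast)

# C-order reshape merges (j, k) with k varying fastest: A1B1, A1B2, A2B1, ...
flat = via_einsum.reshape(x.shape[0], -1)
cols = pd.MultiIndex.from_product([df1.columns, df2.columns])
output = pd.DataFrame(flat, columns=cols)
print(output)
```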

huangapple
  • Posted on August 11, 2023 04:35:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76879156.html