August 11, 2023 04:35:02

Generate Pandas intercept product for interactions in multiple regression

Question

I'm trying to build the interaction of two factors from two dataframes, as is commonly done in multiple linear regression to look for multi-level interactions. We're trying to build a table of all pairwise column products, taking one column from each dataframe:

Edit: I originally wrote "intersect"; I meant "interaction".

import pandas as pd

df1 = pd.DataFrame({"A1":[1,1,1,1],
                    "A2":[1,2,3,4],
                    "A3":[3,3,3,3]})
df1
   A1  A2  A3
0   1   1   3
1   1   2   3
2   1   3   3
3   1   4   3
df2 = pd.DataFrame({"B1":[1,2,3,4],
                    "B2":[10,10,10,10]})
df2
   B1  B2
0   1  10
1   2  10
2   3  10
3   4  10

My desired output of df1 * df2 would be:

   A1*B1  A1*B2  A2*B1  A2*B2  A3*B1   A3*B2
0      1     10      1     10      3      30
1      2     10      4     20      6      30
2      3     10      9     30      9      30
3      4     10     16     40     12      30

I feel like I'm missing something about dot products: whichever way I try them, df1 @ df2 just gives me "matrices are not aligned".
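The error follows from the shapes: df1 is 4 × 3 and df2 is 4 × 2, so the inner dimensions (3 columns against 4 rows) don't match, and pandas' @ operator (which delegates to DataFrame.dot) additionally requires df1's column labels to match df2's index labels. A minimal sketch reproducing this with the question's data (the name error_message is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# A matrix product (n x m) @ (p x q) needs m == p; here m = 3 columns, p = 4 rows.
print(df1.shape, df2.shape)  # (4, 3) (4, 2)

# pandas also aligns labels: df1.columns (A1, A2, A3) would have to match
# df2.index (0, 1, 2, 3), which they do not, so DataFrame.dot raises ValueError.
try:
    df1 @ df2
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So @ is the wrong tool here regardless of transposition: a matrix product always sums over one shared dimension, whereas the desired table keeps every row.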

df1.T @ df2 does run, but it gives an outcome I don't want:

df1.T @ df2
    B1   B2
A1  10   40
A2  30  100
A3  30  120
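The numbers from df1.T @ df2 are related to the desired table: each entry is the sum, over all rows, of one column-pair product, so the matrix product computes the right products and then collapses the row dimension. A small check using the question's data (column_sums is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# Entry (a, b) of df1.T @ df2 is sum over rows r of df1.loc[r, a] * df2.loc[r, b].
gram = df1.T @ df2
print(gram.to_numpy().tolist())  # [[10, 40], [30, 100], [30, 120]]

# Multiply each column pair row by row, then sum over the rows.
column_sums = [
    [int((df1[a] * df2[b]).sum()) for b in df2.columns] for a in df1.columns
]
print(column_sums)  # [[10, 40], [30, 100], [30, 120]]
```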

I've also tried a few list-comprehension approaches, but I can't get any of them to produce this result.
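For comparison, a dict comprehension (rather than a list comprehension) can produce this table directly, including the "A1*B1" style column names. This is a sketch, not taken from the answers below, and it assumes both frames share the same row index, since pandas aligns on the index when multiplying two Series:

```python
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# One output column per (a, b) pair: the row-by-row product of df1[a] and df2[b].
# Dict insertion order gives A1*B1, A1*B2, A2*B1, ... as in the desired output.
interactions = pd.DataFrame(
    {f"{a}*{b}": df1[a] * df2[b] for a in df1.columns for b in df2.columns}
)
print(interactions)
```

If the two indexes differ, the Series products would contain NaN for unmatched labels; calling reset_index(drop=True) on both frames first avoids that.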

Answer 1

Score: 2

IIUC (if I understand correctly), you can do:

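import numpy as np
import pandas as pd

# df1 and df2 as defined in the question
out = pd.DataFrame(
    # (3, 4, 1) * (4, 2) broadcasts to (3, 4, 2); hstack lays the 3 blocks side by side
    np.hstack(np.multiply(df1.values[None, :, :].T, df2.values)),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns]),
)
print(out)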

Prints:

  A1      A2      A3    
  B1  B2  B1  B2  B1  B2
0  1  10   1  10   3  30
1  2  10   4  20   6  30
2  3  10   9  30   9  30
3  4  10  16  40  12  30
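The one-liner above relies on NumPy broadcasting. Tracing the intermediate shapes with the question's frames shows how each df1 column ends up multiplied by every df2 column and how the blocks are placed side by side (the variable names below are illustrative only):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

a = df1.values[None, :, :]   # add a leading axis: (1, rows, A-columns)
print(a.shape)               # (1, 4, 3)

a_t = a.T                    # .T reverses all axes: (A-columns, rows, 1)
print(a_t.shape)             # (3, 4, 1)

# Broadcasting (3, 4, 1) against (4, 2) gives (3, 4, 2);
# prod[i, r, k] == df1.values[r, i] * df2.values[r, k].
prod = np.multiply(a_t, df2.values)
print(prod.shape)            # (3, 4, 2)

# hstack iterates over the first axis and concatenates the three (4, 2)
# blocks along the columns, giving A1*B1, A1*B2, A2*B1, ... in order.
wide = np.hstack(prod)
print(wide.shape)            # (4, 6)
```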

Answer 2

Score: 2

You can use np.einsum, which generalizes the matrix product:

import numpy as np
import pandas as pd

# df1 and df2 as defined in the question.
# The subscripts 'ij,ik->ijk' mean: multiply element (i, j) of df1 by
# element (i, k) of df2 and store the product at position (i, j, k).
# We then just need to flatten the last 2 dimensions into one.
output = pd.DataFrame(
    np.einsum('ij,ik->ijk', df1, df2).reshape(df1.shape[0], -1),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns])
)
print(output)

Output:

  A1      A2      A3    
  B1  B2  B1  B2  B1  B2
0  1  10   1  10   3  30
1  2  10   4  20   6  30
2  3  10   9  30   9  30
3  4  10  16  40  12  30
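The einsum subscripts also make the relationship to the question's df1.T @ df2 attempt explicit: keeping the row subscript i in the output yields the per-row interaction terms, while dropping it sums over rows and reproduces the matrix product. The sketch below (variable names are illustrative) checks both, and then flattens the result into the asker's "A1*B1" style column names, which is an optional naming choice rather than part of either answer:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})
x, y = df1.to_numpy(), df2.to_numpy()

# Keeping i in the output: per-row products, shape (4, 3, 2).
per_row = np.einsum("ij,ik->ijk", x, y)

# The same tensor via explicit broadcasting: (4, 3, 1) * (4, 1, 2) -> (4, 3, 2).
broadcast = x[:, :, None] * y[:, None, :]
print(np.array_equal(per_row, broadcast))  # True

# Dropping i from the output sums over the rows, which is exactly df1.T @ df2.
summed = np.einsum("ij,ik->jk", x, y)
print(summed.tolist())                     # [[10, 40], [30, 100], [30, 120]]

# Flatten (j, k) into one axis; j varies slowest, matching the name order below.
flat = pd.DataFrame(
    per_row.reshape(len(df1), -1),
    columns=[f"{a}*{b}" for a in df1.columns for b in df2.columns],
)
print(flat)
```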
