Generate Pandas intercept product for interactions in multiple regression
Question
I'm trying to build the interaction terms between two sets of factors held in two dataframes, as is commonly done in multiple linear regression to look for multi-level interactions. We want a table containing every pairwise product of a column from one dataframe with a column from the other:
Edit: originally I said "intersect"; I meant "interaction".
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1],
                    "A2": [1, 2, 3, 4],
                    "A3": [3, 3, 3, 3]})
df1
   A1  A2  A3
0   1   1   3
1   1   2   3
2   1   3   3
3   1   4   3
df2 = pd.DataFrame({"B1": [1, 2, 3, 4],
                    "B2": [10, 10, 10, 10]})
df2
   B1  B2
0   1  10
1   2  10
2   3  10
3   4  10
My desired output of df1 * df2 would be:
   A1*B1  A1*B2  A2*B1  A2*B2  A3*B1  A3*B2
0      1     10      1     10      3     30
1      2     10      4     20      6     30
2      3     10      9     30      9     30
3      4     10     16     40     12     30
I feel like I'm really missing something about dot products. Whichever way I try them, df1 @ df2 just gives me "matrices are not aligned".
df1.T @ df2 does run, but the result is not what I want:
df1.T @ df2
    B1   B2
A1  10   40
A2  30  100
A3  30  120
I've tried a few list comprehension methods, but I can't seem to make the magic happen.
Answer 1
Score: 2
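If I understand correctly, you can do:
import numpy as np
import pandas as pd

# df1.values[None, :, :].T has shape (3, 4, 1); multiplying by df2.values (4, 2)
# broadcasts to (3, 4, 2), and hstack joins the three (4, 2) blocks side by side
out = pd.DataFrame(
    np.hstack(np.multiply(df1.values[None, :, :].T, df2.values)),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns]),
)
print(out)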
Prints:
  A1     A2     A3
  B1  B2 B1  B2 B1  B2
0  1  10  1  10  3  30
1  2  10  4  20  6  30
2  3  10  9  30  9  30
3  4  10 16  40 12  30
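To make the broadcasting easier to follow, here is a minimal sketch of the same idea with the shapes spelled out. It is not the answer's exact expression: the names a, b, prod and flat are only illustrative, and it assumes both dataframes have the same number of rows in the same order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

a = df1.to_numpy()  # shape (4, 3)
b = df2.to_numpy()  # shape (4, 2)

# a[:, :, None] has shape (4, 3, 1) and b[:, None, :] has shape (4, 1, 2),
# so the product broadcasts to (4, 3, 2): prod[i, j, k] == a[i, j] * b[i, k]
prod = a[:, :, None] * b[:, None, :]

# Collapse the last two axes; the resulting column order (A1B1, A1B2, A2B1, ...)
# matches the order produced by MultiIndex.from_product
flat = prod.reshape(len(df1), -1)  # shape (4, 6)

out = pd.DataFrame(
    flat,
    index=df1.index,
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns]),
)
print(out)
```

Adding the new axes explicitly avoids the transpose and hstack steps, which some readers may find easier to reason about.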
Answer 2
Score: 2
You can use np.einsum, which generalizes the matrix product:
import numpy as np
import pandas as pd

# the directive 'ij,ik->ijk' means: take the product of
# element (i, j) of df1 and element (i, k) of df2,
# then put it at position (i, j, k) of the result;
# we then just need to flatten the last two dimensions into one
output = pd.DataFrame(
    np.einsum('ij,ik->ijk', df1, df2).reshape(df1.shape[0], -1),
    columns=pd.MultiIndex.from_product([df1.columns, df2.columns])
)
print(output)
Output:
  A1     A2     A3
  B1  B2 B1  B2 B1  B2
0  1  10  1  10  3  30
1  2  10  4  20  6  30
2  3  10  9  30  9  30
3  4  10 16  40 12  30
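The question asked for flat column names such as A1*B1 rather than a MultiIndex. One possible way to get them, shown here only as a sketch and not taken from either answer, is a dict comprehension over the column pairs. It assumes df1 and df2 share the same index, because pandas aligns the two Series on their index before multiplying.

```python
import itertools

import pandas as pd

df1 = pd.DataFrame({"A1": [1, 1, 1, 1], "A2": [1, 2, 3, 4], "A3": [3, 3, 3, 3]})
df2 = pd.DataFrame({"B1": [1, 2, 3, 4], "B2": [10, 10, 10, 10]})

# One output column per (df1 column, df2 column) pair, named "<a>*<b>";
# itertools.product yields the pairs in the order A1*B1, A1*B2, A2*B1, ...
out = pd.DataFrame(
    {
        f"{a}*{b}": df1[a] * df2[b]
        for a, b in itertools.product(df1.columns, df2.columns)
    }
)
print(out)
```

This is likely slower than the vectorized NumPy versions for wide dataframes, but it reads closely to the list-comprehension approach the question mentions and yields labels that are convenient as regression term names.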