R-style formulas misbehave when implementing a power (i.e. square) in a GLM

Question

In the Python code below, the glm model specification does not include the third power in model1, but it does in model2:

model1 = glm(formula="wage ~ workhours + workhours**3           + C(gender)", data=df, family=sm.families.Gaussian())
model2 = glm(formula="wage ~ workhours + np.power(workhours, 3) + C(gender)", data=df, family=sm.families.Gaussian())

Is this a bug? According to the documentation, ** raises something to a power (here, 3).

Answer 1

Score: 6

** in a formula is treated as a formula operator, not as regular exponentiation. (This is similar to how ^ works in an R formula.)

(a+b+c+d)**3 means that the model should include a, b, c, d, and all interactions between these variables up to 3rd order.

workhours**3 means that the model should include workhours and all interactions between... just workhours... up to 3rd order... but there are no such interaction terms, so it's equivalent to just workhours.
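
To make the collapse concrete, here is a toy sketch of that expansion rule in plain Python. It is not patsy's actual implementation; the helper name power_terms is hypothetical, and it assumes only the rule described above: keep every interaction of up to n distinct factors, since a factor interacted with itself is just that factor.

```python
from itertools import combinations


def power_terms(factors, n):
    """Toy expansion of (f1 + f2 + ...)**n in formula notation.

    Returns every interaction of up to n *distinct* factors.
    Hypothetical helper for illustration, not part of patsy.
    """
    terms = []
    for order in range(1, n + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


# Four factors: main effects, two-way and three-way interactions.
four = power_terms(["a", "b", "c", "d"], 3)
print(four)

# One factor: there is nothing to interact with, so only the main effect remains.
one = power_terms(["workhours"], 3)
print(one)
```

With a single factor there are no pairs or triples to form, which is why the cubic term silently disappears from model1.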

In contrast, np.power(workhours, 3) is treated as Python code, and computes the power you wanted.

statsmodels uses patsy for formula handling, so for full details on the formula language, you can check the patsy docs.
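
One way to check this without fitting a model is to build the design matrices directly with patsy and compare their columns. The sketch below uses made-up workhours values (the original data is not shown in the question). It also tries I(workhours**3), which is not mentioned above: I() is patsy's identity wrapper, which makes its contents evaluate as ordinary Python, so it should behave like np.power here.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical data; only the column name matches the question.
df = pd.DataFrame({"workhours": [20.0, 35.0, 40.0, 45.0, 60.0]})

# ** as a formula operator: collapses to the plain workhours term.
formula_op = dmatrix("workhours + workhours**3", df)

# np.power(...) is evaluated as Python code and yields a cubic column.
python_call = dmatrix("workhours + np.power(workhours, 3)", df)

# I(...) shields the expression from the formula parser (an addition to the answer).
wrapped = dmatrix("workhours + I(workhours**3)", df)

for label, dm in [("**", formula_op), ("np.power", python_call), ("I()", wrapped)]:
    print(label, dm.shape, dm.design_info.column_names)
```

The first matrix should have one column fewer than the other two, matching the missing coefficient in model1.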

huangapple
  • Published on 2023-02-14 00:21:26
  • Please keep this link when reposting: https://go.coder-hub.com/75438567.html