February 14, 2023, 00:21:26

R-style formulas misbehave when implementing a power (e.g. a cube) in a GLM

Question

In the Python code below, the GLM specification in model1 does not include the third power of workhours, but the one in model2 does:

model1 = glm(formula="wage ~ workhours + workhours**3           + C(gender)", data=df, family=sm.families.Gaussian())
model2 = glm(formula="wage ~ workhours + np.power(workhours, 3) + C(gender)", data=df, family=sm.families.Gaussian())

Is this a bug? According to the documentation, ** raises something to a power, so workhours**3 should be workhours to the power 3.

Answer 1

Score: 6

** in a formula is treated as a formula operator, not as regular exponentiation. (This is similar to how ^ works in an R formula.)

(a+b+c+d)**3 means that the model should include a, b, c, d, and all interactions between these variables up to 3rd order.

workhours**3 means that the model should include workhours and all interactions of workhours with itself up to 3rd order. An interaction of a term with itself is just that term again, so no new terms are produced and the expression is equivalent to workhours alone.
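To see how ** expands, one option is to ask patsy for the term list of a formula without fitting anything. The snippet below is a minimal sketch, assuming patsy is installed; the variable names a, b, c, d come from the example above, and names_four and names_single are illustrative names only.

```python
from patsy import ModelDesc

# Expand the right-hand side of each formula into its list of terms.
# No data is needed: this only parses the formula.
desc_four = ModelDesc.from_formula("(a + b + c + d)**3")
desc_single = ModelDesc.from_formula("workhours**3")

names_four = [term.name() for term in desc_four.rhs_termlist]
names_single = [term.name() for term in desc_single.rhs_termlist]

# Intercept, 4 main effects, 6 two-way and 4 three-way interactions
print(len(names_four), names_four)

# Only the intercept and workhours: no cubic term appears
print(names_single)
```

The second list contains nothing beyond the intercept and workhours, which is why model1 in the question fits the same model as one without the **3 part.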

In contrast, np.power(workhours, 3) is treated as Python code, and computes the power you wanted.
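The difference also shows up in the design matrix that the formula produces. The following is a small sketch, assuming patsy and NumPy are installed; the workhours values are made up for illustration. It also uses patsy's I(...) wrapper, which is not mentioned above but likewise passes its contents to Python instead of the formula parser.

```python
import numpy as np
from patsy import dmatrix

# Made-up data, for illustration only
data = {"workhours": np.array([20.0, 35.0, 40.0, 45.0])}

# ** parsed as a formula operator: no cubic column is created
m_formula_op = dmatrix("workhours + workhours**3", data)

# np.power(...) is evaluated as Python code: a cubic column is created
m_np_power = dmatrix("workhours + np.power(workhours, 3)", data)

# I(...) shields the expression from the formula parser, so ** is Python's power here
m_identity = dmatrix("workhours + I(workhours**3)", data)

for m in (m_formula_op, m_np_power, m_identity):
    print(m.shape, m.design_info.column_names)
```

The first matrix has two columns (intercept and workhours), while the other two have a third column holding workhours cubed.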

statsmodels uses patsy for formula handling, so for full details on the formula language, you can check the patsy docs.
