Question


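Recently I ran into a very weird problem that I could not solve even with ChatGPT-4. I want to write a loop that runs the same regression repeatedly, because I have different combinations of independent and dependent variables. Here is my code and a demo dataset: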

    library(mice)     # mice(), complete()
    library(geepack)  # geeglm()

    ### Create a dataset with repeated measurements: d is the ID, Cal is the dependent variable, a, b, c are independent variables
    n_participants <- 50
    n_measurements <- 3
    a <- rnorm(n_participants * n_measurements, mean = 10, sd = 2)
    b <- rnorm(n_participants * n_measurements, mean = 5, sd = 1)
    c <- rnorm(n_participants * n_measurements, mean = 20, sd = 3)
    d <- rep(1:n_participants, each = n_measurements)
    Cal <- rbinom(n_participants * n_measurements, size = 1, prob = 0.5)

    ### Add NA values to the dataset
    missing_prop <- 0.2
    missing_index <- sample(length(a), size = ceiling(length(a) * missing_prop))
    a[missing_index] <- NA
    b[missing_index] <- NA
    c[missing_index] <- NA
    Cal[missing_index] <- NA
    data <- data.frame(Cal, a, b, c, d)

    ### Impute the missing values with mice
    imputed_data <- mice(data, m = 5, maxit = 50, seed = 123)
    ### Check whether the data were imputed completely
    complete_data <- complete(imputed_data)
    summary(complete_data)

    ### Run the regression on the imputed data, so with() is needed
    ### Solution 1
    rg1 <- with(imputed_data, geeglm(Cal ~ a, family = binomial, id = d, corstr = "independence"))
    summary(rg1)
    ### Solution 2
    var <- "a"
    formula <- as.formula(paste0("Cal ~ ", var))
    rg2 <- with(imputed_data, geeglm(formula, family = binomial, id = d, corstr = "independence"))
    summary(rg2)

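In solution 2, I first build the formula with `as.formula`. I double-checked that the formula is identical to `Cal ~ a`, that is, the same formula I typed directly into the regression model in solution 1. However, the coefficients from solution 2 differ from those of solution 1.
What happens when I use `as.formula` instead of typing the formula directly into the regression model?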

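Maybe the way I wrap the regression in a loop is not appropriate. Could any expert share some experience with wrapping a regression in a loop? Many thanks in advance!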




# Answer 1
**Score**: 0


<details>
<summary>English:</summary>

I think the issue is where you call `as.formula()`. It seems to record the environment in which it is called, and with your solution 2 I get the results for the original dataset `data`.

I'm not that experienced with this and don't know how to specify a proper environment, but you can call `as.formula()` inside the `with()` function, and that seems to work for me.

        ###solution 1
        rg1 <- with(imputed_data, geeglm(Cal ~ a, family = binomial, id = d, corstr = "independence"))
        summary(rg1)
        # A tibble: 10 x 6
           term        estimate std.error statistic p.value  nobs
           <chr>          <dbl>     <dbl>     <dbl>   <dbl> <int>
         1 (Intercept) -0.340      0.880   0.150      0.699   150
         2 a            0.0629     0.0862  0.533      0.465   150
         3 (Intercept) -0.253      0.802   0.0995     0.752   150
         4 a            0.0568     0.0779  0.531      0.466   150
         5 (Intercept)  0.512      0.834   0.376      0.540   150
         6 a           -0.0423     0.0810  0.272      0.602   150
         7 (Intercept)  0.0251     0.796   0.000992   0.975   150
         8 a            0.0265     0.0783  0.115      0.735   150
         9 (Intercept)  0.297      0.912   0.106      0.745   150
        10 a           -0.00538    0.0873  0.00380    0.951   150
    
        
    
        ###solution 2
        var <- "a"
        formula <- as.formula(paste0("Cal ~ ", var))
        rg2 <- with(imputed_data, geeglm(formula, family = binomial, id = d, corstr = "independence"))
        summary(rg2)
        # A tibble: 10 x 6
           term        estimate std.error statistic p.value  nobs
           <chr>          <dbl>     <dbl>     <dbl>   <dbl> <int>
         1 (Intercept)   0.0688    0.879    0.00614   0.938   120
         2 a             0.0230    0.0859   0.0715    0.789   120
         3 (Intercept)   0.0688    0.879    0.00614   0.938   120
         4 a             0.0230    0.0859   0.0715    0.789   120
         5 (Intercept)   0.0688    0.879    0.00614   0.938   120
         6 a             0.0230    0.0859   0.0715    0.789   120
         7 (Intercept)   0.0688    0.879    0.00614   0.938   120
         8 a             0.0230    0.0859   0.0715    0.789   120
         9 (Intercept)   0.0688    0.879    0.00614   0.938   120
        10 a             0.0230    0.0859   0.0715    0.789   120
         
        # Produces the same
        summary(geeglm(Cal ~ a, data = data, family = binomial, id = d, corstr = "independence"))
                
                Call:
                geeglm(formula = Cal ~ a, family = binomial, data = data, id = d, 
                    corstr = "independence")
                
                 Coefficients:
                            Estimate Std.err  Wald Pr(>|W|)
                (Intercept)  0.06884 0.87884 0.006    0.938
                a            0.02296 0.08586 0.071    0.789
                
                Correlation structure = independence 
                Estimated Scale Parameters:
                
                            Estimate Std.err
                (Intercept)        1 0.03166
                Number of clusters:   50  Maximum cluster size: 3
        
        # Solution 3 (same results as solution 1)
        formula <- paste0("Cal ~ ", var)
        rg3 <- with(imputed_data, geeglm(as.formula(formula), family = binomial, id = d, corstr = "independence"))
        summary(rg3)
        # A tibble: 10 x 6
           term        estimate std.error statistic p.value  nobs
           <chr>          <dbl>     <dbl>     <dbl>   <dbl> <int>
         1 (Intercept) -0.340      0.880   0.150      0.699   150
         2 a            0.0629     0.0862  0.533      0.465   150
         3 (Intercept) -0.253      0.802   0.0995     0.752   150
         4 a            0.0568     0.0779  0.531      0.466   150
         5 (Intercept)  0.512      0.834   0.376      0.540   150
         6 a           -0.0423     0.0810  0.272      0.602   150
         7 (Intercept)  0.0251     0.796   0.000992   0.975   150
         8 a            0.0265     0.0783  0.115      0.735   150
         9 (Intercept)  0.297      0.912   0.106      0.745   150
        10 a           -0.00538    0.0873  0.00380    0.951   150
        
My take is that when you call `as.formula` in the global environment, it "remembers" that it was called there and looks for the variables `Cal` and `a` in that environment. When you move the call inside `with()`, the environment is no longer the global one, and the data frame from the mice object is used instead.
I hope this is not too clumsy and that it helps. Maybe someone with more knowledge can explain this quirk so that we both understand it!
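
The environment capture described above can be reproduced without `mice` or `geepack`. What follows is a minimal base-R sketch, not the original poster's code: `lm()` stands in for `geeglm()`, the data frame `other` stands in for a single imputed dataset, and the names `x`, `y`, `other`, and `f_outer` are made up for illustration.

```r
# Variables in the calling environment: here y is exactly 2 * x.
x <- c(1, 2, 3, 4, 5)
y <- 2 * x

# A data frame with the same column names but a different relationship
# (y = 6 - x, so the slope is -1).
other <- data.frame(x = c(1, 2, 3, 4, 5), y = c(5, 4, 3, 2, 1))

# A formula built out here carries the environment it was created in.
f_outer <- as.formula("y ~ x")

# With no data= argument, lm() looks x and y up in the formula's
# environment, so the columns of `other` are not used.
fit_outer <- with(other, lm(f_outer))

# Building the formula inside with() ties it to the columns of `other`,
# which is also what a formula typed in place does.
fit_inner <- with(other, lm(as.formula("y ~ x")))
fit_typed <- with(other, lm(y ~ x))

slope_outer <- unname(coef(fit_outer)["x"])
slope_inner <- unname(coef(fit_inner)["x"])
slope_typed <- unname(coef(fit_typed)["x"])
c(outer = slope_outer, inner = slope_inner, typed = slope_typed)
```

Under these assumptions the first slope should reflect the outer `x` and `y` (slope 2), while the other two should reflect the columns of `other` (slope -1). This mirrors solution 2 silently fitting the original `data` instead of the imputed datasets.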


</details>
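
For the loop the question asks about, one pattern that follows from this is to build the formula from a string inside the call that is evaluated against the data, once per predictor. The sketch below uses only base R so that it runs anywhere: `lm()` and the data frame `dat` are stand-ins, and `fit_one()` is a hypothetical helper, not part of `mice` or `geepack`. With a `mids` object, the analogous call would be the `with(imputed_data, geeglm(as.formula(...), ...))` form from solution 3.

```r
# Made-up demo data: one outcome and three candidate predictors.
set.seed(1)
dat <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
dat$out <- 1 + 2 * dat$a + rnorm(30, sd = 0.1)

# Hypothetical helper: fit out ~ <predictor>, creating the formula
# inside with() so its variables resolve to the columns of `data`.
fit_one <- function(predictor, data) {
  rhs <- paste0("out ~ ", predictor)
  with(data, lm(as.formula(rhs)))
}

# Loop over the predictor names and keep one fitted model per name.
predictors <- c("a", "b", "c")
fits <- lapply(predictors, fit_one, data = dat)
names(fits) <- predictors

# Extract the slope of each single-predictor model.
slopes <- vapply(fits, function(fit) unname(coef(fit)[2]), numeric(1))
slopes
```

Keeping the formula as a string until the last moment avoids storing a formula object that points at the wrong environment. Passing the data explicitly through `data =` is another common way to sidestep the lookup, when the modelling function and wrapper allow it.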
