Translating Stata to R yields different results.


Question

I am trying to translate Stata code from a paper into R.

The Stata code looks like this:

g tau = year - temp2 if temp2 > temp3 & (bod<. | do<. | lnfcoli<.)

My R translation looks like this:

data <- data %>%
  mutate(tau = if_else((temp2 > temp3) & 
                         (is.na(bod) | is.na(do) | is.na(lnfcoli)), 
                       year - temp2,
                       NA_integer_))

The problem is that the two versions give different results when I run them.

This is the result I get when I run the code in Stata:

Year |  temp2  |  temp3 | bod | do  | lnfcoli | tau |
1986 |  1995   |  1986  | 3.2 | 7.2 | 2.1     |  -9 |

This is the result I get when I run the code in R:

Year |  temp2  |  temp3 | bod | do  | lnfcoli | tau |
1986 |  1995   |  1986  | 3.2 | 7.2 | 2.1     |  NA |

Do you know what might be wrong with my R code, or what I should modify to get the same output?

Answer 1

Score: 2

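None of bod, do, or lnfcoli is missing (NA) in this row, so your condition evaluates to FALSE and if_else returns its false= value, NA_integer_. Stata treats . (missing) as larger than any nonmissing number, effectively positive infinity, so a check like bod<. is actually testing that the value is not missing, which is the opposite of is.na().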

So the equivalent in R/dplyr is probably:

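data %>%
    mutate(
        tau = if_else(
            # Stata's `x < .` means "x is not missing"; the three tests are OR-ed,
            # so at least one of the measurements must be present
            (temp2 > temp3) & (!is.na(bod) | !is.na(do) | !is.na(lnfcoli)),
            year - temp2,
            NA_integer_
        )
    )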

#  year temp2 temp3 bod  do lnfcoli tau
#1 1986  1995  1986 3.2 7.2     2.1  -9
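
One detail worth checking on rows with partially missing measurements: the Stata condition joins three `< .` tests with `|`, so it holds when at least one of bod, do, or lnfcoli is present, not only when all three are. The following is a minimal base-R sketch of that reading, not the paper's code. Row 1 is the row from the question; rows 2 to 4 are made-up cases added purely for illustration, and `any_measured` and `keep` are hypothetical helper names.

```r
# Row 1 comes from the question; rows 2-4 are hypothetical test cases.
data <- data.frame(
  year    = c(1986L, 1986L, 1986L, 1986L),
  temp2   = c(1995L, 1995L, 1995L, 1980L),
  temp3   = c(1986L, 1986L, 1986L, 1986L),
  bod     = c(3.2, NA, NA, 3.2),
  do      = c(7.2, 7.2, NA, 7.2),
  lnfcoli = c(2.1, NA, NA, 2.1)
)

# Stata's `x < .` is TRUE when x is not missing, so it maps to !is.na(x).
# The three tests are OR-ed: at least one measurement must be present.
any_measured <- !is.na(data$bod) | !is.na(data$do) | !is.na(data$lnfcoli)

# Full condition from the Stata `if` qualifier.
keep <- (data$temp2 > data$temp3) & any_measured

# Rows that fail the condition get a missing tau, as in Stata.
data$tau <- ifelse(keep, data$year - data$temp2, NA_integer_)

print(data)
```

With these rows, tau is -9 for rows 1 and 2 (row 2 still has do), and NA for row 3 (all three measurements missing) and row 4 (temp2 is not greater than temp3). The same condition can be dropped into the mutate/if_else call unchanged.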

huangapple
  • Posted on June 1, 2023, 12:38:39
  • Please keep this link when reposting: https://go.coder-hub.com/76378708.html