Using ifelse() changes the result of my calculation

Question

I have a dataset covering several trials, each of which tests different treatments. Some trials include a control group and some do not.

Here is an example in which only trial 1 has a control group:

data = data.frame(trial = c(1, 1, 2, 2),
                  treatment = c("control", "b", "b", "c"),
                  value = c(97.2, 99.3, 85.51, 85.01))

When a trial includes a control group, I want to compute the ratio of each treatment's value to the control's value within that trial. I first filtered for the trials that have a control group, and that worked well:

library(dplyr)

data %>%
    filter(trial == 1) %>%
    group_by(trial) %>%
    summarise(treatment = treatment, 
              r = value / value[which(treatment == "control")])

But when I tried to do it without filtering the trials, using the ifelse() function, I got an unexpected result: the ratio was always equal to 1 (or NA when there was no control group).

data %>%
    group_by(trial) %>%
    summarise(treatment = treatment, 
              r = ifelse("control" %in% treatment, value / value[which(treatment == "control")], NA))

Answer 1

Score: 5

I think you need two corrections:

  1. You need mutate instead of summarise if you want to keep the same number of rows as the original data frame.

  2. "control" %in% treatment returns a single TRUE or FALSE, and ifelse() returns a result of the same length as its test, so only the first ratio of each group is kept. Changing the test to treatment %in% "control" does not help either: it is TRUE only for the control row, so every other row becomes NA.
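
To see both failure modes outside the pipeline, here is a minimal base R sketch. The vectors `treatment` and `value` are copied by hand from trial 1 of the example data, and the names `r_scalar` and `r_rowwise` are only for illustration.

```r
# Values for trial 1 only, copied from the example data
treatment <- c("control", "b")
value <- c(97.2, 99.3)

# One ratio per row (length 2)
ratio <- value / value[which(treatment == "control")]

# Length-1 test: ifelse() returns a length-1 result,
# i.e. only the first element of `ratio`
r_scalar <- ifelse("control" %in% treatment, ratio, NA)
r_scalar
# 1

# Row-wise test: TRUE only on the control row,
# so every other row falls through to NA
r_rowwise <- ifelse(treatment %in% "control", ratio, NA)
r_rowwise
# 1 NA
```

In the original pipeline, summarise() then appears to recycle that single value across the rows of the group, which would explain why every row of trial 1 shows 1.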

Here is a shorter solution.

library(dplyr)

data %>%
  group_by(trial) %>%
  mutate(r = value/value[match("control", treatment)])

#  trial treatment value     r
#  <dbl> <chr>     <dbl> <dbl>
#1     1 control    97.2  1   
#2     1 b          99.3  1.02
#3     2 b          85.5 NA   
#4     2 c          85.0 NA   

We use match here for two reasons:

  1. match returns only the position of the first match, so you always get a single denominator even if a trial has two "control" rows. That is unlikely, but it is good to be sure.
  2. You can drop ifelse because match returns NA when "control" is absent, which makes the whole ratio NA.
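
Both points can be checked on plain vectors. This is a small sketch with made-up treatment labels (`with_two` and `without` are hypothetical and not part of the example data):

```r
value <- c(97.2, 99.3, 85.51)

# Two "control" rows: match() gives only the first position,
# whereas which() gives both
with_two <- c("control", "b", "control")
first_idx <- match("control", with_two)
all_idx <- which(with_two == "control")

# No "control" row: match() gives NA, so the denominator is NA
# and the ratio is NA for every row
without <- c("b", "c", "d")
idx <- match("control", without)
r_without <- value / value[idx]

# which() gives integer(0) instead, and the ratio collapses to length 0
r_empty <- value / value[which(without == "control")]
length(r_empty)
# 0
```

The zero-length result from which() is the reason the original code needed a guard in the first place; with match() the NA simply propagates through the division.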

With dplyr 1.1.0 or later, you can use the .by argument instead of group_by:

data %>%
  mutate(r = value/value[match("control", treatment)], .by = trial)

Answer 2

Score: 2

@RonakShah has explained why your code fails, and I also prefer his match() solution. Another option that minimally modifies your code is to replace ifelse() with if ... else, which evaluates the condition once per group and returns the whole vector of ratios:

library(dplyr)

data %>%
  group_by(trial) %>%
  mutate(r = if ("control" %in% treatment) value/value[treatment == "control"] else NA)

# # A tibble: 4 × 4
# # Groups:   trial [2]
#   trial treatment value     r
#   <dbl> <chr>     <dbl> <dbl>
# 1     1 control    97.2  1   
# 2     1 b          99.3  1.02
# 3     2 b          85.5 NA   
# 4     2 c          85.0 NA
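
The same per-group if ... else logic can be written without dplyr, which may make it easier to see that the condition is checked once per trial. This is only a sketch: `ratio_to_control` is a hypothetical helper name, and split()/unsplit() stand in for group_by()/mutate().

```r
data <- data.frame(trial = c(1, 1, 2, 2),
                   treatment = c("control", "b", "b", "c"),
                   value = c(97.2, 99.3, 85.51, 85.01))

# Hypothetical helper: one scalar test per group, full vector back
ratio_to_control <- function(value, treatment) {
  if ("control" %in% treatment) {
    value / value[treatment == "control"]
  } else {
    rep(NA_real_, length(value))
  }
}

# Apply the helper to each trial, then put the pieces back in row order
groups <- split(data, data$trial)
ratios <- lapply(groups, function(g) ratio_to_control(g$value, g$treatment))
data$r <- unsplit(ratios, data$trial)
data
```

Unlike the match() version, this helper assumes at most one "control" row per trial.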

Answer 3

Score: 1

Here is a base R solution with by.

with(data, by(data[-1], trial, \(X) {
  X$value <- X$value/X$value[match("control", X$treatment)]
  X
})) |> array2DF()
#>     trial treatment    value
#> 1       1   control 1.000000
#> 1.1     1         b 1.021605
#> 2       2         b       NA
#> 2.1     2         c       NA

Created on 2023-07-17 with reprex v2.0.2

huangapple
  • Posted on 2023-07-17 18:00:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76703346.html