Using ifelse() changes the result of my calculation
Question
I have a dataset covering several trials in which different treatments are tested. Some trials include a control group, some do not.
Here is an example in which only trial 1 has a control group:
data = data.frame(trial = c(1, 1, 2, 2),
                  treatment = c("control", "b", "b", "c"),
                  value = c(97.2, 99.3, 85.51, 85.01))
When a trial includes a control group, I want to compute the ratio between each treatment in that trial and the control of that trial. I first filtered for the trials with a control group, and that worked well:
data %>%
  filter(trial == 1) %>%
  group_by(trial) %>%
  summarise(treatment = treatment,
            r = value / value[which(treatment == "control")])
But when I tried to do it without filtering the trials, using the ifelse() function instead, I got an unexpected result: the ratio was always equal to 1 (or NA when there was no control group).
data %>%
  group_by(trial) %>%
  summarise(treatment = treatment,
            r = ifelse("control" %in% treatment, value / value[which(treatment == "control")], NA))
Answer 1
Score: 5
I think you need two corrections:
- You need mutate instead of summarise if you want to keep the number of rows of the data frame unchanged.
- "control" %in% treatment returns a value of length 1, so ifelse() also returns a single value. Even if you change it to treatment %in% "control", it will still give an incorrect result, since the condition itself is wrong.
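To see why the ratio collapses to 1, here is a minimal base R sketch using the trial 1 values from the question. The variable names (test, ratio, res_ifelse, res_if) are mine and only for illustration. ifelse() returns a result shaped like its test, so a length-1 test keeps only the first ratio, which is the control divided by itself.

```r
# Values for trial 1, taken from the question's example data
treatment <- c("control", "b")
value <- c(97.2, 99.3)

# The condition is a single TRUE/FALSE, not one value per row
test <- "control" %in% treatment

# The full vector of ratios we actually want
ratio <- value / value[which(treatment == "control")]

# ifelse() returns something the same length as `test` (here 1),
# so only the first element of `ratio` survives
res_ifelse <- ifelse(test, ratio, NA)

# A plain if / else returns the whole vector when the condition holds
res_if <- if (test) ratio else NA_real_

print(res_ifelse)
print(res_if)
```

Inside summarise() or mutate(), that single value is then recycled across the group, which would explain why every row shows 1.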
Here is a shorter solution.
library(dplyr)
data %>%
  group_by(trial) %>%
  mutate(r = value/value[match("control", treatment)])
# trial treatment value r
# <dbl> <chr> <dbl> <dbl>
#1 1 control 97.2 1
#2 1 b 99.3 1.02
#3 2 b 85.5 NA
#4 2 c 85.0 NA
We are using match here for two reasons:
- match ensures that you always get exactly one number, even if a trial has two "control" rows. That is highly unlikely, but it is good to be sure.
- You can drop ifelse, because match returns NA by default if "control" does not exist.
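Both points can be checked with a small base R sketch. The vector with two "control" entries is a made-up case for illustration, not part of the question's data, and the variable names are mine.

```r
# Hypothetical trial with two control rows (not in the original data)
treatment_dup <- c("control", "b", "control")

# match() returns only the position of the first match ...
idx_first <- match("control", treatment_dup)
# ... whereas which() returns every matching position
idx_all <- which(treatment_dup == "control")

# Trial 2 from the question has no control group
treatment_none <- c("b", "c")
value_none <- c(85.51, 85.01)

# With no match, match() returns NA, and indexing by NA yields NA,
# so the whole ratio becomes NA without any ifelse()
idx_none <- match("control", treatment_none)
r_none <- value_none / value_none[idx_none]

print(idx_first)
print(idx_all)
print(r_none)
```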
With the newer syntax (dplyr 1.1.0 and later) you can use .by instead of group_by():
data %>%
  mutate(r = value/value[match("control", treatment)], .by = trial)
Answer 2
Score: 2
@RonakShah has explained why your code fails, and I also prefer his match() solution. Another option that minimally modifies your code is to replace ifelse() with if ... else ...:
library(dplyr)
data %>%
  group_by(trial) %>%
  mutate(r = if ("control" %in% treatment) value/value[treatment == "control"] else NA)
# # A tibble: 4 × 4
# # Groups: trial [2]
# trial treatment value r
# <dbl> <chr> <dbl> <dbl>
# 1 1 control 97.2 1
# 2 1 b 99.3 1.02
# 3 2 b 85.5 NA
# 4 2 c 85.0 NA
Answer 3
Score: 1
Here is a base R solution with by.
with(data, by(data[-1], trial, \(X) {
  X$value <- X$value/X$value[match("control", X$treatment)]
  X
})) |> array2DF()
#> trial treatment value
#> 1 1 control 1.000000
#> 1.1 1 b 1.021605
#> 2 2 b NA
#> 2.1 2 c NA
Created on 2023-07-17 with reprex v2.0.2
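The \(X) shorthand needs R 4.1.0 or later, and array2DF() was added in R 4.3.0. If you are on an older version, the same idea can be sketched with split(), lapply() and unsplit(). This is my own variant rather than the code above; it stores the ratio in a new column r (a name borrowed from the question) instead of overwriting value.

```r
# Example data from the question
data <- data.frame(trial = c(1, 1, 2, 2),
                   treatment = c("control", "b", "b", "c"),
                   value = c(97.2, 99.3, 85.51, 85.01))

# Split by trial, compute the ratio to the control within each piece,
# then put the pieces back together in the original row order
pieces <- lapply(split(data, data$trial), function(d) {
  # match() gives NA when the trial has no control, so r becomes NA
  d$r <- d$value / d$value[match("control", d$treatment)]
  d
})
result <- unsplit(pieces, data$trial)

print(result)
```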