2023-07-17 18:00:34

Using ifelse changes the result of my calculation

Question

I have a dataset covering several trials, each of which tests different treatments. Some trials include a control group and some do not.

Here is an example in which only trial 1 has a control group:

data = data.frame(trial = c(1, 1, 2, 2),
                  treatment = c("control", "b", "b", "c"),
                  value = c(97.2, 99.3, 85.51, 85.01))

When a trial includes a control group, I want to compute the ratio between each treatment in that trial and the control of that trial. I first filtered for the trials with a control group, and that worked well:

library(dplyr)
data %>%
    filter(trial == 1) %>%
    group_by(trial) %>%
    summarise(treatment = treatment, 
              r = value / value[which(treatment == "control")])

But when I tried to do it without filtering the trials, using the ifelse() function instead, I got an unexpected result: the ratio was always equal to 1 (or NA when the trial had no control group).

data %>%
    group_by(trial) %>%
    summarise(treatment = treatment, 
              r = ifelse("control" %in% treatment, value / value[which(treatment == "control")], NA))

Answer 1

Score: 5

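I think you need two corrections:

1. Use mutate instead of summarise if you want to keep the number of rows in the data frame unchanged.
2. "control" %in% treatment returns a value of length 1, and ifelse() returns a result with the same length as its test, so only the first ratio (control / control = 1) is kept and then recycled across the group. Changing it to treatment %in% "control" still gives an incorrect result, because the condition itself is wrong: it is evaluated row by row, so only the control row would keep its ratio and every other row would become NA.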
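To make the second point concrete, here is a small base R sketch that reproduces both behaviours outside of dplyr. The vectors mirror trial 1 from the question; the names ratio, scalar_test and vector_test are only illustrative.

```r
# Values and labels for one trial that contains a control group.
treatment <- c("control", "b")
value <- c(97.2, 99.3)

# The full ratio vector has length 2.
ratio <- value / value[which(treatment == "control")]

# Length-1 test: ifelse() returns a length-1 result,
# which is the first element of `ratio` only.
scalar_test <- ifelse("control" %in% treatment, ratio, NA)

# Element-wise test: only the control row keeps its ratio,
# the other row becomes NA.
vector_test <- ifelse(treatment %in% "control", ratio, NA)

print(scalar_test)  # a single value, 1
print(vector_test)  # 1 for the control row, NA for "b"
```

Inside summarise() or mutate(), the single value from the first call is recycled to every row of the group, which is why every ratio in the question came out as 1.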

Here is a shorter solution.

library(dplyr)
data %>%
  group_by(trial) %>%
  mutate(r = value / value[match("control", treatment)])
#  trial treatment value     r
#  <dbl> <chr>     <dbl> <dbl>
#1     1 control    97.2  1   
#2     1 b          99.3  1.02
#3     2 b          85.5 NA   
#4     2 c          85.0 NA

We use match here for two reasons:

1. match returns only the first matching position, so you always get a single number even if a trial has two "control" rows. That is highly unlikely, but it is good to be sure.
2. You can drop ifelse, because match returns NA when "control" does not exist, and indexing value with NA yields NA.
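Both reasons can be checked with a few lines of base R. This is only an illustrative sketch: the vectors dup and none and the values 10, 20, 40 are made up for the demonstration and do not come from the question's data.

```r
value <- c(10, 20, 40)

# Two "control" rows: which() gives two positions, match() only the first.
dup <- c("control", "control", "b")
pos_which <- which(dup == "control")
pos_match <- match("control", dup)
r_dup <- value / value[pos_match]   # every value divided by the first control

# No "control" row: which() gives integer(0), match() gives NA.
none <- c("a", "b", "c")
r_which <- value / value[which(none == "control")]  # zero-length result
r_match <- value / value[match("control", none)]    # one NA per row

print(pos_which)
print(pos_match)
print(r_dup)
print(length(r_which))
print(r_match)
```

The zero-length result from which() is what makes the extra "control" %in% treatment check necessary in the original code; with match() the NA propagates through the division on its own.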

With dplyr 1.1.0 or later, you can use .by instead of group_by():

data %>%
  mutate(r = value / value[match("control", treatment)], .by = trial)

Answer 2

Score: 2

@RonakShah has explained why your code fails, and I also prefer his match() solution. Another option, which changes your code only minimally, is to replace ifelse() with if ... else:

library(dplyr)
data %>%
  group_by(trial) %>%
  mutate(r = if ("control" %in% treatment) value / value[treatment == "control"] else NA)
# # A tibble: 4 × 4
# # Groups:   trial [2]
#   trial treatment value     r
#   <dbl> <chr>     <dbl> <dbl>
# 1     1 control    97.2  1   
# 2     1 b          99.3  1.02
# 3     2 b          85.5 NA   
# 4     2 c          85.0 NA
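This works because if ... else evaluates one length-1 condition and then returns the chosen branch whole, whereas ifelse() shapes its result after the test. A minimal base R sketch of the same logic follows; the helper name ratio_to_control is hypothetical (it is not part of the answer), and it uses NA_real_ rather than NA so that both branches are numeric.

```r
# Hypothetical helper: ratio of each value to the control value of its group.
ratio_to_control <- function(value, treatment) {
  if ("control" %in% treatment) {
    # Condition is a single TRUE/FALSE, so the whole vector is returned.
    value / value[treatment == "control"]
  } else {
    # No control in this group: one numeric NA per row.
    rep(NA_real_, length(value))
  }
}

with_control    <- ratio_to_control(c(97.2, 99.3), c("control", "b"))
without_control <- ratio_to_control(c(85.51, 85.01), c("b", "c"))

print(with_control)     # 1, then 99.3 / 97.2
print(without_control)  # two NAs
```

Like the answer's code, this assumes at most one "control" row per group; with several, the logical index would select more than one value.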

Answer 3

Score: 1

Here is a base R solution with by() (array2DF() requires R 4.3.0 or later).

with(data, by(data[-1], trial, \(X) {
  # Divide each value by the control value of its trial (NA if there is none).
  X$value <- X$value / X$value[match("control", X$treatment)]
  X
})) |> array2DF()
#>     trial treatment    value
#> 1       1   control 1.000000
#> 1.1     1         b 1.021605
#> 2       2         b       NA
#> 2.1     2         c       NA

Created on 2023-07-17 with reprex v2.0.2
