2023年8月8日 23:59:59go评论105阅读模式

英文:

Specify value after several conditions from different columns

问题

我正在努力寻找一种在数据框中选择特定行的正确方法，该方法基于来自不同列的多个条件，并尝试将给定值输入到该单元格中。我正在尝试找到一个基于dplyr的解决方案，但是经过几个小时的尝试后仍然找不到...

我创建了一个空列'C'，我想选择给定的'A'（因子）和该给定A的'B'（数值）的最大值（而不是整个B列的最大值），然后我想在相应的行的'C'列中写入'something'。

这是一个模拟我正在尝试做的事情的虚拟数据框：

A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))
B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df <- data.frame(A, B)

我创建了一个空列，我想要写入其中。

df$C <- NA

我尝试了类似于以下代码的方法，使用了case_when，但是出现了关于B的类型的错误：

df %>%
  filter(A == "19/09/2022") %>%
  mutate(C = case_when(
    B == max(B, na.rm = TRUE) ~ "something",
    B ~ NA))

谢谢你的帮助！

英文:

I am struggling to find the proper way to select a specific row in a dataframe, based on several conditions from different columns and to try to input a given value to that cell. I am trying to find a dplyr based solution, but can’t find any after several hours of trying…

I have created an empty column ‘C’, I want to select for a given ‘A’ (factor) and for the maximum value of ‘B’ (numeric) for that given A (not the max for the entire col B), then I want to write ‘something’ in the corresponding ‘C’ column for that row.

Here is a dummy DF replicating what I am trying to do:

A &lt;- factor(c(&quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;))
B &lt;- c(0.1781223, 3.3488114,  4.1476595,  5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df &lt;- data.frame(A, B)

I create an empty column, in which I want to write.

df$C <- NA

I have tried something like this, using case_when, but there is an error on the type of B:


df %&gt;%  
  filter(A == &quot;19/09/2022&quot;) %&gt;% 
  mutate(C = case_when(
    B == max(B, na.rm = T) ~ &quot;something&quot;,
    B ~ NA))

Thank you for your help!

答案1

得分: 0

问题在于，在case_when中，LHS应该是一个逻辑向量，而在B ~ NA中，它是一个double（=B）。也许你想要的是case_when(B == max(B, na.rm = T) ~ "something", .default = NA)。

此外，由于你想要根据A的条件给C赋值，你需要使用group_by或mutate的.by参数对数据进行分组。另外，由于你只检查一个条件，你可以使用if_else：

library(dplyr)
df %>%
  mutate(
    C = if_else(
      A == "19/09/2022" & B == max(B, na.rm = T),
      "something",
      NA
    ),
    .by = A
  )
#>             A          B         C
#> 1  19/09/2022  0.1781223      <NA>
#> 2  19/09/2022  3.3488114      <NA>
#> 3  19/09/2022  4.1476595 something
#> 4  20/09/2022  5.8611553      <NA>
#> 5  20/09/2022 10.9773307      <NA>
#> 6  20/09/2022 16.9890155      <NA>
#> 7  21/09/2022 24.0428161      <NA>
#> 8  21/09/2022 35.1776457      <NA>
#> 9  21/09/2022 40.4551331      <NA>
#> 10 22/09/2022 49.5663783      <NA>
#> 11 22/09/2022 63.9132875      <NA>
#> 12 22/09/2022 64.6766946      <NA>

以上是给出的翻译结果，只包含代码部分的翻译。

英文:

The issue is that in case_when the LHS should be a logical vector whereas in B ~ NA it is a double (=B). Perhaps you want case_when(B == max(B, na.rm = T) ~ "something", .default = NA).

Moreover, as you want to assign a value to C based on a condition by A, you have to group your data using a group_by or the .by argument of mutate. Also, as you are only checking one condition you could go for an if_else:

library(dplyr)
df %&gt;%
  mutate(
    C = if_else(
      A == &quot;19/09/2022&quot; &amp; B == max(B, na.rm = T),
      &quot;something&quot;,
      NA
    ),
    .by = A
  )
#&gt;             A          B         C
#&gt; 1  19/09/2022  0.1781223      &lt;NA&gt;
#&gt; 2  19/09/2022  3.3488114      &lt;NA&gt;
#&gt; 3  19/09/2022  4.1476595 something
#&gt; 4  20/09/2022  5.8611553      &lt;NA&gt;
#&gt; 5  20/09/2022 10.9773307      &lt;NA&gt;
#&gt; 6  20/09/2022 16.9890155      &lt;NA&gt;
#&gt; 7  21/09/2022 24.0428161      &lt;NA&gt;
#&gt; 8  21/09/2022 35.1776457      &lt;NA&gt;
#&gt; 9  21/09/2022 40.4551331      &lt;NA&gt;
#&gt; 10 22/09/2022 49.5663783      &lt;NA&gt;
#&gt; 11 22/09/2022 63.9132875      &lt;NA&gt;
#&gt; 12 22/09/2022 64.6766946      &lt;NA&gt;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

从不同的列中指定多个条件后的值。

问题

答案1

Struggling with rJava and dependent packages in Rstudio Mac M1. Error: package or namespace load failed for ‘rJava’:

使用FOR循环中的group_by来创建前导变量

将 “large” 转换为 “long”。

循环遍历数据框以“清理”数据

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。