从不同的列中指定多个条件后的值。

huangapple go评论74阅读模式
英文:

Specify value after several conditions from different columns

问题

我正在努力寻找一种在数据框中选择特定行的正确方法,该方法基于来自不同列的多个条件,并尝试将给定值输入到该单元格中。我正在尝试找到一个基于dplyr的解决方案,但是经过几个小时的尝试后仍然找不到...

我创建了一个空列'C',我想选择给定的'A'(因子)和该给定A的'B'(数值)的最大值(而不是整个B列的最大值),然后我想在相应的行的'C'列中写入'something'。

这是一个模拟我正在尝试做的事情的虚拟数据框:

A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))
B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df <- data.frame(A, B)

我创建了一个空列,我想要写入其中。

df$C <- NA

我尝试了类似于以下代码的方法,使用了case_when,但是出现了关于B的类型的错误:

df %>%
  filter(A == "19/09/2022") %>%
  mutate(C = case_when(
    B == max(B, na.rm = TRUE) ~ "something",
    B ~ NA))

谢谢你的帮助!

英文:

I am struggling to find the proper way to select a specific row in a dataframe, based on several conditions from different columns and to try to input a given value to that cell. I am trying to find a dplyr based solution, but can’t find any after several hours of trying…

I have created an empty column ‘C’, I want to select for a given ‘A’ (factor) and for the maximum value of ‘B’ (numeric) for that given A (not the max for the entire col B), then I want to write ‘something’ in the corresponding ‘C’ column for that row.

Here is a dummy DF replicating what I am trying to do:

A &lt;- factor(c(&quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;))

B &lt;- c(0.1781223, 3.3488114,  4.1476595,  5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)

df &lt;- data.frame(A, B)

I create an empty column, in which I want to write.

df$C &lt;- NA

I have tried something like this, using case_when, but there is an error on the type of B:


df %&gt;%  
  filter(A == &quot;19/09/2022&quot;) %&gt;% 
  mutate(C = case_when(
    B == max(B, na.rm = T) ~ &quot;something&quot;,
    B ~ NA))

Thank you for your help!

答案1

得分: 0

问题在于,在case_when中,LHS应该是一个逻辑向量,而在B ~ NA中,它是一个double(=B)。也许你想要的是case_when(B == max(B, na.rm = T) ~ "something", .default = NA)

此外,由于你想要根据A的条件给C赋值,你需要使用group_bymutate.by参数对数据进行分组。另外,由于你只检查一个条件,你可以使用if_else

library(dplyr)

df %>%
  mutate(
    C = if_else(
      A == "19/09/2022" & B == max(B, na.rm = T),
      "something",
      NA
    ),
    .by = A
  )
#>             A          B         C
#> 1  19/09/2022  0.1781223      <NA>
#> 2  19/09/2022  3.3488114      <NA>
#> 3  19/09/2022  4.1476595 something
#> 4  20/09/2022  5.8611553      <NA>
#> 5  20/09/2022 10.9773307      <NA>
#> 6  20/09/2022 16.9890155      <NA>
#> 7  21/09/2022 24.0428161      <NA>
#> 8  21/09/2022 35.1776457      <NA>
#> 9  21/09/2022 40.4551331      <NA>
#> 10 22/09/2022 49.5663783      <NA>
#> 11 22/09/2022 63.9132875      <NA>
#> 12 22/09/2022 64.6766946      <NA>

以上是给出的翻译结果,只包含代码部分的翻译。

英文:

The issue is that in case_when the LHS should be a logical vector whereas in B ~ NA it is a double (=B). Perhaps you want case_when(B == max(B, na.rm = T) ~ &quot;something&quot;, .default = NA).

Moreover, as you want to assign a value to C based on a condition by A, you have to group your data using a group_by or the .by argument of mutate. Also, as you are only checking one condition you could go for an if_else:

library(dplyr)

df %&gt;%
  mutate(
    C = if_else(
      A == &quot;19/09/2022&quot; &amp; B == max(B, na.rm = T),
      &quot;something&quot;,
      NA
    ),
    .by = A
  )
#&gt;             A          B         C
#&gt; 1  19/09/2022  0.1781223      &lt;NA&gt;
#&gt; 2  19/09/2022  3.3488114      &lt;NA&gt;
#&gt; 3  19/09/2022  4.1476595 something
#&gt; 4  20/09/2022  5.8611553      &lt;NA&gt;
#&gt; 5  20/09/2022 10.9773307      &lt;NA&gt;
#&gt; 6  20/09/2022 16.9890155      &lt;NA&gt;
#&gt; 7  21/09/2022 24.0428161      &lt;NA&gt;
#&gt; 8  21/09/2022 35.1776457      &lt;NA&gt;
#&gt; 9  21/09/2022 40.4551331      &lt;NA&gt;
#&gt; 10 22/09/2022 49.5663783      &lt;NA&gt;
#&gt; 11 22/09/2022 63.9132875      &lt;NA&gt;
#&gt; 12 22/09/2022 64.6766946      &lt;NA&gt;

huangapple
  • 本文由 发表于 2023年8月8日 23:59:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76861306.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定