从不同的列中指定多个条件后的值。

huangapple go评论105阅读模式
英文:

Specify value after several conditions from different columns

问题

我正在努力寻找一种在数据框中选择特定行的正确方法,该方法基于来自不同列的多个条件,并尝试将给定值输入到该单元格中。我正在尝试找到一个基于dplyr的解决方案,但是经过几个小时的尝试后仍然找不到...

我创建了一个空列'C',我想选择给定的'A'(因子)和该给定A的'B'(数值)的最大值(而不是整个B列的最大值),然后我想在相应的行的'C'列中写入'something'。

这是一个模拟我正在尝试做的事情的虚拟数据框:

  1. A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))
  2. B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
  3. df <- data.frame(A, B)

我创建了一个空列,我想要写入其中。

df$C <- NA

我尝试了类似于以下代码的方法,使用了case_when,但是出现了关于B的类型的错误:

  1. df %>%
  2. filter(A == "19/09/2022") %>%
  3. mutate(C = case_when(
  4. B == max(B, na.rm = TRUE) ~ "something",
  5. B ~ NA))

谢谢你的帮助!

英文:

I am struggling to find the proper way to select a specific row in a dataframe, based on several conditions from different columns and to try to input a given value to that cell. I am trying to find a dplyr based solution, but can’t find any after several hours of trying…

I have created an empty column ‘C’, I want to select for a given ‘A’ (factor) and for the maximum value of ‘B’ (numeric) for that given A (not the max for the entire col B), then I want to write ‘something’ in the corresponding ‘C’ column for that row.

Here is a dummy DF replicating what I am trying to do:

  1. A &lt;- factor(c(&quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;19/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;20/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;21/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;, &quot;22/09/2022&quot;))
  2. B &lt;- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
  3. df &lt;- data.frame(A, B)

I create an empty column, in which I want to write.

df$C &lt;- NA

I have tried something like this, using case_when, but there is an error on the type of B:

  1. df %&gt;%
  2. filter(A == &quot;19/09/2022&quot;) %&gt;%
  3. mutate(C = case_when(
  4. B == max(B, na.rm = T) ~ &quot;something&quot;,
  5. B ~ NA))

Thank you for your help!

答案1

得分: 0

问题在于,在case_when中,LHS应该是一个逻辑向量,而在B ~ NA中,它是一个double(=B)。也许你想要的是case_when(B == max(B, na.rm = T) ~ "something", .default = NA)

此外,由于你想要根据A的条件给C赋值,你需要使用group_bymutate.by参数对数据进行分组。另外,由于你只检查一个条件,你可以使用if_else

  1. library(dplyr)
  2. df %>%
  3. mutate(
  4. C = if_else(
  5. A == "19/09/2022" & B == max(B, na.rm = T),
  6. "something",
  7. NA
  8. ),
  9. .by = A
  10. )
  11. #> A B C
  12. #> 1 19/09/2022 0.1781223 <NA>
  13. #> 2 19/09/2022 3.3488114 <NA>
  14. #> 3 19/09/2022 4.1476595 something
  15. #> 4 20/09/2022 5.8611553 <NA>
  16. #> 5 20/09/2022 10.9773307 <NA>
  17. #> 6 20/09/2022 16.9890155 <NA>
  18. #> 7 21/09/2022 24.0428161 <NA>
  19. #> 8 21/09/2022 35.1776457 <NA>
  20. #> 9 21/09/2022 40.4551331 <NA>
  21. #> 10 22/09/2022 49.5663783 <NA>
  22. #> 11 22/09/2022 63.9132875 <NA>
  23. #> 12 22/09/2022 64.6766946 <NA>

以上是给出的翻译结果,只包含代码部分的翻译。

英文:

The issue is that in case_when the LHS should be a logical vector whereas in B ~ NA it is a double (=B). Perhaps you want case_when(B == max(B, na.rm = T) ~ &quot;something&quot;, .default = NA).

Moreover, as you want to assign a value to C based on a condition by A, you have to group your data using a group_by or the .by argument of mutate. Also, as you are only checking one condition you could go for an if_else:

  1. library(dplyr)
  2. df %&gt;%
  3. mutate(
  4. C = if_else(
  5. A == &quot;19/09/2022&quot; &amp; B == max(B, na.rm = T),
  6. &quot;something&quot;,
  7. NA
  8. ),
  9. .by = A
  10. )
  11. #&gt; A B C
  12. #&gt; 1 19/09/2022 0.1781223 &lt;NA&gt;
  13. #&gt; 2 19/09/2022 3.3488114 &lt;NA&gt;
  14. #&gt; 3 19/09/2022 4.1476595 something
  15. #&gt; 4 20/09/2022 5.8611553 &lt;NA&gt;
  16. #&gt; 5 20/09/2022 10.9773307 &lt;NA&gt;
  17. #&gt; 6 20/09/2022 16.9890155 &lt;NA&gt;
  18. #&gt; 7 21/09/2022 24.0428161 &lt;NA&gt;
  19. #&gt; 8 21/09/2022 35.1776457 &lt;NA&gt;
  20. #&gt; 9 21/09/2022 40.4551331 &lt;NA&gt;
  21. #&gt; 10 22/09/2022 49.5663783 &lt;NA&gt;
  22. #&gt; 11 22/09/2022 63.9132875 &lt;NA&gt;
  23. #&gt; 12 22/09/2022 64.6766946 &lt;NA&gt;

huangapple
  • 本文由 发表于 2023年8月8日 23:59:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76861306.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定