英文:
Specify value after several conditions from different columns
问题
我正在努力寻找一种在数据框中选择特定行的正确方法,该方法基于来自不同列的多个条件,并尝试将给定值输入到该单元格中。我正在尝试找到一个基于dplyr的解决方案,但是经过几个小时的尝试后仍然找不到...
我创建了一个空列'C',我想选择给定的'A'(因子)和该给定A的'B'(数值)的最大值(而不是整个B列的最大值),然后我想在相应的行的'C'列中写入'something'。
这是一个模拟我正在尝试做的事情的虚拟数据框:
A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))
B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df <- data.frame(A, B)
我创建了一个空列,我想要写入其中。
df$C <- NA
我尝试了类似于以下代码的方法,使用了case_when,但是出现了关于B的类型的错误:
df %>%
filter(A == "19/09/2022") %>%
mutate(C = case_when(
B == max(B, na.rm = TRUE) ~ "something",
B ~ NA))
谢谢你的帮助!
英文:
I am struggling to find the proper way to select a specific row in a dataframe, based on several conditions from different columns and to try to input a given value to that cell. I am trying to find a dplyr based solution, but can’t find any after several hours of trying…
I have created an empty column ‘C’, I want to select for a given ‘A’ (factor) and for the maximum value of ‘B’ (numeric) for that given A (not the max for the entire col B), then I want to write ‘something’ in the corresponding ‘C’ column for that row.
Here is a dummy DF replicating what I am trying to do:
A <- factor(c("19/09/2022", "19/09/2022", "19/09/2022", "20/09/2022", "20/09/2022", "20/09/2022", "21/09/2022", "21/09/2022", "21/09/2022", "22/09/2022", "22/09/2022", "22/09/2022"))
B <- c(0.1781223, 3.3488114, 4.1476595, 5.8611553, 10.9773307, 16.9890155, 24.0428161, 35.1776457, 40.4551331, 49.5663783, 63.9132875, 64.6766946)
df <- data.frame(A, B)
I create an empty column, in which I want to write.
df$C <- NA
I have tried something like this, using case_when, but there is an error on the type of B:
df %>%
filter(A == "19/09/2022") %>%
mutate(C = case_when(
B == max(B, na.rm = T) ~ "something",
B ~ NA))
Thank you for your help!
答案1
得分: 0
问题在于,在case_when
中,LHS应该是一个逻辑向量,而在B ~ NA
中,它是一个double
(=B
)。也许你想要的是case_when(B == max(B, na.rm = T) ~ "something", .default = NA)
。
此外,由于你想要根据A
的条件给C
赋值,你需要使用group_by
或mutate
的.by
参数对数据进行分组。另外,由于你只检查一个条件,你可以使用if_else
:
library(dplyr)
df %>%
mutate(
C = if_else(
A == "19/09/2022" & B == max(B, na.rm = T),
"something",
NA
),
.by = A
)
#> A B C
#> 1 19/09/2022 0.1781223 <NA>
#> 2 19/09/2022 3.3488114 <NA>
#> 3 19/09/2022 4.1476595 something
#> 4 20/09/2022 5.8611553 <NA>
#> 5 20/09/2022 10.9773307 <NA>
#> 6 20/09/2022 16.9890155 <NA>
#> 7 21/09/2022 24.0428161 <NA>
#> 8 21/09/2022 35.1776457 <NA>
#> 9 21/09/2022 40.4551331 <NA>
#> 10 22/09/2022 49.5663783 <NA>
#> 11 22/09/2022 63.9132875 <NA>
#> 12 22/09/2022 64.6766946 <NA>
以上是给出的翻译结果,只包含代码部分的翻译。
英文:
The issue is that in case_when
the LHS should be a logical vector whereas in B ~ NA
it is a double
(=B
). Perhaps you want case_when(B == max(B, na.rm = T) ~ "something", .default = NA)
.
Moreover, as you want to assign a value to C
based on a condition by A
, you have to group your data using a group_by
or the .by
argument of mutate. Also, as you are only checking one condition you could go for an if_else
:
library(dplyr)
df %>%
mutate(
C = if_else(
A == "19/09/2022" & B == max(B, na.rm = T),
"something",
NA
),
.by = A
)
#> A B C
#> 1 19/09/2022 0.1781223 <NA>
#> 2 19/09/2022 3.3488114 <NA>
#> 3 19/09/2022 4.1476595 something
#> 4 20/09/2022 5.8611553 <NA>
#> 5 20/09/2022 10.9773307 <NA>
#> 6 20/09/2022 16.9890155 <NA>
#> 7 21/09/2022 24.0428161 <NA>
#> 8 21/09/2022 35.1776457 <NA>
#> 9 21/09/2022 40.4551331 <NA>
#> 10 22/09/2022 49.5663783 <NA>
#> 11 22/09/2022 63.9132875 <NA>
#> 12 22/09/2022 64.6766946 <NA>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论