How to use data.table fifelse with vectors in the arguments?
Question
Say I have this data.frame
DF <- data.frame(one=c(1, NA, NA, 1, NA, NA), two=c(NA,1,NA, NA, NA,1),
three=c(NA,NA, 1, NA, 1,NA))
one  two  three  output
  1   NA     NA  one
 NA    1     NA  two
 NA   NA      1  three
  1   NA     NA  one
 NA   NA      1  three
 NA    1     NA  two
The columns are mutually exclusive.
I need to generate the output
output=c("one","two","three","one","three", "two")
I tried to do it with data.table's fifelse, but it fails:
with(DF,fifelse(one==1, "one", fifelse(two==1,"two", "three", na="three"),
na=fifelse(two==1,"two", "three", na="three")))
Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"), :
Length of 'na' is 6 but must be 1
It seems fifelse doesn't accept a vector for the na argument.
dplyr's if_else works well here.
with(DF,if_else(one==1, "one", if_else(two==1,"two", "three", missing="three"),
missing=if_else(two==1,"two", "three", missing="three")))
How can I get the same output with data.table?
Any other simple alternative is welcome too.
With base R I could use
apply(DF,1, function(x) which(!is.na(x)))
and then replace those numbers with the corresponding strings.
Answer 1
Score: 3
fifelse isn't the best tool for this; I suggest fcase, which is easier:
data.table
library(data.table)
as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
# [1] "one" "two" "three" "one" "three" "two"
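If you would rather keep fifelse, one possible workaround (a sketch, not part of the original answer) is to make every test free of NA so that the na argument is never needed. Here %in% is used for that, since x %in% 1 returns FALSE rather than NA for missing values:

```r
library(data.table)

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# `%in%` never returns NA, so the nested calls need no `na =` argument
res_fifelse <- with(DF, fifelse(one %in% 1, "one",
                        fifelse(two %in% 1, "two", "three")))
res_fifelse
# [1] "one"   "two"   "three" "one"   "three" "two"
```

Note that this treats any row where neither one nor two equals 1 as "three", which matches the question's assumption that the columns are mutually exclusive.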
dplyr
The dplyr analog is case_when:
library(dplyr)
with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
# [1] "one" "two" "three" "one" "three" "two"
base R
Both the data.table and dplyr implementations presume knowing the column names a priori. A base-R method that is agnostic to that:
colnames(DF)[apply(DF, 1, which.max)]
# [1] "one" "two" "three" "one" "three" "two"
(Incidentally, which.max can also be which.min here; really we're just looking for a non-NA value.)
In this case, if you have other columns that should not be considered, you will need to subset DF within apply(DF, ...) so that it only looks at the desired columns.
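A minimal sketch of that subsetting, assuming a hypothetical extra column id that should be ignored (the names DF2 and cols are made up for illustration):

```r
DF2 <- data.frame(id = letters[1:6],  # hypothetical column to be ignored
                  one = c(1, NA, NA, 1, NA, NA),
                  two = c(NA, 1, NA, NA, NA, 1),
                  three = c(NA, NA, 1, NA, 1, NA))

# Only these columns are passed to apply(); which.max skips the NA values
cols <- c("one", "two", "three")
res_base <- cols[apply(DF2[cols], 1, which.max)]
res_base
# [1] "one"   "two"   "three" "one"   "three" "two"
```

Subsetting first also keeps apply() from coercing the whole data.frame to a character matrix because of the non-numeric column.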
Answer 2
Score: 3
Another data.table alternative:
for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)
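A slightly more explicit variant of the same loop, as a sketch: it converts to a data.table first and creates the output column up front, so that set() only ever fills an existing character column. The names DT and cols are assumptions introduced here, not part of the answer.

```r
library(data.table)

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

DT <- as.data.table(DF)
cols <- c("one", "two", "three")

# Start with an all-NA character column, then fill it by reference
DT[, output := NA_character_]
for (col in cols) {
  # which() drops the NA comparisons, leaving only rows where the column is 1
  set(DT, i = which(DT[[col]] == 1), j = "output", value = col)
}
DT$output
# [1] "one"   "two"   "three" "one"   "three" "two"
```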
Answer 3
Score: 2
If each row has exactly one non-NA value, you can try max.col:
> names(DF)[max.col(!is.na(DF))]
[1] "one" "two" "three" "one" "three" "two"
or col + na.omit (though this may be slow if speed matters):
> names(DF)[na.omit(c(t(col(DF) * DF)))]
[1] "one" "two" "three" "one" "three" "two"
Benchmarking
microbenchmark(
f1 = names(DF)[max.col(!is.na(DF))],
f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
)
gives
Unit: microseconds
 expr   min     lq    mean median    uq    max neval
   f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
   f2 300.7 527.65 634.755 595.35 691.5 2405.4   100
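One caveat worth keeping in mind: max.col breaks ties at random by default (ties.method = "random"). That is harmless when each row has exactly one non-NA value, but if that assumption might not hold, a deterministic variant could look like the sketch below, which picks the first non-NA column in each row:

```r
DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# ties.method = "first" makes the result reproducible if a row ever has ties
res_maxcol <- names(DF)[max.col(!is.na(DF), ties.method = "first")]
res_maxcol
# [1] "one"   "two"   "three" "one"   "three" "two"
```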