How to use data.table fifelse with vectors in the arguments?

Question


Say I have this data.frame

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA), two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

 one  two  three  output
   1   NA     NA     one
  NA    1     NA     two
  NA   NA      1   three
   1   NA     NA     one
  NA   NA      1   three
  NA    1     NA     two

The columns are mutually exclusive.
I need to generate the output:

output = c("one", "two", "three", "one", "three", "two")

I've tried to do it with data.table's fifelse, but it throws an error:

with(DF, fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"),
                 na = fifelse(two == 1, "two", "three", na = "three")))

Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"),  :
  Length of 'na' is 6 but must be 1

It seems fifelse doesn't accept a vector for its na argument.

dplyr's if_else works well here.

with(DF, if_else(one == 1, "one", if_else(two == 1, "two", "three", missing = "three"),
                 missing = if_else(two == 1, "two", "three", missing = "three")))
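Both nested calls above need a vector fallback only because one == 1 evaluates to NA wherever one is missing. A minimal sketch, assuming the DF above and the stated mutual exclusivity: testing !is.na() instead keeps every condition TRUE or FALSE, so no na/missing argument is needed at all. It uses base R's ifelse so it runs without extra packages; data.table::fifelse(test, yes, no) takes the same three arguments and should work as a drop-in replacement here.

```r
# Sample data from the question: exactly one non-NA value per row
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# !is.na() is never NA, so the nested conditions need no NA fallback.
# "three" is assumed whenever neither one nor two holds a value.
output <- with(DF, ifelse(!is.na(one), "one",
                          ifelse(!is.na(two), "two", "three")))
print(output)
```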

How can I get the same output with data.table?

Is there any other simple alternative?
With base R I could use

apply(DF,1, function(x) which(!is.na(x)))

and then replace those numbers with the corresponding column names.
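The two base-R steps described above can be combined into one expression. A minimal sketch, assuming every row has exactly one non-NA value; otherwise which() returns zero or several positions for some rows and apply() returns a list instead of an integer vector.

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# Step 1: position of the non-NA column in each row
pos <- apply(DF, 1, function(x) which(!is.na(x)))

# Step 2: replace each position with the corresponding column name
output <- names(DF)[pos]
print(output)
```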

Answer 1

Score: 3

fifelse isn't the best tool for this; I suggest fcase, which is easier:

data.table

library(data.table)
as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
# [1] "one"   "two"   "three" "one"   "three" "two"
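fcase succeeds where the nested fifelse failed because it checks the conditions in order and treats a condition that evaluates to NA as "no match", falling through to the next pair (and finally to its default, NA unless given). To make that rule concrete, here is a minimal base-R sketch; first_true_case is a hypothetical helper written for illustration, not a data.table function.

```r
# Hypothetical helper (not part of data.table): mimics the
# "first TRUE condition wins, NA counts as no match" rule of fcase.
first_true_case <- function(..., default = NA_character_) {
  args <- list(...)
  stopifnot(length(args) %% 2 == 0)            # expects condition/value pairs
  conds <- args[seq(1, length(args), by = 2)]
  vals  <- args[seq(2, length(args), by = 2)]
  n     <- length(conds[[1]])
  out   <- rep(default, n)
  done  <- rep(FALSE, n)                       # rows already matched
  for (i in seq_along(conds)) {
    cond <- conds[[i]]
    hit  <- !done & !is.na(cond) & cond        # an NA condition never matches
    out[hit] <- rep_len(vals[[i]], n)[hit]
    done <- done | hit
  }
  out
}

DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

output <- with(DF, first_true_case(one == 1, "one", two == 1, "two", three == 1, "three"))
print(output)
```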

dplyr

The dplyr analog is case_when:

library(dplyr)
with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
# [1] "one"   "two"   "three" "one"   "three" "two"

base R

Both the data.table and dplyr implementations presume knowing the column names a priori. A base-R method that is agnostic to that:

colnames(DF)[apply(DF, 1, which.max)]
# [1] "one"   "two"   "three" "one"   "three" "two"

(Incidentally, which.max could just as well be which.min here; we're really just looking for the one non-NA value.)
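which.max() and which.min() both discard NA values, so on a row with a single non-NA entry they return the same position. The catch is a row with no value at all: both then return integer(0), and apply() can no longer simplify its result to an integer vector. A minimal, more defensive sketch, where single_non_na_col is a hypothetical helper name and any row without exactly one non-NA value maps to NA:

```r
which.max(c(NA, 1, NA))   # 2
which.min(c(NA, 1, NA))   # 2
which.max(c(NA, NA, NA))  # integer(0): nothing to pick

# Hypothetical helper: name of the single non-NA column per row, NA otherwise
single_non_na_col <- function(df) {
  m <- as.matrix(df)
  idx <- apply(m, 1, function(x) {
    w <- which(!is.na(x))
    if (length(w) == 1L) w else NA_integer_   # always length 1, so apply() simplifies
  })
  colnames(m)[idx]
}

DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))
DF_gap <- data.frame(one = c(1, NA), two = c(NA, NA), three = c(NA, NA))  # second row is all NA

print(single_non_na_col(DF))
print(single_non_na_col(DF_gap))
```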

In this case, if you have other columns that should not be considered, you will need to subset the DF within apply(DF, ...) so that it only looks at the desired columns.
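For example, assuming a hypothetical extra id column that should be ignored, pass only the indicator columns to apply() and index their names rather than all of colnames(DF):

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))
DF$id <- 101:106                     # hypothetical extra column, not an indicator

cols <- c("one", "two", "three")     # the mutually exclusive indicator columns
# Without the DF[cols] subset, which.max would pick id (the largest value) on every row
output <- cols[apply(DF[cols], 1, which.max)]
print(output)
```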

Answer 2

Score: 3

Another data.table alternative:

for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)
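The loop writes each column's name into the rows where that column equals 1; set() does this by reference, without copying DF. For comparison, a minimal base-R sketch of the same loop, using ordinary copy-on-modify assignment instead of assignment by reference:

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

indicator_cols <- names(DF)          # capture the names before output is added
DF$output <- NA_character_
for (col in indicator_cols) {
  # which() drops the NA comparisons, leaving only rows where col == 1
  DF$output[which(DF[[col]] == 1)] <- col
}
print(DF$output)
```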

Answer 3

Score: 2

If each row has only one non-NA value, you can try max.col:

> names(DF)[max.col(!is.na(DF))]
[1] "one"   "two"   "three" "one"   "three" "two"
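!is.na(DF) turns the data into a logical matrix with one TRUE per row, and max.col() returns, for each row, the column holding the maximum, i.e. that TRUE. Note that max.col() breaks ties at random by default (ties.method = "random"), so a row with no non-NA value (all FALSE) would get an arbitrary column. A minimal sketch that makes the choice deterministic and flags such rows as NA:

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

present <- !is.na(DF)                            # logical matrix, TRUE where a value exists
idx <- max.col(present, ties.method = "first")   # column of the TRUE in each row
idx[rowSums(present) != 1] <- NA                 # zero or several values: no single answer
output <- names(DF)[idx]
print(output)
```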

or col + na.omit (though this may be slower if speed matters):

> names(DF)[na.omit(c(t(col(DF) * DF)))]
[1] "one"   "two"   "three" "one"   "three" "two"
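Step by step: col(DF) holds each cell's column number, and multiplying by DF keeps that number where the value is 1 and gives NA elsewhere; t() followed by c() flattens the result row by row, and na.omit() drops the NAs. This relies on every non-NA entry being exactly 1, and a row with no value would silently shorten the result. A sketch with the intermediate steps spelled out:

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

idx_by_cell <- col(DF) * DF    # column number where the value is 1, NA elsewhere
flat <- c(t(idx_by_cell))      # t() so that c() reads the cells row by row
idx <- na.omit(flat)           # one column number per row remains
output <- names(DF)[idx]
print(output)
```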

Benchmarking

library(microbenchmark)
microbenchmark(
    f1 = names(DF)[max.col(!is.na(DF))],
    f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
)

gives

Unit: microseconds
 expr   min     lq    mean median    uq    max neval
   f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
   f2 300.7 527.65 634.755 595.35 691.5 2405.4   100

huangapple
  • Published on 2023-05-24 18:47:27
  • When reposting, please keep this link: https://go.coder-hub.com/76322685.html