How to use data.table fifelse with vectors in the arguments?

Question

Say I have this data.frame

    DF <- data.frame(one=c(1, NA, NA, 1, NA, NA), two=c(NA,1,NA, NA, NA,1),
                     three=c(NA,NA, 1, NA, 1,NA))

    one two three output
      1  NA    NA    one
     NA   1    NA    two
     NA  NA     1  three
      1  NA    NA    one
     NA  NA     1  three
     NA   1    NA    two

The columns are mutually exclusive.
I need to generate the output

    output=c("one","two","three","one","three", "two")

I've tried to do it with data.table's fifelse, but it fails:

    with(DF,fifelse(one==1, "one", fifelse(two==1,"two", "three", na="three"),
                    na=fifelse(two==1,"two", "three", na="three")))
    Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"), :
      Length of 'na' is 6 but must be 1

It seems fifelse doesn't accept a vector for the na argument.
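One possible way around the length-1 restriction on na, offered as a sketch rather than as the intended usage, is to make the tests themselves free of NA so that na is never needed. `%in%` returns FALSE where `==` would return NA, so the nested calls can drop the na argument entirely.

```r
library(data.table)

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# `x %in% 1` is TRUE where x is 1 and FALSE where x is NA,
# so no test contains NA and `na=` can be omitted.
result <- with(DF, fifelse(one %in% 1, "one",
                           fifelse(two %in% 1, "two", "three")))
result
```

This keeps the nested fifelse structure from the question; it relies on the columns being mutually exclusive, as stated above.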

dplyr's if_else works well here.

    with(DF,if_else(one==1, "one", if_else(two==1,"two", "three", missing="three"),
                    missing=if_else(two==1,"two", "three", missing="three")))

How can I get the same output with data.table?

Any other simple alternative would also be welcome.
With base R I could use

    apply(DF,1, function(x) which(!is.na(x)))

and later replace those numbers with characters.

Answer 1

Score: 3

fifelse isn't the best tool for this; I suggest fcase, which is easier:

data.table

    library(data.table)
    as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
    # [1] "one" "two" "three" "one" "three" "two"
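As a small variation on the fcase call above, and assuming the columns really are mutually exclusive, the last condition can be replaced by fcase's `default` argument, which supplies the value for rows where no condition is TRUE. This is only a sketch of that option; the object name `DT` is introduced here for illustration.

```r
library(data.table)

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))
DT <- as.data.table(DF)

# Rows matching neither `one == 1` nor `two == 1` fall through to `default`.
result <- DT[, fcase(one == 1, "one", two == 1, "two", default = "three")]
result
```

Without `default`, unmatched rows would come back as NA, which can be useful for spotting rows that violate the mutual-exclusivity assumption.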

dplyr

The dplyr analog is case_when:

    library(dplyr)
    with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
    # [1] "one" "two" "three" "one" "three" "two"

base R

Both the data.table and dplyr implementations presume knowing the column names a priori. A base-R method that is agnostic to that:

    colnames(DF)[apply(DF, 1, which.max)]
    # [1] "one" "two" "three" "one" "three" "two"

(Incidentally, which.max can also be which.min here; really, we're just looking for the non-NA value.)

In this case, if you have other columns that should not be considered, you will need to subset the DF within apply(DF, ...) so that it only looks at the desired columns.
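A minimal sketch of that subsetting, assuming a hypothetical extra column `id` that should be ignored and a hypothetical vector `cols` naming the columns of interest:

```r
DF <- data.frame(id = 101:106,                      # hypothetical extra column
                 one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

cols <- c("one", "two", "three")                    # columns to consider

# Restrict apply() to the chosen columns, then map the positions back to names.
result <- cols[apply(DF[cols], 1, which.max)]
result
```

Indexing `cols` rather than `colnames(DF)` matters here: the positions returned by which.max are relative to the subset, not to the full data frame.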

Answer 2

Score: 3

Another data.table alternative:

    for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)
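A more explicit sketch of the same loop is shown below. It assumes the data is first converted to a data.table (named `DT` here for illustration) and that the `output` column is created up front, so each call to set() only fills rows of an existing column by reference.

```r
library(data.table)

DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))
DT <- as.data.table(DF)

cols <- c("one", "two", "three")      # columns to scan, fixed before the loop
DT[, output := NA_character_]         # create the target column first

# For each column, write its name into `output` for the rows where it equals 1.
for (col in cols) {
  set(DT, i = which(DT[[col]] == 1), j = "output", value = col)
}
DT$output
```

Fixing `cols` before the loop keeps the new `output` column itself out of the set of columns being scanned.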

Answer 3

Score: 2

If you have only one non-NA value in each row, you can try max.col:

    > names(DF)[max.col(!is.na(DF))]
    [1] "one" "two" "three" "one" "three" "two"

or col + na.omit (but this might be slow if speed matters):

    > names(DF)[na.omit(c(t(col(DF) * DF)))]
    [1] "one" "two" "three" "one" "three" "two"

Benchmarking

    library(microbenchmark)
    microbenchmark(
      f1 = names(DF)[max.col(!is.na(DF))],
      f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
    )

gives

    Unit: microseconds
     expr   min     lq    mean median    uq    max neval
       f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
       f2 300.7 527.65 634.755 595.35 691.5 2405.4   100
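The max.col approach above depends on each row having exactly one non-NA value. The following sketch shows one way to make it behave predictably when that assumption fails. The data frame `DF2`, with an all-NA row and a row with two non-NA values, is made up for illustration.

```r
# Hypothetical data: row 3 has no non-NA value, row 4 has two.
DF2 <- data.frame(one = c(1, NA, NA, 1),
                  two = c(NA, 1, NA, 1),
                  three = c(NA, NA, NA, NA))

present <- !is.na(DF2)

# ties.method = "first" picks the leftmost non-NA column instead of a random one.
idx <- max.col(present, ties.method = "first")

# Rows with no non-NA value get NA rather than an arbitrary column.
idx[rowSums(present) == 0] <- NA

result <- names(DF2)[idx]
result
```

By default max.col breaks ties at random, so setting `ties.method` explicitly makes the result reproducible.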

huangapple
  • Posted on May 24, 2023 at 18:47:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76322685.html