2023-05-24 18:47:27


How to use data.table fifelse with vectors in the arguments?

Question


Say I have this data.frame:

DF <- data.frame(one=c(1, NA, NA, 1, NA, NA), two=c(NA,1,NA, NA, NA,1), 
         three=c(NA,NA, 1, NA, 1,NA))
one    two  three         output
  1     NA    NA             one
 NA      1    NA             two
 NA     NA     1           three
  1     NA    NA             one  
 NA     NA     1           three
 NA      1    NA             two

The columns are mutually exclusive.
I need to generate the output

output=c("one","two","three","one","three", "two")

I've tried to do it with data.table's fifelse, but it fails:

with(DF,fifelse(one==1, "one", fifelse(two==1,"two", "three", na="three"), 
   na=fifelse(two==1,"two", "three", na="three")))
Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"),  : 
  Length of 'na' is 6 but must be 1

It seems that fifelse doesn't accept a vector for the na argument.

dplyr's if_else works well here.

with(DF,if_else(one==1, "one", if_else(two==1,"two", "three", missing="three"), 
   missing=if_else(two==1,"two", "three", missing="three")))

How can I get the same output with data.table?

Any other simple alternative is welcome.
With base R I could use

apply(DF,1, function(x) which(!is.na(x)))

and later replace those numbers with characters.

Answer 1

Score: 3


fifelse isn't the best tool for this; I suggest fcase, which is easier:

data.table

library(data.table)
as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
# [1] "one"   "two"   "three" "one"   "three" "two"
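If fifelse is still preferred, one way around the length-1 restriction on na reported in the question is to make every test NA-free, so that na is never needed. This is only a sketch: it rebuilds DF from the question, and the `!is.na(...) &` guard is an addition that does not appear in the original thread.

```r
library(data.table)

# Same data as in the question.
DF <- data.frame(one = c(1, NA, NA, 1, NA, NA),
                 two = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# `!is.na(x) & x == 1` is FALSE (never NA) when x is missing,
# so the nested fifelse calls never fall through to `na`.
output <- with(DF, fifelse(!is.na(one) & one == 1, "one",
                   fifelse(!is.na(two) & two == 1, "two", "three")))
output
# [1] "one"   "two"   "three" "one"   "three" "two"
```

The inner fifelse returns a length-6 vector, which is accepted for the no argument; only na was reported as needing length 1.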

dplyr

The dplyr analog is case_when:

library(dplyr)
with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
# [1] "one"   "two"   "three" "one"   "three" "two"

base R

Both the data.table and dplyr implementations presume knowing the column names a priori. A base-R method that is agnostic to that:

colnames(DF)[apply(DF, 1, which.max)]
# [1] "one"   "two"   "three" "one"   "three" "two"

(Incidentally, which.max could just as well be which.min here; we're really just looking for the non-NA value.)

In this case, if you have other columns that should not be considered, you will need to subset the DF within apply(DF, ...) so that it only looks at the desired columns.
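The subsetting described above might look like the following sketch. The `id` column and the names `DF_id` and `cols` are hypothetical, introduced only to illustrate a frame with an extra column that should be ignored.

```r
# Hypothetical data: the three indicator columns plus an unrelated `id` column.
DF_id <- data.frame(id = 101:106,
                    one = c(1, NA, NA, 1, NA, NA),
                    two = c(NA, 1, NA, NA, NA, 1),
                    three = c(NA, NA, 1, NA, 1, NA))

# Only these columns take part in the lookup.
cols <- c("one", "two", "three")

# which.max skips NA, so it returns the position of the single
# non-NA value in each row of the subset.
labels <- cols[apply(DF_id[cols], 1, which.max)]
labels
# [1] "one"   "two"   "three" "one"   "three" "two"
```

Indexing `cols` rather than `colnames(DF_id)` keeps the positions aligned with the subset that apply actually saw.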

Answer 2

Score: 3


Another data.table alternative:

for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)
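A self-contained version of this loop might look like the sketch below. It assumes the data must first be a data.table, because to my knowledge set() only adds a new column to a data.table, not to a plain data.frame. The names `DT` and `cols` are introduced here for illustration.

```r
library(data.table)

# Same data as in the question, converted to a data.table
# (assumption: set() needs a data.table to create the new column).
DT <- as.data.table(data.frame(one = c(1, NA, NA, 1, NA, NA),
                               two = c(NA, 1, NA, NA, NA, 1),
                               three = c(NA, NA, 1, NA, 1, NA)))

# Fix the columns to loop over before `output` is created.
cols <- c("one", "two", "three")

# For each column, write its name into `output` on the rows where it equals 1.
# which() drops NA comparisons, so only the matching rows are touched.
for (col in cols) set(DT, i = which(DT[[col]] == 1), j = "output", value = col)

DT$output
```

Rows that match none of the columns would be left as NA in `output`.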

Answer 3

Score: 2


If you have only one non-NA value in each row, you can try max.col

> names(DF)[max.col(!is.na(DF))]
[1] "one"   "two"   "three" "one"   "three" "two"

or col + na.omit (though this may be slow if speed matters)

> names(DF)[na.omit(c(t(col(DF) * DF)))]
[1] "one"   "two"   "three" "one"   "three" "two"
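One caveat with the max.col approach: its default ties.method is "random", so a row with more than one non-NA value would give a non-reproducible answer. The sketch below uses a hypothetical frame `DF2` that deliberately breaks the one-value-per-row assumption, and passes ties.method = "first" to make the choice deterministic.

```r
# Hypothetical data: row 1 has two non-NA values, so there is a tie.
DF2 <- data.frame(one = c(1, NA),
                  two = c(1, 1),
                  three = c(NA_real_, NA_real_))

# ties.method = "first" always picks the leftmost non-NA column.
first_hit <- names(DF2)[max.col(!is.na(DF2), ties.method = "first")]
first_hit
# [1] "one" "two"
```

With the question's data there are no ties, so the default already gives the same result on every run.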

Benchmarking

library(microbenchmark)
microbenchmark(
    f1 = names(DF)[max.col(!is.na(DF))],
    f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
)

gives

Unit: microseconds
 expr   min     lq    mean median    uq    max neval
   f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
   f2 300.7 527.65 634.755 595.35 691.5 2405.4   100
