How to find the leftmost column of a data frame with all zeros in columns to the right with dplyr

Question

I have a data frame like this:

    library(dplyr)
    library(tidyr)

    # All 16 combinations of 0/1 across four columns
    df <- data.frame(v1 = c(0, 1), v2 = c(0, 1), v3 = c(0, 1), v4 = c(0, 1)) %>%
      expand(v1, v2, v3, v4)
    df
    # A tibble: 16 x 4
          v1    v2    v3    v4
       <dbl> <dbl> <dbl> <dbl>
     1     0     0     0     0
     2     0     0     0     1
     3     0     0     1     0
     4     0     0     1     1
     5     0     1     0     0
     6     0     1     0     1
     7     0     1     1     0
     8     0     1     1     1
     9     1     0     0     0
    10     1     0     0     1
    11     1     0     1     0
    12     1     0     1     1
    13     1     1     0     0
    14     1     1     0     1
    15     1     1     1     0
    16     1     1     1     1

For each row, I would like to identify the leftmost column that contains a zero, such that all columns to the right also contain 0s. The expected output is as follows:

    # A tibble: 16 x 5
          v1    v2    v3    v4 result
       <dbl> <dbl> <dbl> <dbl> <chr>
     1     0     0     0     0 v1
     2     0     0     0     1 NA
     3     0     0     1     0 v4
     4     0     0     1     1 NA
     5     0     1     0     0 v3
     6     0     1     0     1 NA
     7     0     1     1     0 v4
     8     0     1     1     1 NA
     9     1     0     0     0 v2
    10     1     0     0     1 NA
    11     1     0     1     0 v4
    12     1     0     1     1 NA
    13     1     1     0     0 v3
    14     1     1     0     1 NA
    15     1     1     1     0 v4
    16     1     1     1     1 NA

If possible, I would prefer a tidyverse solution.

Answer 1

Score: 3

I've no idea about a tidyverse solution, but here's a base R answer:

    # Sum each row from the right: entry [i, j] is the sum of row i over columns j..4.
    # A zero there means column j and every column to its right are zero.
    chk <- simplify2array(rev(Reduce(`+`, rev(df), accumulate = TRUE))) == 0
    # The first TRUE in each row marks the leftmost qualifying column
    df$name <- names(df)[max.col(chk, "first")]
    # Rows with no TRUE at all (last column is non-zero) get NA
    df$name[rowSums(chk) == 0] <- NA
    df
    ## # A tibble: 16 × 5
    ##       v1    v2    v3    v4 name
    ##    <dbl> <dbl> <dbl> <dbl> <chr>
    ##  1     0     0     0     0 v1
    ##  2     0     0     0     1 NA
    ##  3     0     0     1     0 v4
    ##  4     0     0     1     1 NA
    ##  5     0     1     0     0 v3
    ##  6     0     1     0     1 NA
    ##  7     0     1     1     0 v4
    ##  8     0     1     1     1 NA
    ##  9     1     0     0     0 v2
    ## 10     1     0     0     1 NA
    ## 11     1     0     1     0 v4
    ## 12     1     0     1     1 NA
    ## 13     1     1     0     0 v3
    ## 14     1     1     0     1 NA
    ## 15     1     1     1     0 v4
    ## 16     1     1     1     1 NA
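
To make the intermediate step easier to follow, here is a minimal sketch of the same base R idea on a three-row data frame. The names `toy`, `suffix_sums` and `leftmost` are made up for illustration and are not part of the answer. Note that the sum trick relies on the values being non-negative, so that a zero sum means every summed value is zero.

```r
# Small illustrative data frame (hypothetical values, not from the question)
toy <- data.frame(v1 = c(1, 0, 0), v2 = c(0, 1, 0), v3 = c(0, 0, 1))

# Cumulative sums taken from the rightmost column leftwards,
# then reversed back into the original column order
suffix_sums <- simplify2array(rev(Reduce(`+`, rev(toy), accumulate = TRUE)))
colnames(suffix_sums) <- names(toy)

# TRUE where a column and everything to its right are zero
chk <- suffix_sums == 0

# First TRUE per row; rows without any TRUE are set to NA afterwards
leftmost <- names(toy)[max.col(chk, "first")]
leftmost[rowSums(chk) == 0] <- NA
leftmost
```

For the first row (1, 0, 0) the suffix sums are 1, 0, 0, so `v2` is the leftmost column with only zeros from there on; the third row ends in 1, so it has no such column.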

Answer 2

Score: 3


This solution passes each row of values to a helper function using dplyr::c_across(); the helper function then uses rle() to return the start of the last run of zeros, if the last value is zero, and NA otherwise.

    library(dplyr)

    # x: the values of one row; cols: the column names in order
    find_zero <- function(x, cols) {
      x_rle <- rle(x)
      # Last run is not zeros: no trailing block of zeros in this row
      if (tail(x_rle$values, 1) != 0) NA_character_
      # Otherwise count back from the right by the length of the last run
      else rev(cols)[[tail(x_rle$lengths, 1)]]
    }

    df %>%
      rowwise() %>%
      mutate(result = find_zero(c_across(v1:v4), names(.))) %>%
      ungroup()
    # A tibble: 16 × 5
          v1    v2    v3    v4 result
       <dbl> <dbl> <dbl> <dbl> <chr>
     1     0     0     0     0 v1
     2     0     0     0     1 <NA>
     3     0     0     1     0 v4
     4     0     0     1     1 <NA>
     5     0     1     0     0 v3
     6     0     1     0     1 <NA>
     7     0     1     1     0 v4
     8     0     1     1     1 <NA>
     9     1     0     0     0 v2
    10     1     0     0     1 <NA>
    11     1     0     1     0 v4
    12     1     0     1     1 <NA>
    13     1     1     0     0 v3
    14     1     1     0     1 <NA>
    15     1     1     1     0 v4
    16     1     1     1     1 <NA>
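
As a rough illustration of what the helper sees for a single row, the sketch below calls `rle()` and `find_zero()` on two plain vectors, without dplyr. The vectors `row_a` and `row_b` and the variable names are assumptions chosen for the example; the helper is repeated so the snippet runs on its own.

```r
# Helper as in the answer, repeated here so the snippet is self-contained;
# NA_character_ keeps the return type character in both branches
find_zero <- function(x, cols) {
  x_rle <- rle(x)
  if (tail(x_rle$values, 1) != 0) NA_character_
  else rev(cols)[[tail(x_rle$lengths, 1)]]
}

cols <- c("v1", "v2", "v3", "v4")
row_a <- c(1, 0, 0, 0)  # ends in a run of three zeros
row_b <- c(0, 0, 1, 1)  # last value is non-zero

# rle() splits row_a into runs: one 1, then three 0s
run_a <- rle(row_a)
run_a$lengths
run_a$values

# The last run has length 3, so the result is the 3rd name counted from the right
res_a <- find_zero(row_a, cols)
res_b <- find_zero(row_b, cols)
```

Here `res_a` is `"v2"` and `res_b` is `NA`, matching rows 9 and 4 of the output above.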

Posted by huangapple on 2023-02-24 at 10:27:07.
When reposting, please keep the link to this article: https://go.coder-hub.com/75552111.html