How to find the leftmost column of a data frame with all zeros in columns to the right with dplyr

Question

I have a data frame like this:

library(tidyr)  # provides expand() and re-exports %>%

df <- data.frame(v1 = c(0, 1), v2 = c(0, 1), v3 = c(0, 1), v4 = c(0, 1)) %>%
    expand(v1, v2, v3, v4)
df

# A tibble: 16 x 4
      v1    v2    v3    v4
   <dbl> <dbl> <dbl> <dbl>
 1     0     0     0     0
 2     0     0     0     1
 3     0     0     1     0
 4     0     0     1     1
 5     0     1     0     0
 6     0     1     0     1
 7     0     1     1     0
 8     0     1     1     1
 9     1     0     0     0
10     1     0     0     1
11     1     0     1     0
12     1     0     1     1
13     1     1     0     0
14     1     1     0     1
15     1     1     1     0
16     1     1     1     1

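For each row, I would like to identify the leftmost column that contains a zero such that every column to its right also contains a zero, i.e. the column where the trailing run of zeros starts. If the last column is not zero, the result should be NA. The expected output is as follows: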

# A tibble: 16 x 5
      v1    v2    v3    v4 result
   <dbl> <dbl> <dbl> <dbl> <chr>
 1     0     0     0     0 v1
 2     0     0     0     1 NA
 3     0     0     1     0 v4
 4     0     0     1     1 NA
 5     0     1     0     0 v3
 6     0     1     0     1 NA
 7     0     1     1     0 v4
 8     0     1     1     1 NA
 9     1     0     0     0 v2
10     1     0     0     1 NA
11     1     0     1     0 v4
12     1     0     1     1 NA
13     1     1     0     0 v3
14     1     1     0     1 NA
15     1     1     1     0 v4
16     1     1     1     1 NA

If possible, I would prefer a tidyverse solution.



# Answer 1
**Score**: 3

I've no idea about a tidyverse solution, but here's a base R answer:

```R
# Suffix sums: entry j is the sum of columns j..last, so 0 means "all zeros from j on"
chk <- simplify2array(rev(Reduce(`+`, rev(df), accumulate=TRUE))) == 0
# The leftmost TRUE in each row marks the start of the trailing run of zeros
df$name <- names(df)[max.col(chk, "first")]
# Rows with no TRUE at all (last column is non-zero) get NA
df$name[rowSums(chk) == 0] <- NA
df

## # A tibble: 16 × 5
##       v1    v2    v3    v4 name
##    <dbl> <dbl> <dbl> <dbl> <chr>
##  1     0     0     0     0 v1
##  2     0     0     0     1 NA
##  3     0     0     1     0 v4
##  4     0     0     1     1 NA
##  5     0     1     0     0 v3
##  6     0     1     0     1 NA
##  7     0     1     1     0 v4
##  8     0     1     1     1 NA
##  9     1     0     0     0 v2
## 10     1     0     0     1 NA
## 11     1     0     1     0 v4
## 12     1     0     1     1 NA
## 13     1     1     0     0 v3
## 14     1     1     0     1 NA
## 15     1     1     1     0 v4
## 16     1     1     1     1 NA
```

The result is stored in a new `name` column of `df`. Because the check compares sums with zero, it assumes the values are non-negative.
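A test based on suffix sums can be fooled when values are allowed to be negative, since a row such as `1, -1, 0` sums to zero without being all zeros. The following is a minimal sketch of a variant that tests each cell against zero instead of summing. The helper name `leftmost_trailing_zero` and the `toy` data are made up for illustration, and the sketch assumes numeric columns with no missing values.

```r
# Hypothetical helper (not from the original answer): for each row, return the
# name of the column where the trailing run of zeros starts, or NA if the last
# column is non-zero. Assumes numeric columns without missing values.
leftmost_trailing_zero <- function(data) {
  is_zero <- as.matrix(data) == 0
  # all_zero_from[i, j] is TRUE when columns j..last of row i are all zero.
  all_zero_from <- is_zero
  if (ncol(is_zero) > 1) {
    # Walk from the second-to-last column leftwards, carrying the condition along.
    for (j in rev(seq_len(ncol(is_zero) - 1))) {
      all_zero_from[, j] <- is_zero[, j] & all_zero_from[, j + 1]
    }
  }
  idx <- max.col(all_zero_from, ties.method = "first")  # leftmost TRUE per row
  idx[rowSums(all_zero_from) == 0] <- NA                # no qualifying column
  colnames(is_zero)[idx]
}

# Illustrative data; the second row (1, -1, 0) sums to zero from the first column on.
toy <- data.frame(a = c(0, 1, 1, 2), b = c(0, -1, 0, 0), c = c(0, 0, 0, 3))
leftmost_trailing_zero(toy)
# [1] "a" "c" "b" NA
```

For the 0/1 data in the question both approaches should agree, so the shorter suffix-sum version is fine there. The loop only matters when cancellation is possible.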

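# Answer 2
**Score**: 3

This solution passes each row of values to a helper function using dplyr::c_across(); the helper function then uses rle() to return the start of the last run of zeros if the last value is zero, and NA otherwise.

library(dplyr)

# x: the values of one row; cols: the names of those columns, in order.
# If the last run is not a run of zeros, there is no trailing run of zeros (NA);
# otherwise count back from the right by the length of the last run.
find_zero <- function(x, cols) {
  x_rle <- rle(x)
  if (tail(x_rle$values, 1) != 0) NA
  else rev(cols)[[tail(x_rle$lengths, 1)]]
}

df %>%
  rowwise() %>%
  # names(.) must match the columns given to c_across(), here v1:v4
  mutate(result = find_zero(c_across(v1:v4), names(.))) %>%
  ungroup()
# A tibble: 16 × 5
      v1    v2    v3    v4 result
   <dbl> <dbl> <dbl> <dbl> <chr>
 1     0     0     0     0 v1
 2     0     0     0     1 <NA>
 3     0     0     1     0 v4
 4     0     0     1     1 <NA>
 5     0     1     0     0 v3
 6     0     1     0     1 <NA>
 7     0     1     1     0 v4
 8     0     1     1     1 <NA>
 9     1     0     0     0 v2
10     1     0     0     1 <NA>
11     1     0     1     0 v4
12     1     0     1     1 <NA>
13     1     1     0     0 v3
14     1     1     0     1 <NA>
15     1     1     1     0 v4
16     1     1     1     1 <NA>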
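To see why counting back by the length of the last run works, it may help to look at what `rle()` returns for a single row. The sketch below uses made-up values; `row_vals`, `cols`, `runs`, `last_len` and `start_col` are illustrative names, not part of the answer.

```r
# Illustrative row: values 1, 0, 0, 0 in columns v1..v4
row_vals <- c(1, 0, 0, 0)
cols <- c("v1", "v2", "v3", "v4")

runs <- rle(row_vals)
runs$values   # 1 0 : the value of each run, left to right
runs$lengths  # 1 3 : how many times each value repeats

# The last run consists of zeros and has length 3, so the trailing zeros start
# three columns in from the right end: rev(cols)[[3]] is "v2".
last_len <- tail(runs$lengths, 1)
start_col <- if (tail(runs$values, 1) == 0) rev(cols)[[last_len]] else NA_character_
start_col
# [1] "v2"
```

Note that the lookup indexes into the reversed name vector, so it is only correct when the names passed in are exactly the columns whose values were passed, in the same order.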

huangapple
  • Posted on February 24, 2023, 10:27:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75552111.html