How to fill NA rows by conditions from columns in R

Question

Here is an example:

df <- data.frame(v1=rep(1:2, 4),
                 v2=rep(c("a", "b"), each=4),
                 v3=paste0(rep(1:2, each=4), rep(c("m", "n", "o", "p"), each=2)),
                 v4=c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5=c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6=c(9, 10, NA, NA, 11, 12, NA, NA))

df
  v1 v2 v3 v4 v5 v6
1  1  a 1m  1  5  9
2  2  a 1m  2  6 10
3  1  a 1n NA NA NA
4  2  a 1n NA NA NA
5  1  b 2o  3  7 11
6  2  b 2o  4  8 12
7  1  b 2p NA NA NA
8  2  b 2p NA NA NA

What I want is this: if columns v1, v2, and v3 match once the last letter of v3 is ignored, fill the NAs from the matching rows that are not NA. In this case, the NAs in row 3 should be filled from row 1, because both rows give 1a1 when the trailing letter of v3 (m or n) is ignored. So the desired output would be:

  v1 v2 v3 v4 v5 v6
1  1  a 1m  1  5  9
2  2  a 1m  2  6 10
3  1  a 1n  1  5  9
4  2  a 1n  2  6 10
5  1  b 2o  3  7 11
6  2  b 2o  4  8 12
7  1  b 2p  3  7 11
8  2  b 2p  4  8 12
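
The matching rule can be made explicit as a grouping key. The following is a minimal base R sketch, assuming the part of v3 to ignore is always its single last character; the names `prefix`, `key`, and `groups` are illustrative and do not come from the question.

```r
df <- data.frame(v1 = rep(1:2, 4),
                 v2 = rep(c("a", "b"), each = 4),
                 v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
                 v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6 = c(9, 10, NA, NA, 11, 12, NA, NA))

# Drop the last character of v3 (as.character guards against factor columns)
v3_chr <- as.character(df$v3)
prefix <- substr(v3_chr, 1, nchar(v3_chr) - 1)

# Rows that share v1, v2, and the v3 prefix belong to the same group
key <- paste(df$v1, df$v2, prefix, sep = "|")
groups <- split(seq_len(nrow(df)), key)
groups
```

Rows 1 and 3 share the key 1|a|1, which is why row 3 can take its values from row 1.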

Answer 1

Score: 2

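I think this is a simpler way of producing your result: group by v1 and v2, then fill the NAs downward within each group.

library(tidyverse)
df %>%
  group_by(v1, v2) %>%
  fill(v4:v6)

To add the v3 logic, also group on the numeric part of v3. Ungroup before dropping the helper column, because select() keeps grouping columns:

df %>%
  mutate(v7 = v3 %>% as.character() %>% parse_number()) %>%
  group_by(v1, v2, v7) %>%
  fill(v4:v6) %>%
  ungroup() %>%
  select(-v7)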
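
fill() fills downward by default, so the grouped approach depends on the complete row appearing before the NA rows in each group. The sketch below is one way to relax that, assuming a tidyr version that supports `.direction = "downup"`; the helper column `key` and the reversed data frame `shuffled` are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(v1 = rep(1:2, 4),
                 v2 = rep(c("a", "b"), each = 4),
                 v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
                 v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6 = c(9, 10, NA, NA, 11, 12, NA, NA))

# Reverse the rows so the NA rows now come before the complete rows
shuffled <- df[nrow(df):1, ]

filled <- shuffled %>%
  mutate(key = sub("[[:alpha:]]$", "", as.character(v3))) %>%  # v3 without its trailing letter
  group_by(v1, v2, key) %>%
  fill(v4:v6, .direction = "downup") %>%  # fill down first, then up
  ungroup() %>%
  select(-key)

filled
```

With the default direction, the reversed data would keep its NAs, since nothing precedes them within their groups.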

Answer 2

Score: 1

Here is a solution that recodes v3 into a new variable, v7, that keeps only its numeric part, and then groups on it.

library(dplyr)
library(tidyr)    # provides fill()
library(stringr)

# Extract the numeric part of the string in v3
df$v7 <- str_extract(df$v3, "[[:digit:]]+")

df %>%
  group_by(v1, v2, v7) %>%
  fill(v4:v6)
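
Extracting the digits and dropping the last character agree on the example data, but they are not the same rule. A small sketch of the difference, using a made-up vector in which "1am" is a hypothetical value that does not occur in the question:

```r
library(stringr)

# "1am" is a hypothetical value, included only to show where the rules diverge
v3 <- c("1m", "12n", "1am")

digits  <- str_extract(v3, "[[:digit:]]+")  # first run of digits
no_last <- str_sub(v3, 1, -2)               # everything except the last character

digits   # "1"  "12" "1"
no_last  # "1"  "12" "1a"
```

Which rule fits depends on what v3 can contain; the question only states that the last letter should be ignored.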

Answer 3

Score: 0

Using na.locf from zoo, grouped by v1 and v2 (note that the rows come back ordered by group):

library(zoo)
library(data.table)
setDT(df)[, na.locf(.SD), by = .(v1, v2)]
#    v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  1  a 1n  1  5  9
#3:  2  a 1m  2  6 10
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  1  b 2p  3  7 11
#7:  2  b 2o  4  8 12
#8:  2  b 2p  4  8 12

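If we want to add the condition on 'v3', we can also group by v3 with its non-digit characters removed and assign the filled columns by reference, which keeps the original row order:

setDT(df)[, names(df)[4:6] := na.locf(.SD), by = .(v1, v2, sub("\\D+", "", v3))][]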
#   v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  2  a 1m  2  6 10
#3:  1  a 1n  1  5  9
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  2  b 2o  4  8 12
#7:  1  b 2p  3  7 11
#8:  2  b 2p  4  8 12
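
na.locf carries the last non-NA observation forward. For readers who want to avoid extra packages, here is a rough base R sketch of the same idea; `locf`, `prefix`, and `value_cols` are illustrative names, and this is not how zoo implements it.

```r
df <- data.frame(v1 = rep(1:2, 4),
                 v2 = rep(c("a", "b"), each = 4),
                 v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
                 v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6 = c(9, 10, NA, NA, 11, 12, NA, NA))

# Last observation carried forward for a single vector
locf <- function(x) {
  # Position of the most recent non-NA value seen so far (0 if none yet)
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0] <- NA  # leading NAs have nothing to carry forward
  x[idx]
}

# Group on v1, v2, and v3 without its trailing letter
prefix <- sub("[[:alpha:]]$", "", as.character(df$v3))
value_cols <- c("v4", "v5", "v6")

# ave() applies locf within each group and keeps the original row order
df[value_cols] <- lapply(df[value_cols], function(col) {
  ave(col, df$v1, df$v2, prefix, FUN = locf)
})
df
```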

Answer 4

Score: 0

Here's a solution using data.table and zoo that ignores the last letter of the v3 column:

library(data.table)
setDT(df)[, match_cols := paste0(v1, v2, substr(v3, 1, nchar(as.character(v3)) - 1))
  ][, id := .GRP, by = match_cols
  ][, v4 := zoo::na.locf(v4, na.rm = FALSE), by = id
  ][, v5 := zoo::na.locf(v5, na.rm = FALSE), by = id
  ][, v6 := zoo::na.locf(v6, na.rm = FALSE), by = id
  ][, c("match_cols", "id") := NULL]
df

#    v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  2  a 1m  2  6 10
#3:  1  a 1n  1  5  9
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  2  b 2o  4  8 12
#7:  1  b 2p  3  7 11
#8:  2  b 2p  4  8 12
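
The three per-column steps can be folded into one call. This is a sketch of one possible shortening, assuming data.table and zoo are installed; `dt`, `cols`, and `prefix` are illustrative names, and the grouping drops the last character of v3 with a regular expression instead of substr.

```r
library(data.table)

df <- data.frame(v1 = rep(1:2, 4),
                 v2 = rep(c("a", "b"), each = 4),
                 v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
                 v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6 = c(9, 10, NA, NA, 11, 12, NA, NA))

dt <- as.data.table(df)  # work on a copy, leaving df untouched
cols <- c("v4", "v5", "v6")

# Fill every column in `cols` within each (v1, v2, v3-prefix) group.
# na.rm = FALSE keeps leading NAs so each result matches the group length.
dt[, (cols) := lapply(.SD, zoo::na.locf, na.rm = FALSE),
   by = .(v1, v2, prefix = sub(".$", "", as.character(v3))),
   .SDcols = cols]
dt
```

Grouping on an expression inside by avoids creating and then deleting helper columns.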

huangapple
  • Posted on January 4, 2020, 01:00:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59582451.html