R: match and combine multiple vectors of different lengths into a data frame

Question

I am probably missing something very obvious, but I can't seem to find a way to do this. I would like to merge multiple vectors (or data frames?) of different lengths into a single data frame by matching the values of their elements and placing equal values in the same row, filling the positions left empty with NAs. I have tried the solution from qpcR (cbind.na), but it doesn't produce the expected outcome.

Reproducible example:

x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

Expected output:

     x  y  z
[1,] 1  1  NA
[2,] 2  2  NA
[3,] NA 3  3
[4,] NA 4  4
[5,] NA 5  5
[6,] NA 6  6
[7,] a  NA a
[8,] b  b  NA

Answer 1

Score: 2

Here's a clumsy but working solution. Note that the row order does not match your request, though:

x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

# a vector of all unique values across all your vectors
tot <- unique(c(x, y, z))

# prepare a data frame filled with NAs, one row per unique value
df <- data.frame(
  x = rep(NA, times = length(tot)),
  y = NA,
  z = NA
)

# fill in the NAs with the matching value where it is present in the 'parent' vector
df$x[tot %in% x] <- tot[tot %in% x]
df$y[tot %in% y] <- tot[tot %in% y]
df$z[tot %in% z] <- tot[tot %in% z]

Gives:

> df
     x    y    z
1    1    1 <NA>
2    2    2 <NA>
3    a <NA>    a
4    b    b <NA>
5 <NA>    3    3
6 <NA>    4    4
7 <NA>    5    5
8 <NA>    6    6
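
If the row order from the question matters, one possible variation on this approach is to sort the unique values before filling the columns. The sketch below is not part of the original answer. It assumes that sorting with method = "radix" (C-locale order, digits before letters) gives the order you want, and it uses ifelse() instead of pre-filling a data frame with NAs.

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

# unique values across all vectors, sorted in C-locale order
# (assumption: this ordering is the one the question asks for)
tot <- sort(unique(c(x, y, z)), method = "radix")

# keep a value where it occurs in the vector, NA otherwise
df <- data.frame(
  x = ifelse(tot %in% x, tot, NA),
  y = ifelse(tot %in% y, tot, NA),
  z = ifelse(tot %in% z, tot, NA)
)
df
```

With the example vectors, the rows should come out in the order 1, 2, 3, 4, 5, 6, a, b.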

Answer 2

Score: 2

You could try this. It is similar to the answer by Paul Stafford Allen in that it starts with the unique values. I've put the vectors in a list to allow for easy iteration, so it is straightforward to extend to more columns.

# x, y and z as defined in the question
l <- list(x = x, y = y, z = z)

# one row per unique value, sorted
dat <- data.frame(
    unique_vals = sort(unique(unlist(l)))
)

# for each vector, look up every unique value; values not found become NA
dat[names(l)] <- lapply(l, \(x) {
    x[match(dat$unique_vals, x)]
})

#   unique_vals    x    y    z
# 1           1    1    1 <NA>
# 2           2    2    2 <NA>
# 3           3 <NA>    3    3
# 4           4 <NA>    4    4
# 5           5 <NA>    5    5
# 6           6 <NA>    6    6
# 7           a    a <NA>    a
# 8           b    b    b <NA>

I kept the unique_vals column so it's clear what's going on, but you may want to remove it.
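
To make the lookup step concrete, here is a minimal sketch of what match() does in this kind of code and of one way to drop the helper column afterwards. The names vals, idx and aligned are made up for this illustration and do not appear in the answer.

```r
x <- c("1", "2", "a", "b")
vals <- c("1", "2", "3", "a")   # hypothetical stand-in for the unique values

# position of each value in x; NA where the value is absent
idx <- match(vals, x)

# indexing with NA yields NA, which is what leaves the gaps in the column
aligned <- x[idx]

# dropping the helper column once it is no longer needed
dat <- data.frame(unique_vals = vals, x = aligned)
dat$unique_vals <- NULL
```

Assigning NULL to a column removes it in place, so the same line could be applied to the dat object from the answer.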

Answer 3

Score: 1

You can use merge inside Reduce and match on the newly set row names.

# name each element by its (made-unique) value
l <- lapply(list(x = x, y = y, z = z), \(a) setNames(a, make.unique(a)))
setNames(
  Reduce(\(a, b) {
    # by = 0 merges on row names; all = TRUE keeps unmatched rows
    . <- merge(a, b, by = 0, all = TRUE)
    # move the Row.names column back into the row names
    `row.names<-`(.[-1], .[, 1])
  }, l),
  names(l))
#     x    y    z
#1    1    1 <NA>
#2    2    2 <NA>
#3 <NA>    3    3
#4 <NA>    4    4
#5 <NA>    5    5
#6 <NA>    6    6
#a    a <NA>    a
#b    b    b <NA>

This also works when a value is present more than once in a vector.

x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

l <- lapply(list(x = x, y = y, z = z), \(a) setNames(a, make.unique(a)))
setNames(
  Reduce(\(a, b) {
    . <- merge(a, b, by = 0, all = TRUE)
    `row.names<-`(.[-1], .[, 1])
  }, l),
  names(l))
#       x    y    z
#1      1    1 <NA>
#1.1    1 <NA> <NA>
#2      2    2 <NA>
#3   <NA>    3    3
#4   <NA>    4    4
#5   <NA>    5    5
#6   <NA>    6    6
#a      a <NA>    a
#b      b    b <NA>
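
The reason duplicates are handled is that make.unique() gives every repeated value its own key. A small sketch of that step in isolation follows; the names keys and named_x are only for illustration.

```r
x <- c("1", "1", "2", "a", "b")

# repeated values get a numeric suffix, so every element has a distinct key
keys <- make.unique(x)
keys
# "1"   "1.1" "2"   "a"   "b"

# the original values are kept, and the unique keys become the names
named_x <- setNames(x, keys)
```

One caveat, stated as an assumption rather than something the answer discusses: if another vector already contains a literal value such as "1.1", it would presumably be aligned with the generated key "1.1", so this approach fits best when the data cannot contain such values.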

Or using match.

x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

l <- list(x = x, y = y, z = z)
u <- lapply(l, make.unique)   # unique keys per vector
k <- unique(unlist(u))        # all keys across the vectors
mapply(\(l, u) l[match(k, u)], l, u)
#      x   y   z
# [1,] "1" "1" NA
# [2,] "1" NA  NA
# [3,] "2" "2" NA
# [4,] "a" NA  "a"
# [5,] "b" "b" NA
# [6,] NA  "3" "3"
# [7,] NA  "4" "4"
# [8,] NA  "5" "5"
# [9,] NA  "6" "6"
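
Note that mapply() returns a character matrix here, while the question asks for a data frame. A possible follow-up step, which is not part of the original answer, is to wrap the result in as.data.frame(). The name res is made up for this sketch.

```r
x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

l <- list(x = x, y = y, z = z)
u <- lapply(l, make.unique)
k <- unique(unlist(u))

# same matching as above, then convert the matrix to a data frame
res <- as.data.frame(mapply(\(l, u) l[match(k, u)], l, u))
str(res)
```

The column names x, y and z carry over from the names of the list l.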

huangapple
  • Published on April 17, 2023, 17:02:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033406.html