R: match and combine multiple vectors of different lengths into a data frame

Question

I am probably missing something very obvious, but I can't seem to find a way to do this. I would like to merge multiple vectors (or data frames?) of different lengths into a data frame by matching the values of their elements, so that equal values end up in the same row, and fill the remaining cells with `NA`. I have tried `cbind.na()` from the qpcR package, but it doesn't produce the expected outcome.

Reproducible example:

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
```

Expected output:

```
     x  y  z
[1,] 1  1  NA
[2,] 2  2  NA
[3,] NA 3  3
[4,] NA 4  4
[5,] NA 5  5
[6,] NA 6  6
[7,] a  NA a
[8,] b  b  NA
```

Answer 1

Score: 2

Here's a clumsy but working solution. Note that the row order does not match your request, though, because the rows follow the order in which each value first appears in `c(x, y, z)`:

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
tot <- unique(c(x, y, z))
# gives you a vector of all unique values across all your vectors
df <- data.frame(
  x = rep(NA, times = length(tot)),
  y = NA,
  z = NA
)
# prepare a data frame with all NAs
df$x[tot %in% x] <- tot[tot %in% x]
df$y[tot %in% y] <- tot[tot %in% y]
df$z[tot %in% z] <- tot[tot %in% z]
# fills in the NAs with the matching value if present in the 'parent' vector
```

Gives:

```
> df
     x    y    z
1    1    1 <NA>
2    2    2 <NA>
3    a <NA>    a
4    b    b <NA>
5 <NA>    3    3
6 <NA>    4    4
7 <NA>    5    5
8 <NA>    6    6
```
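The same `%in%` idea can be sketched over a named list, so it extends to any number of vectors, with the keys sorted so the rows come out in the order the question asked for. This is a minimal sketch, not part of the original answer; `vecs`, `hit` and `nm` are hypothetical names, and `method = "radix"` is used because it sorts character vectors in C-locale order, which puts the digits before the letters regardless of the session locale.

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
vecs <- list(x = x, y = y, z = z)  # hypothetical container for the input vectors

# All distinct values; radix sort uses C-locale order (digits before letters)
tot <- sort(unique(unlist(vecs, use.names = FALSE)), method = "radix")

# One all-NA character column per input vector
df <- as.data.frame(lapply(vecs, function(v) rep(NA_character_, length(tot))))

# Fill each column wherever the key occurs in the corresponding vector
for (nm in names(vecs)) {
  hit <- tot %in% vecs[[nm]]
  df[[nm]][hit] <- tot[hit]
}
df
```

Adding a vector only means adding an element to `vecs`; the fill loop stays the same.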

Answer 2

Score: 2

You could try this. It is similar to the answer by Paul Stafford Allen in that it starts with the unique values. I've put the vectors in a list to allow for easy iteration, so it is straightforward to extend to more columns.

```r
l <- list(x = x, y = y, z = z)
dat <- data.frame(
  unique_vals = sort(unique(unlist(l)))
)
# \(x) is shorthand for function(x), available from R 4.1.0
dat[names(l)] <- lapply(l, \(x) {
  x[match(dat$unique_vals, x)]
})
#   unique_vals    x    y    z
# 1           1    1    1 <NA>
# 2           2    2    2 <NA>
# 3           3 <NA>    3    3
# 4           4 <NA>    4    4
# 5           5 <NA>    5    5
# 6           6 <NA>    6    6
# 7           a    a <NA>    a
# 8           b    b    b <NA>
```

I kept the `unique_vals` column so it's clear what's going on, but you may want to remove it.
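If you'd rather not keep a helper column, one option (a sketch, not from the original answer) is to use the sorted unique values as row names instead. It uses `function(v)` rather than the `\(x)` shorthand, which only exists in R 4.1.0 and later; `keys` is a hypothetical name.

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
l <- list(x = x, y = y, z = z)

# Sorted distinct values serve as the row keys
keys <- sort(unique(unlist(l, use.names = FALSE)))

# For each vector, look up the position of every key; no match gives NA
dat <- data.frame(lapply(l, function(v) v[match(keys, v)]), row.names = keys)
dat
```

The keys stay visible as row names when printing, but they are no longer a data column.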

Answer 3

Score: 1

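You can use `merge()` inside `Reduce()` and match on newly set row names: each vector is named by its own (made-unique) values, and `by = 0` merges on those row names.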

```r
l <- lapply(list(x = x, y = y, z = z), \(a) setNames(a, make.unique(a)))
setNames(
  Reduce(\(a, b) {
    . <- merge(a, b, by = 0, all = TRUE)
    `row.names<-`(.[-1], .[, 1])
  }, l),
  names(l)
)
#     x    y    z
#1    1    1 <NA>
#2    2    2 <NA>
#3 <NA>    3    3
#4 <NA>    4    4
#5 <NA>    5    5
#6 <NA>    6    6
#a    a <NA>    a
#b    b    b <NA>
```

This also works when a value occurs more than once in a vector, because `make.unique()` turns repeated values into distinct keys (the second `"1"` becomes `"1.1"`).

```r
x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
l <- lapply(list(x = x, y = y, z = z), \(a) setNames(a, make.unique(a)))
setNames(
  Reduce(\(a, b) {
    . <- merge(a, b, by = 0, all = TRUE)
    `row.names<-`(.[-1], .[, 1])
  }, l),
  names(l)
)
#       x    y    z
#1      1    1 <NA>
#1.1    1 <NA> <NA>
#2      2    2 <NA>
#3   <NA>    3    3
#4   <NA>    4    4
#5   <NA>    5    5
#6   <NA>    6    6
#a      a <NA>    a
#b      b    b <NA>
```
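To see what a single `Reduce()` step does, the sketch below (an illustration, not part of the original answer) merges just `x` and `z`. Each element is named by its own value and `make.unique()` keeps those names distinct, since row names must be unique; `by = 0` then tells `merge()` to join on the row names. The join keys come back as an extra first column called `Row.names`, which is why the answer moves `.[, 1]` into the row names and drops it with `.[-1]`. `xa`, `za`, `m` and `keys` are hypothetical names.

```r
x <- c("1", "1", "2", "a", "b")
z <- c("3", "4", "5", "6", "a")

# Name every element by its own value; make.unique() turns the second "1" into "1.1"
xa <- setNames(x, make.unique(x))
za <- setNames(z, make.unique(z))

# by = 0 joins on row names; all = TRUE keeps keys found in either input
m <- merge(xa, za, by = 0, all = TRUE)

# The join keys are returned in the first column, "Row.names";
# columns 2 and 3 hold the values from x and z (NA where a key is missing)
keys <- as.character(m$Row.names)
keys
m[[2]]
m[[3]]
```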

Or using `match()`:

```r
x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
l <- list(x = x, y = y, z = z)
u <- lapply(l, make.unique)
k <- unique(unlist(u))
mapply(\(l, u) l[match(k, u)], l, u)
#      x   y   z
# [1,] "1" "1" NA
# [2,] "1" NA  NA
# [3,] "2" "2" NA
# [4,] "a" NA  "a"
# [5,] "b" "b" NA
# [6,] NA  "3" "3"
# [7,] NA  "4" "4"
# [8,] NA  "5" "5"
# [9,] NA  "6" "6"
```
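`mapply()` returns a character matrix here, and its rows follow the order in which the keys first appear. If you need a data frame in sorted key order, as in the question, one way (a sketch layered on the answer's code, not the author's own) is to sort the keys with `method = "radix"` (C-locale order) and convert the result. It uses `function(v, key)` instead of `\(l, u)` so it also runs on R versions before 4.1.0; `res` is a hypothetical name.

```r
x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")
l <- list(x = x, y = y, z = z)

# Per-vector unique keys: the second "1" in x becomes "1.1"
u <- lapply(l, make.unique)

# Sort the combined keys in C-locale order: "1" < "1.1" < "2" < ... < "a" < "b"
k <- sort(unique(unlist(u, use.names = FALSE)), method = "radix")

# Look up each key among a vector's unique keys and return the original value
res <- as.data.frame(mapply(function(v, key) v[match(k, key)], l, u))
res
```

Because `l` is a named list, `mapply()` uses its names as the column names, so `res` has columns `x`, `y` and `z`.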

huangapple
  • Posted on 2023-04-17 17:02:11
  • If you repost this article, please keep the link: https://go.coder-hub.com/76033406.html