Full join two datasets as rows, duplicating the IDs below one another

Question

I want to combine two datasets so that each ID from the second dataset appears directly under the matching ID from the first.

df1 <- structure(list(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
  m_or_c = c("m", "m", "m", "m", "m", "m"),
  edta_code = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
  ipv = c("0", "3", "2", "2", "1", "2")),
  class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
  m_or_c = c("c", "c", "c", "c", "c", "c"),
  edta_code = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
  ipv = c("1", "4", "6", "4", "1", "3")),
  class = "data.frame", row.names = c(NA, -6L))

What I've tried:

library(dplyr)
df3 <- full_join(df1, df2, by = "subject_id")

Output:
This isn't what I need.

subject_id m_or_c.x edta_code.x ipv.x m_or_c.y edta_code.y ipv.y
191-0987 m EDTA12 1 c EDTA89 1
191-1245 m EDTA20 2 c EDTA20 6
191-1945 m EDTA45 0 c EDTA49 1
191-3457 m EDTA66 2 c EDTA68 4
191-5467 m EDTA74 2 c EDTA30 3
191-6784 m EDTA79 2 c EDTA95 4

Desired table:

subject_id m_or_c edta_code ipv
191-0987 m EDTA12 1
191-0987 c EDTA89 1
191-1245 m EDTA20 2
191-1245 c EDTA20 6
191-1945 m EDTA45 0
191-1945 c EDTA49 1
191-3457 m EDTA66 2
191-3457 c EDTA68 4
191-5467 m EDTA74 2
191-5467 c EDTA30 3
191-6784 m EDTA79 2
191-6784 c EDTA95 4

Please assist

Answer 1

Score: 1

No join is needed, I think. Bind the data frames together by row, then sort the result into the order you want. Note that ordering on m_or_c puts "c" before "m", because the sort is alphabetical.

base R

final <- rbind(df1, df2)
final[with(final, order(subject_id, m_or_c)), ]
#    subject_id m_or_c edta_code ipv
# 10   191-0987      c    EDTA68   4
# 4    191-0987      m    EDTA66   2
# 11   191-1245      c    EDTA89   1
# 5    191-1245      m    EDTA12   1
# 12   191-1945      c    EDTA30   3
# 6    191-1945      m    EDTA74   2
# 9    191-3457      c    EDTA20   6
# 3    191-3457      m    EDTA20   2
# 7    191-5467      c    EDTA49   1
# 1    191-5467      m    EDTA45   0
# 8    191-6784      c    EDTA95   4
# 2    191-6784      m    EDTA79   3
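
The output above lists "c" before "m" within each ID, while the desired table in the question has "m" first. The following is a minimal base R sketch of one way to get that order, not part of the original answer. It assumes the sample data from the question with df2's m_or_c trimmed to six values, and role_rank is a hypothetical helper name introduced only for illustration.

```r
# Sample data from the question (m_or_c in df2 trimmed to six values).
df1 <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
  m_or_c     = "m",
  edta_code  = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
  ipv        = c("0", "3", "2", "2", "1", "2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = df1$subject_id,
  m_or_c     = "c",
  edta_code  = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
  ipv        = c("1", "4", "6", "4", "1", "3"),
  stringsAsFactors = FALSE
)

# Stack the two data frames row-wise.
final <- rbind(df1, df2)

# Rank "m" ahead of "c" explicitly instead of relying on alphabetical order.
# role_rank is a hypothetical helper vector: 1 for "m", 2 for "c".
role_rank <- match(final$m_or_c, c("m", "c"))
final <- final[order(final$subject_id, role_rank), ]

# Renumber the rows so they run 1..12 in the new order.
rownames(final) <- NULL
print(final)
```

Using match() against an explicit vector keeps the intended order visible in the code, and it extends naturally if more categories than "m" and "c" ever appear.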

Answer 2

Score: 1

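You can use bind_rows() from the dplyr package and then order the data frame by subject_id. Because order() leaves tied rows in their original order, each df1 row ("m") stays above the matching df2 row ("c").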

df3 = dplyr::bind_rows(df1,df2)
df3 = df3[order(df3$subject_id),]
# subject_id m_or_c edta_code ipv
# 191-0987      m    EDTA66   2
# 191-0987      c    EDTA68   4
# 191-1245      m    EDTA12   1
# 191-1245      c    EDTA89   1
# 191-1945      m    EDTA74   2
# 191-1945      c    EDTA30   3
# 191-3457      m    EDTA20   2
# 191-3457      c    EDTA20   6
# 191-5467      m    EDTA45   0
# 191-5467      c    EDTA49   1
# 191-6784      m    EDTA79   3
# 191-6784      c    EDTA95   4
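
The approach above relies on df1 being bound first to keep "m" above "c". As a hedged alternative, the sketch below does the whole thing in dplyr and makes the within-ID order explicit with arrange() and desc(). It is not from the original answer, and it assumes the question's sample data with df2's m_or_c trimmed to six values; the name stacked is only illustrative.

```r
library(dplyr)

# Sample data from the question (m_or_c in df2 trimmed to six values).
df1 <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
  m_or_c     = "m",
  edta_code  = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
  ipv        = c("0", "3", "2", "2", "1", "2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = df1$subject_id,
  m_or_c     = "c",
  edta_code  = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
  ipv        = c("1", "4", "6", "4", "1", "3"),
  stringsAsFactors = FALSE
)

# Stack the rows, then sort by ID; desc() reverses the alphabetical
# order of m_or_c so that "m" comes before "c" within each ID.
stacked <- bind_rows(df1, df2)
df3 <- arrange(stacked, subject_id, desc(m_or_c))
print(df3)
```

This version gives the same row order even if the two data frames are bound in the opposite order, since the sort no longer depends on how ties are broken.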

huangapple
  • This article was posted on August 10, 2023 at 17:17:25
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76874303.html