Full join two datasets as rows, with duplicated IDs stacked below one another
Question
I want to combine two datasets so that each subject_id from the second dataset appears in its own row directly below the matching subject_id from the first.
df1 <- structure(list(
subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
m_or_c = c("m","m","m","m","m","m"),
edta_code = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
ipv = c("0", "3", "2", "2", "1", "2")),
class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(
subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945"),
m_or_c = c("c","c","c","c","c","c"),
edta_code = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
ipv = c("1", "4", "6", "4", "1", "3")),
class = "data.frame", row.names = c(NA, -6L))
What I've tried
df3 <- full_join(df1, df2, by = "subject_id")
Output:
This isn't what I need.
subject_id | m_or_c.x | edta_code.x | ipv.x | m_or_c.y | edta_code.y | ipv.y |
---|---|---|---|---|---|---|
191-0987 | m | EDTA12 | 1 | c | EDTA89 | 1 |
191-1245 | m | EDTA20 | 2 | c | EDTA20 | 6 |
191-1945 | m | EDTA45 | 0 | c | EDTA49 | 1 |
191-3457 | m | EDTA66 | 2 | c | EDTA68 | 4 |
191-5467 | m | EDTA74 | 2 | c | EDTA30 | 3 |
191-6784 | m | EDTA79 | 2 | c | EDTA95 | 4 |
Desired table
subject_id | m_or_c | edta_code | ipv |
---|---|---|---|
191-0987 | m | EDTA12 | 1 |
191-0987 | c | EDTA89 | 1 |
191-1245 | m | EDTA20 | 2 |
191-1245 | c | EDTA20 | 6 |
191-1945 | m | EDTA45 | 0 |
191-1945 | c | EDTA49 | 1 |
191-3457 | m | EDTA66 | 2 |
191-3457 | c | EDTA68 | 4 |
191-5467 | m | EDTA74 | 2 |
191-5467 | c | EDTA30 | 3 |
191-6784 | m | EDTA79 | 2 |
191-6784 | c | EDTA95 | 4 |
Please assist.
Answer 1
Score: 1
I don't think a join is needed here. Just bind the two data frames together by row, then sort the result however you like.
base R
final <- rbind(df1, df2)
final[with(final, order(subject_id, m_or_c)), ]
# subject_id m_or_c edta_code ipv
# 10 191-0987 c EDTA68 4
# 4 191-0987 m EDTA66 2
# 11 191-1245 c EDTA89 1
# 5 191-1245 m EDTA12 1
# 12 191-1945 c EDTA30 3
# 6 191-1945 m EDTA74 2
# 9 191-3457 c EDTA20 6
# 3 191-3457 m EDTA20 2
# 7 191-5467 c EDTA49 1
# 1 191-5467 m EDTA45 0
# 8 191-6784 c EDTA95 4
# 2 191-6784 m EDTA79 3
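The sorted output above lists "c" before "m" within each subject, while the desired table in the question puts "m" first. A minimal sketch of one way to get that order in base R: turn m_or_c into a factor whose levels are in the wanted order and use it as the second sort key. It assumes the df1 and df2 from the question, with df2$m_or_c holding six values (one per row); the helper name `role` is only illustrative.

```r
# Data from the question; df2$m_or_c is assumed to have six values, one per row.
ids <- c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945")
df1 <- data.frame(
  subject_id = ids,
  m_or_c     = rep("m", 6),
  edta_code  = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
  ipv        = c("0", "3", "2", "2", "1", "2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = ids,
  m_or_c     = rep("c", 6),
  edta_code  = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
  ipv        = c("1", "4", "6", "4", "1", "3"),
  stringsAsFactors = FALSE
)

# Stack the rows, then sort by subject and by an explicit m-before-c key.
final <- rbind(df1, df2)
role  <- factor(final$m_or_c, levels = c("m", "c"))  # "m" sorts before "c"
final <- final[order(final$subject_id, role), ]
rownames(final) <- NULL  # drop the original row numbers

final
```

Using a factor rather than sorting the letters in reverse keeps the intent visible and still works if more categories are added later.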
Answer 2
Score: 1
You can use bind_rows() from the dplyr package and then order the data frame.
df3 <- dplyr::bind_rows(df1, df2)
df3 <- df3[order(df3$subject_id), ]
# subject_id m_or_c edta_code ipv
# 191-0987 m EDTA66 2
# 191-0987 c EDTA68 4
# 191-1245 m EDTA12 1
# 191-1245 c EDTA89 1
# 191-1945 m EDTA74 2
# 191-1945 c EDTA30 3
# 191-3457 m EDTA20 2
# 191-3457 c EDTA20 6
# 191-5467 m EDTA45 0
# 191-5467 c EDTA49 1
# 191-6784 m EDTA79 3
# 191-6784 c EDTA95 4
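Sorting on subject_id alone gives "m" before "c" here only because order() leaves ties in their original order and the df1 rows were bound first. A hedged sketch that makes the within-subject order explicit with dplyr's arrange() and desc(), assuming dplyr is installed and that df1 and df2 are as in the question (with six values in df2$m_or_c):

```r
library(dplyr)

# Data from the question; df2$m_or_c is assumed to have six values, one per row.
ids <- c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-1945")
df1 <- data.frame(
  subject_id = ids,
  m_or_c     = rep("m", 6),
  edta_code  = c("EDTA45", "EDTA79", "EDTA20", "EDTA66", "EDTA12", "EDTA74"),
  ipv        = c("0", "3", "2", "2", "1", "2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = ids,
  m_or_c     = rep("c", 6),
  edta_code  = c("EDTA49", "EDTA95", "EDTA20", "EDTA68", "EDTA89", "EDTA30"),
  ipv        = c("1", "4", "6", "4", "1", "3"),
  stringsAsFactors = FALSE
)

# Stack the rows, then sort by subject; desc() puts "m" ahead of "c".
stacked <- bind_rows(df1, df2)
df3 <- arrange(stacked, subject_id, desc(m_or_c))

df3
```

With this version the result no longer depends on which data frame was passed to bind_rows() first.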