Fill NA values in a data frame according to another data frame
Question
I have two data frames that contain almost the same samples. `df1` has many samples and contains almost all of the samples found in `df2`, apart from 2 or 3.
In `df1` there is a column, let's say gender, that has `NA` values. Those gender values are found in `df2`.
I want to fill the `NA` values for each sample in `df1` according to `df2`, for the samples shared between them.
How can I do that, especially given that `df1` is much bigger than `df2` and the samples are not in the same order?
For example, let's say this is `df1`:
  samples gender
1     Pt8     NA
2   Pt102     NA
3    Pt87     NA
4     Pt1     NA
And this is `df2`:
  subject_id gender
1        Pt1   male
2      Pt102   male
3        Pt6 female
4        Pt8   male
So I just want to fill in the `NA` values in `df1` from `df2`, matching on the sample name.
Answer 1
Score: 2
We may use a join:
    library(data.table)
    setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender),
               on = .(samples = subject_id)]
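For readers who want to see the lookup logic spelled out, here is a minimal base R sketch of the same idea. It is not the answerer's code, and it assumes the example data from the question, with `gender` in `df1` stored as character `NA` (the `idx` and `to_fill` names are illustrative only). `match()` finds each `df1` sample in `df2` regardless of row order, and only the entries that are currently `NA` are overwritten.

```R
# Example data from the question. Assumption: gender in df1 is a character
# column, so the filled-in values keep a consistent type.
df1 <- data.frame(
  samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
  gender  = NA_character_,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
  gender     = c("male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# For each row of df1, the row position of that sample in df2 (NA if absent).
idx <- match(df1$samples, df2$subject_id)

# Overwrite only entries that are currently NA. Samples missing from df2
# (Pt87 here) stay NA because their idx is NA.
to_fill <- is.na(df1$gender)
df1$gender[to_fill] <- df2$gender[idx[to_fill]]

print(df1)
```

Samples that appear only in `df2` (such as `Pt6`) are never added to `df1`, and the row order of `df1` is left unchanged.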
# Answer 2
**Score**: 1
Update: Please see comments (removed wrong first answer):
```R
library(dplyr)

# Stack df2 (with its columns renamed to match df1) underneath df1
bind_rows(df1, df2 %>%
            rename_with(~colnames(df1))) %>%
  # arrange() sorts NA last, so known genders come before NA rows
  arrange(gender) %>%
  # keep the first row per sample, i.e. the non-NA one when it exists
  distinct(samples, .keep_all = TRUE) %>%
  # drop samples that occur only in df2
  semi_join(df1, by = "samples") %>%
  # restore the original row order of df1
  mutate(samples = factor(samples, levels = df1$samples)) %>%
  arrange(samples)

#   samples gender
# 4     Pt8   male
# 2   Pt102   male
# 3    Pt87   <NA>
# 1     Pt1   male
```