Fill NA values in a data frame according to another data frame
# Question
I have two data frames that contain almost the same samples. `df1` has many samples and contains almost all of the samples found in `df2`, apart from 2 or 3.
In `df1` there is a column, let's say gender, that has `NA` values. Those gender values can be found in `df2`.
I want to fill the `NA` values for each sample in `df1` from `df2`, for the samples the two data frames share.
How can I do that, especially given that `df1` is much bigger than `df2` and the samples are not in the same order?
For example, let's say this is `df1`:

           samples       gender
    1        Pt8           NA
    2        Pt102         NA
    3        Pt87          NA
    4        Pt1           NA

And this is `df2`:

          subject_id     gender
    1        Pt1          male
    2        Pt102        male
    3        Pt6          female
    4        Pt8          male

So I just want to fill in the `NA` values in `df1` by matching on the sample name.
# Answer 1
**Score**: 2
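We can use a data.table update join:

    library(data.table)

    # Convert df1 to a data.table by reference, then update it while joining on the sample ID.
    # Inside j, `gender` is df1's column and `i.gender` is the matching value from df2.
    setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender),
               on = .(samples = subject_id)]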
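
The same key-based lookup can also be sketched in base R with `match()`, without extra packages. This is only a minimal sketch, not part of the answer above: it rebuilds the question's example data and assumes `gender` is stored as character. The names `idx` and `missing` are illustrative. If a subject appeared more than once in `df2`, `match()` would use its first occurrence.

```r
# Example data from the question (gender assumed to be character).
df1 <- data.frame(samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
                  gender = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
                  gender = c("male", "male", "female", "male"),
                  stringsAsFactors = FALSE)

# For each row of df1, find the position of its sample in df2 (NA if absent).
idx <- match(df1$samples, df2$subject_id)

# Only overwrite entries that are currently NA in df1.
missing <- is.na(df1$gender)
df1$gender[missing] <- df2$gender[idx[missing]]

print(df1)
```

Samples that exist only in `df1` (here `Pt87`) stay `NA`, and samples that exist only in `df2` (here `Pt6`) are ignored, so row order and row count of `df1` are unchanged.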
# Answer 2
**Score**: 1
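Update: please see the comments (I removed my incorrect first answer):

    library(dplyr)

    # Stack both frames under df1's column names.
    bind_rows(df1, df2 %>%
                rename_with(~ colnames(df1))) %>%
      arrange(gender) %>%                       # non-NA genders sort before NA
      distinct(samples, .keep_all = TRUE) %>%   # keep the first row per sample
      semi_join(df1, by = "samples") %>%        # drop samples that are only in df2
      mutate(samples = factor(samples, levels = df1$samples)) %>%
      arrange(samples)                          # restore df1's original order

This returns:

      samples gender
    4     Pt8   male
    2   Pt102   male
    3    Pt87   <NA>
    1     Pt1   male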
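
A shorter dplyr route to the same result is a `left_join()` followed by `coalesce()`. The sketch below is an alternative, not the answerer's method. It assumes `gender` is character in both frames and that `subject_id` is unique in `df2`, since duplicates would multiply rows in the join. The names `df2_lookup`, `gender_df2`, and `filled` are illustrative.

```r
library(dplyr)

# Example data from the question (gender assumed to be character).
df1 <- data.frame(samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
                  gender = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
                  gender = c("male", "male", "female", "male"),
                  stringsAsFactors = FALSE)

# Rename df2's gender column so it does not clash with df1's after the join.
df2_lookup <- rename(df2, gender_df2 = gender)

filled <- df1 %>%
  left_join(df2_lookup, by = c("samples" = "subject_id")) %>%  # keeps every df1 row, in df1's order
  mutate(gender = coalesce(gender, gender_df2)) %>%            # existing values win; NA falls back to df2
  select(-gender_df2)

print(filled)
```

Unlike the `bind_rows()` approach, this leaves `samples` as a character column and needs no re-sorting.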