Fill NA values in a data frame according to another data frame

# Question

I have two data frames that contain almost the same samples. `df1` has many samples and contains almost all of the samples found in `df2`, apart from 2 or 3.

In `df1` there is a column, let's say `gender`, that has `NA` values. Those gender values are available in `df2`.

I want to fill in the `NA` values for each sample in `df1` from `df2`, for the samples the two data frames share.

How can I do that, given that `df1` is much bigger than `df2` and the samples are not in the same order?

For example, let's say this is `df1`:

           samples       gender
    1        Pt8           NA
    2        Pt102         NA
    3        Pt87          NA
    4        Pt1           NA

And this is `df2`:

          subject_id     gender
    1        Pt1          male
    2        Pt102        male
    3        Pt6          female
    4        Pt8          male

So I just want to fill in the `NA` values in `df1` according to the sample name.

# Answer 1

**Score**: 2

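We can use a join:

    library(data.table)
    # Update join: for each df1 row whose `samples` matches a `subject_id` in df2,
    # keep the existing gender if present, otherwise take df2's value (i.gender).
    setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender),
       on = .(samples = subject_id)]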
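
If you would rather not pull in a package, the same lookup can be sketched in base R with `match()`. This is a different technique from the join above, not a rewrite of it. The data frames are rebuilt from the question's tables, and I am assuming `gender` is stored as character in both and that `subject_id` has no duplicates in `df2`.

```r
# Example data rebuilt from the question's tables (column types are assumed)
df1 <- data.frame(
  samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
  gender  = NA_character_,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
  gender     = c("male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Row position of each df1 sample in df2 (NA where the sample is not in df2)
idx <- match(df1$samples, df2$subject_id)

# Overwrite only the entries that are currently missing in df1
missing <- is.na(df1$gender)
df1$gender[missing] <- df2$gender[idx[missing]]

df1
```

Because `match()` works by name, the row order of `df1` is untouched, and samples that exist only in `df2` (like `Pt6`) are simply ignored. Samples missing from `df2` (like `Pt87`) stay `NA`.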

# Answer 2
**Score**: 1

Update: Please see comments (removed wrong first answer):

```R
library(dplyr)

bind_rows(df1, df2 %>%
            rename_with(~colnames(df1))) %>%   # give df2 the same column names as df1
  arrange(gender) %>%                          # sort so rows with NA gender come last
  distinct(samples, .keep_all = TRUE) %>%      # keep the first (non-NA) row per sample
  semi_join(df1, by = "samples") %>%           # drop samples that appear only in df2
  mutate(samples = factor(samples, levels = df1$samples)) %>%
  arrange(samples)                             # restore df1's original order

#>   samples gender
#> 4     Pt8   male
#> 2   Pt102   male
#> 3    Pt87   <NA>
#> 1     Pt1   male
```
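
For comparison, a shorter dplyr route is a `left_join()` followed by `coalesce()`. This is only a sketch of an alternative, not the approach used in the answer above. The example data is rebuilt from the question, the `.df2` suffix and the name `filled` are my own choices, and it assumes `gender` is character in both data frames and that each `subject_id` appears once in `df2` (duplicates would duplicate rows in the result).

```r
library(dplyr)

# Example data rebuilt from the question's tables (column types are assumed)
df1 <- data.frame(
  samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
  gender  = NA_character_,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
  gender     = c("male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

filled <- df1 %>%
  # Bring df2's gender alongside df1's; the clashing column becomes gender.df2
  left_join(df2, by = c("samples" = "subject_id"), suffix = c("", ".df2")) %>%
  # Keep df1's value where present, otherwise fall back to df2's
  mutate(gender = coalesce(gender, gender.df2)) %>%
  select(-gender.df2)

filled
```

A left join keeps every row of `df1` in its original order, so no re-sorting step is needed and `samples` stays a character column.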

huangapple
  • Published on April 19, 2023, 23:41:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056430.html