Question

I have two data frames that contain almost the same samples. `df1` has many samples and contains almost all of the samples found in `df2`, apart from 2 or 3.

In `df1` there is a column, let's say gender, that has only `NA` values. Those gender values are available in `df2`.

I want to fill in the `NA` values in `df1` from `df2` for the samples the two data frames share.

How can I do that, given that `df1` is much bigger than `df2` and the samples are not in the same order?

For example, let's say this is `df1`:

         samples    gender
    1    Pt8        NA
    2    Pt102      NA
    3    Pt87       NA
    4    Pt1        NA

And this is `df2`:

         subject_id    gender
    1    Pt1           male
    2    Pt102         male
    3    Pt6           female
    4    Pt8           male

So I just want to fill in the `NA` values in `df1` by matching on the sample name.

Answer 1

Score: 2

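We can use a join with `data.table`, which updates `gender` in `df1` for the rows whose `samples` value matches a `subject_id` in `df2`:

    library(data.table)
    # as.character() lets the all-NA gender column be coalesced with df2's values;
    # i.gender refers to the gender column of df2 (the table in the i position).
    setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender),
               on = .(samples = subject_id)]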
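For readers who prefer not to add a dependency, the same lookup can be sketched in base R with `match()`. This is only an illustration, not part of the original answer: it rebuilds the example data from the question and assumes `gender` is stored as character and that each `subject_id` appears at most once in `df2`.

```r
# Example data from the question (gender assumed to be character).
df1 <- data.frame(samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
                  gender  = NA_character_)
df2 <- data.frame(subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
                  gender     = c("male", "male", "female", "male"))

# For every row of df1, find the position of its sample in df2 (NA if absent).
idx <- match(df1$samples, df2$subject_id)

# Only overwrite rows that are currently missing in df1.
is_missing <- is.na(df1$gender)
df1$gender[is_missing] <- df2$gender[idx[is_missing]]

print(df1)
```

Samples that do not occur in `df2` (here `Pt87`) keep their `NA`, and the row order of `df1` is unchanged.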

Answer 2

Score: 1

Update: Please see comments (removed wrong first answer):

    library(dplyr)

    bind_rows(df1, df2 %>%
                rename_with(~colnames(df1))) %>%
      arrange(gender) %>%
      distinct(samples, .keep_all = TRUE) %>%
      semi_join(df1, by = "samples") %>%
      mutate(samples = factor(samples, levels = df1$samples)) %>%
      arrange(samples)

      samples gender
    4     Pt8   male
    2   Pt102   male
    3    Pt87   <NA>
    1     Pt1   male
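A more direct dplyr formulation of the same idea is a left join followed by `coalesce()`. The sketch below is an alternative to the pipeline above, not the answerer's code; it assumes `gender` is character in both data frames and that `subject_id` is unique in `df2`. The `gender_df2` column name is just a temporary suffix chosen for this example.

```r
library(dplyr)

# Example data from the question (gender assumed to be character).
df1 <- data.frame(samples = c("Pt8", "Pt102", "Pt87", "Pt1"),
                  gender  = NA_character_)
df2 <- data.frame(subject_id = c("Pt1", "Pt102", "Pt6", "Pt8"),
                  gender     = c("male", "male", "female", "male"))

filled <- df1 %>%
  # Bring df2's gender alongside df1's, matching samples to subject_id.
  left_join(df2, by = c("samples" = "subject_id"), suffix = c("", "_df2")) %>%
  # Keep df1's value where present, otherwise take the value from df2.
  mutate(gender = coalesce(gender, gender_df2)) %>%
  select(-gender_df2)

print(filled)
```

Because `left_join()` keeps every row of `df1` in its original order, no re-sorting step is needed, and samples that exist only in `df2` (such as `Pt6`) are dropped automatically.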
