January 9, 2023, 01:11:10

Place strings of column in another column in R

Question

I have a list of dfs. Here is an example of one df:

   `Basics Chest` Anatomie                                Atlas
   <lgl>          <chr>                                   <chr>
 1 NA             NA                                      Xray
 2 NA             NA                                      CT
 3 NA             NA                                      PET-CT
 4 NA             CT Protokolle Chest Standard            NA

Now I want to take the header of the first column (in this case "Basics Chest") and append it to the strings in the following columns, like this:

   `Basics Chest` Anatomie                                    Atlas
   <lgl>          <chr>                                       <chr>
 1 NA             NA                                          Xray - Basics Chest
 2 NA             NA                                          CT - Basics Chest
 3 NA             NA                                          PET-CT - Basics Chest
 4 NA             CT Protokolle Chest Standard - Basics Chest NA

As you can see, the NA values shouldn't be touched by this (I have to keep them, so filtering them out in a prior step is not an option).

This should work for my whole list of data frames, which have varying numbers of columns, as I am thinking about putting this into a for loop. Any elegant solutions?

Kind regards

Answer 1

Score: 1

If I understand what you're trying to do correctly, I think you're looking for the purrr package, which is part of the tidyverse, and specifically its map() family of functions. This is one of the best tools to know if you're using R; it cleans up code tremendously and makes a lot of sense once you get used to it. It does take a while to wrap your head around, since it requires that you understand both lists and functions fairly well, but the rewards of using purrr are substantial.

The map functions go through lists or vectors and apply a function to each element. I think there's a whole chapter on them in R for Data Science, which is free and highly recommended.

An important thing to be aware of here (you'll see this in step two below) is that a dataframe is essentially a list of vectors of the same length.
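
To make the "list of vectors" point concrete, here is a small base R sketch. The data frame `df` and its columns are made up for illustration, and `lapply()` stands in for purrr's `map()` only to keep the example free of dependencies.

```r
# A data frame is a list whose elements are its columns.
df <- data.frame(
  x = c("a", "b", NA),
  y = c(1, 2, 3),
  stringsAsFactors = FALSE
)

is.list(df)   # TRUE
length(df)    # 2: the number of columns, not rows

# Because of this, lapply() (like map()) visits one column at a time.
col_classes <- lapply(df, class)
col_classes$x  # "character"
col_classes$y  # "numeric"
```

This is why a function written for a single vector can be mapped over a data frame to transform it column by column.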

In the solution below, I:

1. Generate dummy data (a list of data frames).
2. Write a function that grabs the name of the first column and then appends that text to every column in the data frame.
3. Apply the function created in step two to the whole list of data frames.

Let me know if you have any questions or if I misunderstood anything.

#Load the tidyverse, which provides tibble(), map()/map_df(), str_c() and %>%
library(tidyverse)

#STEP 1: Create dummy data
df.list <- list(
  "first" = tibble(
    `name 1` = NA,
    a = c(letters[1:5], NA),
    b = c(LETTERS[1:4], NA, "HI!!")
  ),
  "second" = tibble(
    `name 2` = NA,
    d = c(letters[1:5], NA),
    e = c(LETTERS[1:4], NA, "HI!!")
  ),
  "third" = tibble(
    `name 3` = NA,
    f = c(letters[1:5], NA),
    g = c(LETTERS[1:4], NA, "HI!!")
  )
)

#STEP 2: Create function that will be applied to each data frame
add_first_col_name <- function(df) {

  first.name <- names(df)[1]

  #Note: the code below attaches the text to every column. This will turn any
  #non-text columns into text. Based on your example, I think this is okay
  #but let me know if not - there are extra steps that could solve this.
  #str_c() returns NA whenever an input is NA, so NA cells stay NA.

  df %>%
    map_df(~str_c(.x, " - ", first.name))
}

#STEP 3: Use map() to apply function to each data frame in the list
map(df.list, add_first_col_name)
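
The note in step 2 mentions that extra steps could keep non-text columns from being converted. Here is a minimal base R sketch of one way to do that, not necessarily what the answer's author had in mind. The helper `add_first_col_name_chr`, its `sep` argument and the small test list `dfs` are all hypothetical names introduced for this illustration.

```r
# Hypothetical helper: append the first column's name to character columns only.
add_first_col_name_chr <- function(df, sep = " - ") {
  first_name <- names(df)[1]
  # Skip the first column itself; transform the remaining columns one by one.
  df[-1] <- lapply(df[-1], function(col) {
    if (!is.character(col)) return(col)  # leave numeric, logical, ... columns as they are
    ifelse(is.na(col), NA_character_, paste0(col, sep, first_name))
  })
  df
}

# Made-up data: one character column and one numeric column.
dfs <- list(
  first = data.frame(`name 1` = NA, a = c("x", NA), n = c(1, 2),
                     check.names = FALSE, stringsAsFactors = FALSE)
)

result <- lapply(dfs, add_first_col_name_chr)
result$first$a  # "x - name 1" NA
result$first$n  # 1 2 (still numeric)
```

With this approach the first column and any non-character columns keep their original types, and NA cells in character columns are passed through unchanged.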

Answer 2

Score: 0

We can use ifelse() to paste the header onto Atlas only where Atlas is not NA (and Basics Chest is NA):

df1$Atlas <- with(df1, ifelse(is.na(`Basics Chest`) & !is.na(Atlas),
                              paste(Atlas, "- Basics Chest"), Atlas))

For multiple columns, just loop over every column except the first (Basics Chest) and do the same:

df1[-1] <- lapply(df1[-1], \(x) ifelse(!is.na(x) &
     is.na(df1[["Basics Chest"]]), paste(x, "- Basics Chest"), x))

Or with dplyr:

library(dplyr)
library(stringr)

df1 <- df1 %>%
  mutate(across(
    -`Basics Chest`,
    # rows that do not meet the condition are returned as NA
    ~ case_when(
      !is.na(.x) & is.na(`Basics Chest`) ~ str_c(.x, " - Basics Chest")
    )
  ))

Output:

df1
  Basics Chest                                    Anatomie                 Atlas
1           NA                                        <NA>   Xray - Basics Chest
2           NA                                        <NA>     CT - Basics Chest
3           NA                                        <NA> PET-CT - Basics Chest
4           NA CT Protokolle Chest Standard - Basics Chest                  <NA>

Data:

df1 <- structure(list(`Basics Chest` = c(NA, NA, NA, NA), Anatomie = c(NA,
NA, NA, "CT Protokolle Chest Standard"), Atlas = c("Xray", "CT",
"PET-CT", NA)), class = "data.frame", row.names = c("1", "2",
"3", "4"))
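
The code in this answer hard-codes "Basics Chest", while the question asks for something that works across a list of data frames with different headers. As a rough sketch of how the ifelse approach might be generalized in a for loop, the header can be read from `names()` instead. The list `dfs`, its second data frame ("Basics Head") and the variable names are assumptions made up for this example.

```r
# Example list; the "head" data frame is invented for illustration.
dfs <- list(
  chest = data.frame(`Basics Chest` = NA,
                     Anatomie = c(NA, "CT Protokolle Chest Standard"),
                     Atlas = c("Xray", NA),
                     check.names = FALSE, stringsAsFactors = FALSE),
  head = data.frame(`Basics Head` = NA,
                    Atlas = c("MRI", NA),
                    check.names = FALSE, stringsAsFactors = FALSE)
)

for (i in seq_along(dfs)) {
  header <- names(dfs[[i]])[1]          # e.g. "Basics Chest"
  first_is_na <- is.na(dfs[[i]][[1]])   # same row condition as in the answer's code
  # Apply the ifelse() step to every column except the first.
  dfs[[i]][-1] <- lapply(dfs[[i]][-1], function(x) {
    ifelse(!is.na(x) & first_is_na, paste(x, "-", header), x)
  })
}

dfs$chest$Atlas  # "Xray - Basics Chest" NA
dfs$head$Atlas   # "MRI - Basics Head" NA
```

Because the header is looked up per data frame, the number of columns and the name of the first column can differ from one list element to the next.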
