Place strings of column in another column in R

Question

I have a list of data frames. Here is an example of one of them:

   `Basics Chest` Anatomie                                Atlas                   
   <lgl>          <chr>                                   <chr>                   
 1 NA             NA                                      Xray                    
 2 NA             NA                                      CT                      
 3 NA             NA                                      PET-CT                  
 4 NA             CT Protokolle Chest Standard            NA 

Now I want to take the header of the first column - in this case "Basics Chest" and put it after the strings of the following columns like this:

   `Basics Chest` Anatomie                                    Atlas                   
   <lgl>          <chr>                                       <chr>                   
 1 NA             NA                                          Xray - Basics Chest                   
 2 NA             NA                                          CT - Basics Chest                      
 3 NA             NA                                          PET-CT - Basics Chest                 
 4 NA             CT Protokolle Chest Standard - Basics Chest NA 

As you can see, NA shouldn't be touched by this (have to keep them, so no filtering out of them in a prior step).

This should work for my whole list of data frames, which have varying numbers of columns, as I am thinking about putting this into a for loop. Any elegant solutions?

Kind regards

Answer 1

Score: 1

If I understand what you're trying to do correctly, I think you're looking for the purrr package, which is part of the tidyverse, and specifically its map() family of functions. It is one of the most useful tools to know in R: it cleans up code tremendously and makes a lot of sense once you get used to it. It does take a while to wrap your head around, since it requires a fairly good understanding of both lists and functions, but the rewards are substantial.

The map functions go through lists or vectors and apply a function to each element. I think there's a whole chapter on them in R for Data Science, which is free and highly recommended.
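As a minimal sketch of that idea (the input values here are made up purely for illustration), map() applies a function to each element and returns a list, while typed variants such as map_chr() return an atomic vector:

```r
library(purrr)

# map() always returns a list with one element per input element
sums <- map(list(1:3, 4:6), sum)

# map_chr() does the same but returns a character vector
upper <- map_chr(c("xray", "ct"), toupper)

print(sums)
print(upper)
```

Base R's lapply() and vapply() cover the same ground if you would rather avoid the extra dependency.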

An important thing to be aware of here (you'll see this in step two below) is that a dataframe is essentially a list of vectors of the same length.
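A quick way to convince yourself of this, using a small hypothetical data frame in base R:

```r
# A tiny made-up data frame with two character columns
df <- data.frame(
  Anatomie = c(NA, "CT"),
  Atlas = c("Xray", NA),
  stringsAsFactors = FALSE
)

is.list(df)   # a data frame really is a list
length(df)    # its length is the number of columns

# Iterating over a data frame therefore iterates over its columns
col_lengths <- lengths(df)
col_types <- vapply(df, class, character(1))

print(col_lengths)
print(col_types)
```

This is why passing a data frame to map() or lapply() runs the function once per column.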

In the solution below:

  1. Generate dummy data (a list of data frames).
  2. Write a function that grabs the name of the first column and appends that text to every column in the data frame.
  3. Apply the function from step 2 to the whole list of data frames.

Let me know if you have any questions or if I misunderstood anything.

# Load the tidyverse: provides tibble(), map(), map_df(), str_c() and %>%
library(tidyverse)

# STEP 1: Create dummy data
df.list <- list(
  "first" = tibble(
    `name 1` = NA,
    a = c(letters[1:5], NA),
    b = c(LETTERS[1:4], NA, "HI!!")
  ),
  "second" = tibble(
    `name 2` = NA,
    d = c(letters[1:5], NA),
    e = c(LETTERS[1:4], NA, "HI!!")
  ),
  "third" = tibble(
    `name 3` = NA,
    f = c(letters[1:5], NA),
    g = c(LETTERS[1:4], NA, "HI!!")
  )
)

# STEP 2: Create function that will be applied to each data frame
add_first_col_name <- function(df) {
  first.name <- names(df)[1]

  # Note: the code below attaches the text to every column. This will turn any
  # non-text columns into text. Based on your example, I think this is okay
  # but let me know if not - there are extra steps that could solve this.
  # str_c() returns NA whenever an input is NA, so NA values stay NA.
  df %>%
    map_df(~ str_c(.x, " - ", first.name))
}

# STEP 3: Use map() to apply function to each data frame in the list
map(df.list, add_first_col_name)
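
One possible version of the "extra steps" mentioned in the comment above is to restrict the change to character columns, so that the logical first column and any numeric columns keep their type. This is only a sketch: the function name add_first_col_name_chr and the sample data are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Hypothetical variant: append the first column's name to character columns only
add_first_col_name_chr <- function(df) {
  first_name <- names(df)[1]
  df %>%
    # str_c() propagates NA, so missing values are left as NA
    mutate(across(where(is.character), ~ str_c(.x, " - ", first_name)))
}

# Made-up list with a single data frame shaped like the one in the question
example_list <- list(
  chest = tibble(
    `Basics Chest` = NA,
    Anatomie = c(NA, "CT Protokolle Chest Standard"),
    Atlas = c("Xray", NA)
  )
)

out <- map(example_list, add_first_col_name_chr)
print(out$chest)
```

Because only columns selected by where(is.character) are modified, the first column stays logical instead of being converted to text.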

Answer 2

Score: 0

We can use ifelse() to paste the suffix onto Atlas only where Atlas is not NA:

df1$Atlas <- with(df1, ifelse(is.na(`Basics Chest`) & !is.na(Atlas), 
paste(Atlas, "- Basics Chest"), Atlas))

For multiple columns, loop over every column except the first one (`Basics Chest`) and do the same:

df1[-1] <- lapply(df1[-1], \(x) ifelse(!is.na(x) & 
     is.na(df1[["Basics Chest"]]), paste(x, "- Basics Chest"), x))
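
To cover the whole list of data frames from the question, the same idea can be wrapped in a function that reads the suffix from the first column name instead of hard-coding it. The following is a base R sketch; the function name append_first_col_name and the second data frame are invented for illustration.

```r
# Hypothetical helper: append the first column's name to all other columns
append_first_col_name <- function(df) {
  suffix <- names(df)[1]
  df[-1] <- lapply(df[-1], function(x) {
    # leave NA untouched, otherwise paste "<value> - <first column name>"
    ifelse(is.na(x), x, paste(x, "-", suffix))
  })
  df
}

# Made-up list of data frames with different numbers of columns
df_list <- list(
  chest = data.frame(
    `Basics Chest` = NA,
    Anatomie = c(NA, NA, "CT Protokolle Chest Standard"),
    Atlas = c("Xray", "CT", NA),
    check.names = FALSE, stringsAsFactors = FALSE
  ),
  abdomen = data.frame(
    `Basics Abdomen` = NA,
    Protocol = c("MRI", NA),
    check.names = FALSE, stringsAsFactors = FALSE
  )
)

result <- lapply(df_list, append_first_col_name)
print(result)
```

lapply() takes the place of the for loop mentioned in the question and returns a list with the same names as the input.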

Or with dplyr

library(dplyr)
library(stringr)
df1 <- df1 %>%
   mutate(across(-`Basics Chest`, 
   ~ case_when(
       !is.na(.x) & is.na(`Basics Chest`) ~ str_c(.x, ' - Basics Chest'),
       TRUE ~ .x)))  # keep every other value unchanged, as ifelse() does above

Output:

df1
  Basics Chest                                    Anatomie                 Atlas
1           NA                                        <NA>   Xray - Basics Chest
2           NA                                        <NA>     CT - Basics Chest
3           NA                                        <NA> PET-CT - Basics Chest
4           NA CT Protokolle Chest Standard - Basics Chest                  <NA>

Data:

df1 <- structure(list(`Basics Chest` = c(NA, NA, NA, NA), Anatomie = c(NA, 
NA, NA, "CT Protokolle Chest Standard"), Atlas = c("Xray", "CT", 
"PET-CT", NA)), class = "data.frame", row.names = c("1", "2", 
"3", "4"))

huangapple
  • Posted on January 9, 2023, 01:11:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75049803.html