Splitting multiple columns of text into different columns in R

Question

I have a dataset that looks like the following:

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2
18/24        14/14    NA       1               1
4/24         NA       6/14     0               0

df <- structure(list(Initial_Data = c("18/24", "4/24"),
                     `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"),
                     Irrelevant_Col1 = 1:0,
                     Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

I'd like to identify all the columns whose names contain "Data" and split each of them into three columns:

  1. One with the fraction (originally a character variable), expressed as a decimal.
  2. A second column with the numerator.
  3. A third column with the denominator.

The irrelevant columns should be left untouched, so that the result looks like the following:

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2 Initial_Data_Numerator Initial_Data_Denominator 3Mo_Data_Numerator 3Mo_Data_Denominator 6Mo_Data_Numerator 6Mo_Data_Denominator
0.75         1        NA       1               1               18                     24                       14                 14                   NA                 NA
0.17         NA       0.43     0               0               4                      24                       NA                 NA                   6                  14

I tried something like the following just to generate the numerator and denominator columns:

test <- df %>%
  mutate(across(contains("Data"),
         ~ paste0(.x, "Numerator") = str_extract(., "^\\d+"),
         ~ paste0(.x, "Denominator") = str_extract(.,"(?<=\\D)\\d+"))

But this gives me errors at the equals sign. Perhaps I can't use paste0 in this way?

Thanks for your help in advance!

Answer 1

Score: 2

A tidyverse workflow:

library(dplyr)
library(tidyr)

df %>%
  # split each "a/b" column into <col>_Num and <col>_Denom
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  # divide the numerators by the denominators, naming the results after the original columns
  mutate(across(ends_with("Num"), as.numeric, .names = "{sub('_Num', '', .col)}") /
         across(ends_with("Denom"), as.numeric),
         .before = 1)

# # A tibble: 2 × 11
#   Initial_Data `3Mo_Data` `6Mo_Data` Initial_Data_Num Initial_Data_Denom `3Mo_Data_Num` `3Mo_Data_Denom` `6Mo_Data_Num` `6Mo_Data_Denom` Irrelevant_Col1 Irrelevant_Col2
#          <dbl>      <dbl>      <dbl> <chr>            <chr>              <chr>          <chr>            <chr>          <chr>                      <int>           <int>
# 1        0.75           1     NA     18               24                 14             14               NA             NA                             1               1
# 2        0.167         NA      0.429 4                24                 NA             NA               6              14                             0               0

An alternative form of the mutate() step, using cur_column() within across():

df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"),
                ~ as.numeric(.x) / as.numeric(get(sub("Num", "Denom", cur_column()))),
                .names = "{sub('_Num', '', .col)}"),
         .before = 1)
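
For comparison, the attempt in the question can be made to work more directly. across() does not accept `name = value` pairs built with paste0(); it takes a function or a named list of functions, and the `.names` glue specification builds the output column names. The following is only a minimal sketch, assuming dplyr 1.0 or later and stringr are installed. The sample data is rebuilt inline so the block stands alone, and the list names Numerator and Denominator are chosen here to match the desired output; they are not part of the answer above.

```r
library(dplyr)
library(stringr)

# Sample data from the question; check.names = FALSE keeps names such as "3Mo_Data"
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

res <- df %>%
  # One new column per (input column, function) pair; names come from .names
  mutate(across(ends_with("_Data"),
                list(Numerator   = ~ as.integer(str_extract(.x, "^\\d+")),
                     Denominator = ~ as.integer(str_extract(.x, "(?<=/)\\d+"))),
                .names = "{.col}_{.fn}")) %>%
  # Then overwrite the original "a/b" strings with their decimal value
  mutate(across(ends_with("_Data"),
                ~ as.numeric(str_extract(.x, "^\\d+")) /
                  as.numeric(str_extract(.x, "(?<=/)\\d+"))))

print(res)
```

Because the new columns end in _Numerator or _Denominator, the second ends_with("_Data") still selects only the three original columns.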

Answer 2

Score: 1

Here's one way, using separate_wider_delim:

library(tidyverse)

df <- separate_wider_delim(df, cols = c("Initial_Data", "3Mo_Data", "6Mo_Data"), delim = "/", names_sep = "_")
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))
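
This approach leaves the numerator and denominator columns as character and does not compute the decimal the question asks for. One possible way to finish the job is sketched below, assuming tidyr 1.3.0 or later (for separate_wider_delim), dplyr and stringr. The names data_cols and out are illustrative and do not appear in the answer above.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Sample data from the question; check.names = FALSE keeps names such as "3Mo_Data"
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

data_cols <- c("Initial_Data", "3Mo_Data", "6Mo_Data")  # hypothetical helper vector

# Split and rename as in the answer; the original "a/b" columns are replaced by the pieces
out <- separate_wider_delim(df, cols = all_of(data_cols), delim = "/", names_sep = "_")
colnames(out) <- str_replace_all(colnames(out),
                                 c("_1$" = "_Numerator", "_2$" = "_Denominator"))

# Convert the split pieces from character to integer
out <- out %>%
  mutate(across(matches("_(Numerator|Denominator)$"), as.integer))

# Re-create each original column as numerator / denominator (appended at the end)
for (col in data_cols) {
  out[[col]] <- out[[paste0(col, "_Numerator")]] / out[[paste0(col, "_Denominator")]]
}

print(out)
```

The decimal columns end up after the other columns; dplyr's relocate() can move them to the front if the column order matters.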

huangapple
  • Posted on July 23, 2023, 19:26:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76747982.html