Splitting multiple columns of text into different columns in R

Question


I have a data frame that looks like the following:

```
  Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2
1        18/24    14/14       NA               1               1
2         4/24       NA     6/14               0               0
```

```r
df <- structure(list(Initial_Data = c("18/24", "4/24"), `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"), Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))
```

I'd like to identify all the "Data" columns and split each of them into three columns:

  1. One column with the fraction (originally a character string) expressed as a decimal
  2. A second column with the numerator
  3. A third column with the denominator

while leaving the irrelevant columns untouched, so that the result looks like this:

| Initial_Data | 3Mo_Data | 6Mo_Data | Irrelevant_Col1 | Irrelevant_Col2 | Initial_Data_Numerator | Initial_Data_Denominator | 3Mo_Data_Numerator | 3Mo_Data_Denominator | 6Mo_Data_Numerator | 6Mo_Data_Denominator |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 1 | NA | 1 | 1 | 18 | 24 | 14 | 14 | NA | NA |
| 0.17 | NA | 0.43 | 0 | 0 | 4 | 24 | NA | NA | 6 | 14 |

I tried something like the following just to generate the numerator and denominator columns:

```r
test <- df %>%
  mutate(across(contains("Data"),
    ~ paste0(.x, "Numerator") = str_extract(., "^\\d+"),
    ~ paste0(.x, "Denominator") = str_extract(., "(?<=\\D)\\d+"))
```

But this gives me errors at the equals sign. Perhaps I can't use paste0() this way?

Thanks for your help in advance!
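For reference, the error at the `=` comes from R's call syntax: inside a function call, only a plain name (or a quoted string) may appear to the left of `=`, so a computed name such as `paste0(.x, "Numerator")` is rejected by the parser. Note also that `.x` holds the column's values, not its name. `across()` builds output names through its `.names` argument instead, where `{.col}` is the input column name and `{.fn}` is the name of the function in a named list. The following is a minimal sketch of the attempted approach rewritten that way, assuming dplyr (1.0 or later) and stringr; the list names `Numerator` and `Denominator` are chosen here to reproduce the column names from the desired output.

```r
library(dplyr)
library(stringr)

df <- structure(list(Initial_Data = c("18/24", "4/24"), `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"), Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

test <- df %>%
  mutate(across(ends_with("_Data"),
                # Named list of functions: each list name fills {.fn} in .names
                list(Numerator   = ~ as.numeric(str_extract(.x, "^\\d+")),
                     Denominator = ~ as.numeric(str_extract(.x, "(?<=/)\\d+"))),
                # e.g. Initial_Data -> Initial_Data_Numerator, Initial_Data_Denominator
                .names = "{.col}_{.fn}"))
```

This only adds the numerator and denominator columns; the original "Data" columns stay as character strings, and the irrelevant columns are not selected, so they pass through unchanged.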

Answer 1

Score: 2


A tidyverse workflow:

```r
library(dplyr)
library(tidyr)

df %>%
  # Split each *Data column at "/" into *_Num and *_Denom (character) columns
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  # Divide the Num columns by the matching Denom columns; .names strips "_Num"
  mutate(across(ends_with("Num"), as.numeric, .names = "{sub('_Num', '', .col)}") /
           across(ends_with("Denom"), as.numeric),
         .before = 1)
# # A tibble: 2 × 11
# Initial_Data `3Mo_Data` `6Mo_Data` Initial_Data_Num Initial_Data_Denom `3Mo_Data_Num` `3Mo_Data_Denom` `6Mo_Data_Num` `6Mo_Data_Denom` Irrelevant_Col1 Irrelevant_Col2
# <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int>
# 1 0.75 1 NA 18 24 14 14 NA NA 1 1
# 2 0.167 NA 0.429 4 24 NA NA 6 14 0 0
```

Another version of the mutate() step, using cur_column() inside across():

```r
df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  # For each *_Num column, look up the matching *_Denom column by name
  mutate(across(ends_with("Num"),
                ~ as.numeric(.x) / as.numeric(get(sub("Num", "Denom", cur_column()))),
                .names = "{sub('_Num', '', .col)}"),
         .before = 1)
```
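A possible variation on the first snippet of this answer, sketched under the assumption that you want the asker's `_Numerator`/`_Denominator` suffixes and numeric (rather than `<chr>`) numerator and denominator columns. Only the `names` passed to `separate_wider_delim()` (which needs tidyr 1.3.0 or later) and a final conversion step differ from that snippet; the variable name `result` is illustrative.

```r
library(dplyr)
library(tidyr)

df <- structure(list(Initial_Data = c("18/24", "4/24"), `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"), Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

result <- df %>%
  # Split each *_Data column into *_Data_Numerator and *_Data_Denominator
  separate_wider_delim(ends_with("_Data"), delim = "/",
                       names_sep = "_", names = c("Numerator", "Denominator")) %>%
  # Recreate each original column as numerator / denominator (decimal)
  mutate(across(ends_with("_Numerator"), as.numeric,
                .names = "{sub('_Numerator', '', .col)}") /
           across(ends_with("_Denominator"), as.numeric),
         .before = 1) %>%
  # Turn the split pieces from character into numbers
  mutate(across(ends_with(c("_Numerator", "_Denominator")), as.numeric))
```

The division relies on both `across()` calls returning columns in the same order, which holds here because `separate_wider_delim()` keeps each numerator next to its denominator in the original column order.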

Answer 2

Score: 1


Here's one way, using separate_wider_delim:

```r
library(tidyverse)
# Split each Data column at "/"; names_sep = "_" yields suffixes _1 and _2
df <- separate_wider_delim(df, cols = c("Initial_Data", "3Mo_Data", "6Mo_Data"), delim = "/", names_sep = "_")
# Rename the _1/_2 suffixes to _Numerator/_Denominator
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))
```
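This answer produces the numerator and denominator columns but not the decimal fraction. One possible follow-up step, sketched here with a plain base R loop (the loop and the helper vector `data_cols` are illustrative additions, not part of the answer), converts the split pieces to numbers and recreates each original column as numerator / denominator. It loads tidyr and stringr directly instead of the full tidyverse.

```r
library(tidyr)
library(stringr)

df <- structure(list(Initial_Data = c("18/24", "4/24"), `3Mo_Data` = c("14/14", NA),
                     `6Mo_Data` = c(NA, "6/14"), Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

# Steps from the answer: split at "/" and rename the _1/_2 suffixes
df <- separate_wider_delim(df, cols = c("Initial_Data", "3Mo_Data", "6Mo_Data"),
                           delim = "/", names_sep = "_")
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))

# Illustrative follow-up: numeric numerator/denominator plus the decimal fraction
data_cols <- c("Initial_Data", "3Mo_Data", "6Mo_Data")
for (col in data_cols) {
  num_col <- paste0(col, "_Numerator")
  den_col <- paste0(col, "_Denominator")
  df[[num_col]] <- as.numeric(df[[num_col]])
  df[[den_col]] <- as.numeric(df[[den_col]])
  # Adds the column back (at the end); NA wherever the original value was NA
  df[[col]] <- df[[num_col]] / df[[den_col]]
}
```

The recreated fraction columns are appended after the existing ones; reorder with `dplyr::relocate()` if the layout from the question matters.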

huangapple
  • Published on July 23, 2023, 19:26:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76747982.html