July 23, 2023, 19:26:30

Splitting multiple columns of text into different columns in R

Question

I have a data frame that looks like the following:

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2
18/24        14/14    NA       1               1
4/24         NA       6/14     0               0

df <- structure(list(Initial_Data = c("18/24", "4/24"),
                     `3Mo_Data` = c("14/14", NA), `6Mo_Data` = c(NA, "6/14"),
                     Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

I'd like to identify all the columns whose names contain "Data" and split each of them into three columns:

One with the fraction (originally a character variable), expressed as a decimal.
A second column with the numerator.
A third column with the denominator.

The irrelevant columns should be left alone, so that the result looks like the following:

Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2 Initial_Data_Numerator Initial_Data_Denominator 3Mo_Data_Numerator 3Mo_Data_Denominator 6Mo_Data_Numerator 6Mo_Data_Denominator
0.75         1        NA       1               1               18                     24                       14                 14                   NA                 NA
0.17         NA       0.43     0               0               4                      24                       NA                 NA                   6                  14

I tried something like the following just to generate the numerator and denominator columns:

test <- df %>%
  mutate(across(contains("Data"),
         ~ paste0(.x, "Numerator") = str_extract(., "^\\d+"),
         ~ paste0(.x, "Denominator") = str_extract(.,"(?<=\\D)\\d+"))

But this gives me errors at the equals sign. Perhaps I can't use paste0 in this way?

Thanks for your help in advance!

Answer 1

Score: 2

A tidyverse workflow:

library(dplyr)
library(tidyr)
df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"), as.numeric, .names = "{sub('_Num', '', .col)}") /
         across(ends_with("Denom"), as.numeric),
         .before = 1)
# # A tibble: 2 × 11
#   Initial_Data `3Mo_Data` `6Mo_Data` Initial_Data_Num Initial_Data_Denom `3Mo_Data_Num` `3Mo_Data_Denom` `6Mo_Data_Num` `6Mo_Data_Denom` Irrelevant_Col1 Irrelevant_Col2
#          <dbl>      <dbl>      <dbl> <chr>            <chr>              <chr>          <chr>            <chr>          <chr>                      <int>           <int>
# 1        0.75           1     NA     18               24                 14             14               NA             NA                             1               1
# 2        0.167         NA      0.429 4                24                 NA             NA               6              14                             0               0
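
The printed output above shows the `*_Num` and `*_Denom` columns as `<chr>`. If numeric columns are wanted, one possible follow-up is to convert them right after the split. This is only a sketch and not part of the original answer: the name `parts` and the `matches()` pattern are choices made here for illustration, and it assumes tidyr 1.3.0 or later for `separate_wider_delim()`.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- structure(list(Initial_Data = c("18/24", "4/24"),
                     `3Mo_Data` = c("14/14", NA), `6Mo_Data` = c(NA, "6/14"),
                     Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

# `parts` is an illustrative name, not from the original answer
parts <- df %>%
  # Split every "..._Data" column at "/" into "<col>_Num" and "<col>_Denom"
  separate_wider_delim(ends_with("Data"), delim = "/",
                       names_sep = "_", names = c("Num", "Denom")) %>%
  # Convert the split pieces from character to numeric; NA stays NA
  mutate(across(matches("_(Num|Denom)$"), as.numeric))
```

With the pieces already numeric, a later division step would no longer need its own `as.numeric()` calls.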

An alternative form of the mutate() step that uses cur_column() within across():

df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"),
                ~ as.numeric(.x) / as.numeric(get(sub("Num", "Denom", cur_column()))),
                .names = "{sub('_Num', '', .col)}"),
         .before = 1)

Answer 2

Score: 1

Here's one way, using separate_wider_delim:

library(tidyverse)
df <- separate_wider_delim(df, cols = c("Initial_Data", "3Mo_Data", "6Mo_Data"), delim = "/", names_sep = "_")
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))
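
For comparison, the approach attempted in the question can probably be made to work as well. `across()` does not accept `name = value` pairs inside a lambda. Instead, it takes a named list of functions, and the `.names` glue pattern builds the new column names. The following is a minimal sketch under that assumption, not code from either answer: the helpers `num()` and `den()` and the name `result` are hypothetical, and the denominator regex assumes the separator is always "/".

```r
library(dplyr)
library(stringr)

# Sample data from the question
df <- structure(list(Initial_Data = c("18/24", "4/24"),
                     `3Mo_Data` = c("14/14", NA), `6Mo_Data` = c(NA, "6/14"),
                     Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0),
                class = "data.frame", row.names = c(NA, -2L))

# Hypothetical helpers: pull the digits before and after the "/"
num <- function(x) as.numeric(str_extract(x, "^\\d+"))
den <- function(x) as.numeric(str_extract(x, "(?<=/)\\d+"))

result <- df %>%
  mutate(
    # The named list plus .names yields e.g. "Initial_Data_Numerator"
    across(ends_with("_Data"),
           list(Numerator = num, Denominator = den),
           .names = "{.col}_{.fn}"),
    # Then overwrite each original fraction string with its decimal value
    across(ends_with("_Data"), ~ num(.x) / den(.x))
  )
```

The new columns are appended after the existing ones, which matches the layout requested in the question. The decimals are not rounded here; `round(..., 2)` could be added if two decimal places are wanted.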
