Splitting multiple columns of text into different columns in R
Question
I have a data frame that looks like the following:
Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2
18/24 14/14 NA 1 1
4/24 NA 6/14 0 0
df <- structure(list(Initial_Data = c("18/24", "4/24"), `3Mo_Data` = c("14/14",
NA), `6Mo_Data` = c(NA, "6/14"), Irrelevant_Col1 = 1:0, Irrelevant_Col2 = 1:0), class = "data.frame", row.names = c(NA, -2L))
I'd like to identify all the columns whose names contain "Data" and split each of them into three columns:
- The fraction (originally a character variable), expressed as a decimal
- The numerator
- The denominator
while ignoring the irrelevant columns, so that the result looks like the following:
Initial_Data 3Mo_Data 6Mo_Data Irrelevant_Col1 Irrelevant_Col2 Initial_Data_Numerator Initial_Data_Denominator 3Mo_Data_Numerator 3Mo_Data_Denominator 6Mo_Data_Numerator 6Mo_Data_Denominator
0.75 1 NA 1 1 18 24 14 14 NA NA
0.17 NA 0.43 0 0 4 24 NA NA 6 14
I tried something like the following just to generate the numerator and denominator columns:
test <- df %>%
mutate(across(contains("Data"),
~ paste0(.x, "Numerator") = str_extract(., "^\\d+"),
~ paste0(.x, "Denominator") = str_extract(.,"(?<=\\D)\\d+"))
But this gives me errors at the equal sign. Perhaps I can't use paste0 in this way?
Thanks for your help in advance!
Answer 1
Score: 2
A tidyverse workflow:
library(dplyr)
library(tidyr)
df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"), as.numeric, .names = "{sub('_Num', '', .col)}") /
           across(ends_with("Denom"), as.numeric),
         .before = 1)
# # A tibble: 2 × 11
# Initial_Data `3Mo_Data` `6Mo_Data` Initial_Data_Num Initial_Data_Denom `3Mo_Data_Num` `3Mo_Data_Denom` `6Mo_Data_Num` `6Mo_Data_Denom` Irrelevant_Col1 Irrelevant_Col2
# <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <int> <int>
# 1 0.75 1 NA 18 24 14 14 NA NA 1 1
# 2 0.167 NA 0.429 4 24 NA NA 6 14 0 0
An alternative mutate() call that uses cur_column() within across():
df %>%
  separate_wider_delim(ends_with("Data"), delim = '/',
                       names_sep = '_', names = c("Num", "Denom")) %>%
  mutate(across(ends_with("Num"),
                ~ as.numeric(.x) / as.numeric(get(sub("Num", "Denom", cur_column()))),
                .names = "{sub('_Num', '', .col)}"),
         .before = 1)
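For comparison, an approach closer to the attempt in the question: across() does not accept `name = value` pairs built with paste0(), but it does take a named list of functions plus a `.names` glue pattern. The following is only a sketch under that reading of the question. The function names `Numerator`, `Denominator` and `Decimal`, and the result name `test`, are illustrative choices, and the original character columns are left in place rather than overwritten.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

# Helper lambdas (hypothetical names): digits before and after the slash
num   <- function(x) as.numeric(str_extract(x, "^\\d+"))
denom <- function(x) as.numeric(str_extract(x, "(?<=/)\\d+"))

test <- df %>%
  mutate(across(contains("Data"),
                list(Numerator   = ~ num(.x),
                     Denominator = ~ denom(.x),
                     Decimal     = ~ num(.x) / denom(.x)),
                # {.col} is the input column, {.fn} the name from the list above
                .names = "{.col}_{.fn}"))
```

Each "Data" column then gains three numeric companions, for example Initial_Data_Numerator, Initial_Data_Denominator and Initial_Data_Decimal, while the Irrelevant columns are untouched.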
Answer 2
Score: 1
Here's one way, using separate_wider_delim:
library(tidyverse)
df <- separate_wider_delim(df, cols= c("Initial_Data", "3Mo_Data", "6Mo_Data"), delim = "/", names_sep = "_")
colnames(df) <- str_replace_all(colnames(df), c("_1$" = "_Numerator", "_2$" = "_Denominator"))
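The snippet above leaves the split columns as character and does not produce the decimal fraction the question asked for. A minimal sketch of one way to finish the job, assuming the `_Numerator` / `_Denominator` suffixes created by the renaming step; the names `data_cols` and `out` are introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  Initial_Data = c("18/24", "4/24"),
  `3Mo_Data` = c("14/14", NA),
  `6Mo_Data` = c(NA, "6/14"),
  Irrelevant_Col1 = 1:0,
  Irrelevant_Col2 = 1:0,
  check.names = FALSE
)

# Pick the fraction columns by name instead of listing them by hand
data_cols <- grep("Data$", names(df), value = TRUE)

# Split on "/" and rename the _1 / _2 pieces, as in the answer
out <- separate_wider_delim(df, cols = all_of(data_cols),
                            delim = "/", names_sep = "_")
colnames(out) <- str_replace_all(colnames(out),
                                 c("_1$" = "_Numerator", "_2$" = "_Denominator"))

# The pieces come back as character, so convert them to numeric
out <- mutate(out, across(matches("_(Numerator|Denominator)$"), as.numeric))

# Re-create each original column as numerator / denominator
for (col in data_cols) {
  out[[col]] <- out[[paste0(col, "_Numerator")]] / out[[paste0(col, "_Denominator")]]
}
```

The decimal columns are appended at the end under the original names; relocate() could move them to the front if the column order matters.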