Remove a specific character from an entire data frame but keep the same data types?
Question
Let's simulate some data:
library(dplyr)
library(stringi)
library(stringr)  # str_replace(), used below, comes from stringr
set.seed(1)
a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)
When we check the data type of the first column, it is "double":
typeof(ab[,1])
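With many columns, checking one column at a time gets tedious. A base R sketch that reports typeof() for every column at once; the small `dat` data frame is made-up example data standing in for `ab`, so the stringi dependency is not needed:

```r
# Hypothetical toy data standing in for `ab`
dat <- data.frame(
  a = c(2, 6, -6),
  b = c("VQ,UN", "ULSR", ",LWKF"),
  stringsAsFactors = FALSE
)

# typeof() of every column, returned as a character vector named by column
col_types <- vapply(dat, typeof, character(1))
col_types
```

Running the same line on the cleaned data makes it easy to compare the types before and after.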
I want to get rid of all commas that may be present in the data frame, so I used this code:
ab1 <- ab %>%
  mutate_all(funs(str_replace(., ",", "")))
typeof(ab1[,1])
However, this also converts column 1 to character, which causes problems for further data parsing. Is there a way to target the whole data frame and remove a certain character from it, while keeping the same data types as in the original data frame? I really don't want to write out the names of 50 columns and convert them back to numeric.
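The type changes because the replacement function is applied to every column, including the numeric one, and string functions return character vectors. A base R illustration with gsub(), whose documentation states that non-character input is coerced with as.character(); str_replace() is assumed here to behave the same way, which matches what the question observes:

```r
x <- c(2, 6, -6)                    # a double vector, like column `a`
y <- gsub(",", "", x, fixed = TRUE) # x is coerced to character before matching

typeof(x)  # "double"
typeof(y)  # "character"
y          # "2" "6" "-6"
```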
Edit: Just in case it wasn't clear, I am hoping for a solution that targets the whole data frame, not specific columns.
Thanks!
Answer 1
Score: 1
What about removing commas from just character columns?
library(dplyr)
library(stringi)
set.seed(1)
a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)
ab1 <- ab %>%
  mutate(across(where(is.character), ~ stringr::str_remove_all(.x, ",")))
glimpse(ab1)
#> Rows: 20
#> Columns: 2
#> $ a <dbl> 2, 6, 1, 13, 7, 1, 7, 9, 8, 3, 13, 7, 2, -6, 11, 5, 5, 10, 9, 8
#> $ b <chr> "VQUNN", "ULSR", "LWKFA", "BHNQJ", "XGLHQ", "FLTBW", "IVIIL", "XWJTY…
<sup>Created on 2023-06-22 with reprex v2.0.2</sup>
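The same idea (visit every column, but only modify the text ones) can also be written in base R without dplyr. A minimal sketch: `remove_char_everywhere()` is a hypothetical helper name, and the factor branch is an addition not in the answer above. It covers data built with `stringsAsFactors = TRUE`, which was the default for `data.frame()` and `cbind.data.frame()` before R 4.0.0; there, `where(is.character)` would skip the text column.

```r
# Hypothetical helper: strip `chr` from every character or factor column,
# leaving all other columns (numeric, logical, dates, ...) untouched
remove_char_everywhere <- function(df, chr = ",") {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      gsub(chr, "", col, fixed = TRUE)
    } else if (is.factor(col)) {
      # Edit the levels; levels that become identical are merged
      levels(col) <- gsub(chr, "", levels(col), fixed = TRUE)
      col
    } else {
      col
    }
  })
  df
}

# Made-up example data with a double, a character and a factor column
dat <- data.frame(
  a = c(2, 6, -6),
  b = c("VQ,UN", "ULSR", ",LWKF"),
  f = factor(c("X,Y", "XY", "Z")),
  stringsAsFactors = FALSE
)
clean <- remove_char_everywhere(dat)
str(clean)
```

Assigning into `df[]` rather than `df` keeps the result a data frame with the original column names and attributes.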
By the way, funs() is deprecated and should generate a deprecation warning in recent tidyverse versions, and mutate_all() has been superseded by mutate(across()).
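For reference, here is a sketch of the question's pipeline in current dplyr syntax. It assumes dplyr 1.0.0 or later, where across() and where() are available. Base gsub() stands in for stringr::str_replace() so the sketch only needs dplyr; note that gsub() removes every comma, not only the first. The `dat` data frame is made-up example data.

```r
library(dplyr)

dat <- data.frame(a = c(2, 6, -6), b = c("VQ,UN", "ULSR", ",LWKF"),
                  stringsAsFactors = FALSE)

# Superseded style from the question (funs() is deprecated):
# dat %>% mutate_all(funs(str_replace(., ",", "")))

# Current equivalent: across() over every column with a ~ lambda.
# This still turns `a` into character, because every column goes
# through a string function.
all_cols <- dat %>%
  mutate(across(everything(), ~ gsub(",", "", .x, fixed = TRUE)))

# Restricting the selection with where() leaves `a` as double
text_only <- dat %>%
  mutate(across(where(is.character), ~ gsub(",", "", .x, fixed = TRUE)))

sapply(all_cols, typeof)
sapply(text_only, typeof)
```

The selection step, not the mutate syntax, is what preserves the types.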