Remove specific character from entire dataframe but keep same data types?

Question

Let's simulate some data:

library(dplyr)
library(stringi)
library(stringr)  # provides str_replace(), used below
set.seed(1)

a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)

When we check the data type of the first column, it is stored as "double":

typeof(ab[,1])

I want to get rid of all commas that are potentially present in the dataframe, so I used this code:

ab1 <- ab %>%
  mutate_all(funs(str_replace(., ",", "")))

typeof(ab1[,1])

This, however, also converts column 1 to character, which has unfortunate consequences for further data parsing. Is there a way to target a whole data frame and remove a certain character from it while keeping the same data types as in the original data frame? I really don't want to write out the names of 50 columns and convert them back to numeric.

Edit: Just in case it wasn't clear, I am hoping for a solution that targets the whole data frame, not specific columns.

Thanks!

Answer 1

Score: 1

What about removing commas from just the character columns? Numeric columns cannot contain commas, so skipping them leaves their types intact.

library(dplyr)
library(stringi)
set.seed(1)

a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)

ab1 <- ab %>%
  mutate(across(where(is.character), ~ stringr::str_remove_all(.x, ",")))
glimpse(ab1)
#> Rows: 20
#> Columns: 2
#> $ a <dbl> 2, 6, 1, 13, 7, 1, 7, 9, 8, 3, 13, 7, 2, -6, 11, 5, 5, 10, 9, 8
#> $ b <chr> "VQUNN", "ULSR", "LWKFA", "BHNQJ", "XGLHQ", "FLTBW", "IVIIL", "XWJTY…

Created on 2023-06-22 with reprex v2.0.2
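
If you would rather not depend on extra packages, the same idea can be sketched in base R. This is only a rough illustration on a small made-up data frame (`df` and its values are assumptions, not the data from the question): pick out the character columns, strip commas from those, and leave everything else alone.

```r
# Hypothetical example data: one numeric and one character column.
df <- data.frame(
  a = c(2, 6, 1),
  b = c("V,QUNN", "UL,SR,", "LWKFA"),
  stringsAsFactors = FALSE
)

# Flag the character columns; only these can contain commas.
is_chr <- vapply(df, is.character, logical(1))

# Remove every literal comma from the flagged columns.
df[is_chr] <- lapply(df[is_chr], function(x) gsub(",", "", x, fixed = TRUE))

# Column `a` is still double and column `b` is still character.
col_types <- vapply(df, typeof, character(1))
col_types
```

One caveat: in R versions before 4.0, `data.frame()` and `cbind.data.frame()` turn strings into factors by default, so an `is.character` test would skip those columns unless `stringsAsFactors = FALSE` is set.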

By the way, funs() should trigger a deprecation warning in recent dplyr versions, and mutate_all() has been superseded by mutate(across()).
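
To make the migration concrete, here is a small sketch comparing the two selections inside across(). The tibble `df` and its values are made up for illustration. Using everything() reproduces what mutate_all() did in the question: every column passes through a string function and comes back as character. Using where(is.character) touches only the text columns.

```r
library(dplyr)
library(stringr)

# Hypothetical example data, not the data from the question.
df <- tibble(a = c(2, 6, 1), b = c("V,QUNN", "UL,SR,", "LWKFA"))

# Equivalent of mutate_all(): string functions return character,
# so the numeric column is coerced as a side effect.
all_cols <- df %>%
  mutate(across(everything(), ~ str_remove_all(.x, ",")))

# Restricting the selection leaves non-character columns untouched.
chr_only <- df %>%
  mutate(across(where(is.character), ~ str_remove_all(.x, ",")))

sapply(all_cols, typeof)  # both columns are now character
sapply(chr_only, typeof)  # a stays double, b stays character
```

Note that str_remove_all() removes every comma in a string, whereas str_replace() as used in the question replaces only the first match.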

huangapple
  • Posted on 2023-06-22 17:45:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76530585.html