2023-06-22 17:45:05

Remove specific character from entire dataframe but keep same data types?

Question

Let's simulate some data:

library(dplyr)
library(stringi)
library(stringr)  # provides str_replace(), which is used below
set.seed(1)
a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)

When we check the data type of the first column, it is reported as "double":

typeof(ab[,1])

I want to get rid of all commas that are potentially present in the dataframe, so I used this code:

ab1 <- ab %>%
  mutate_all(funs(str_replace(., ",", "")))
typeof(ab1[,1])

This, however, also converts column 1 to character, which has unfortunate consequences for further data parsing. Is there a way to target a whole data frame and remove a certain character from it, while keeping the same data types as in the original data frame? I really don't want to write out the names of 50 columns and convert them back to numeric.

Edit: Just in case it wasn't clear, I am hoping for a solution that targets the whole data frame, not specific columns.

Thanks!

Answer 1

Score: 1

What about removing commas from just character columns?

library(dplyr)
library(stringi)
set.seed(1)
a <- round(rnorm(20, 5, 5))
b <- stri_rand_strings(20, 5, pattern = "[A-Z,]")
ab <- cbind.data.frame(a, b)
ab1 <- ab %>%
  mutate(across(where(is.character), ~ stringr::str_remove_all(.x, ",")))
glimpse(ab1)
#> Rows: 20
#> Columns: 2
#> $ a <dbl> 2, 6, 1, 13, 7, 1, 7, 9, 8, 3, 13, 7, 2, -6, 11, 5, 5, 10, 9, 8
#> $ b <chr> "VQUNN", "ULSR", "LWKFA", "BHNQJ", "XGLHQ", "FLTBW", "IVIIL", "XWJTY…

Created on 2023-06-22 with reprex v2.0.2
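
For what it's worth, the same idea (touch only the character columns, so the numeric ones never get coerced) can be sketched in base R as well. The helper name strip_char and the small hand-made data frame are made up for illustration and are not part of the answer above:

```r
# Hypothetical helper (not from the original answer): remove a literal
# character from every character column and leave other columns untouched.
strip_char <- function(df, ch) {
  # df[] <- ... keeps the data.frame structure while replacing its columns
  df[] <- lapply(df, function(col) {
    if (is.character(col)) gsub(ch, "", col, fixed = TRUE) else col
  })
  df
}

# Small hand-made example data (assumed values, not the simulated data)
ab <- data.frame(
  a = c(2, 6, 13),
  b = c("V,QUN", "UL,SR,", "LWKFA"),
  stringsAsFactors = FALSE
)

ab1 <- strip_char(ab, ",")
typeof(ab1$a)  # "double"
ab1$b          # "VQUN" "ULSR" "LWKFA"
```

Because the function is applied to the whole data frame and decides per column, there is no need to list column names by hand.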

By the way, funs() is deprecated and should emit a deprecation warning in recent dplyr versions, and mutate_all() is superseded by mutate(across()).
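
As a rough sketch of that migration, here is the question's mutate_all(funs(...)) call rewritten in the across() style, next to the version restricted to character columns. The data frame df and its values are invented for the example. Note also that str_replace() only removes the first match per string, whereas str_remove_all() removes every match:

```r
library(dplyr)
library(stringr)

# Small hand-made data frame (hypothetical values)
df <- data.frame(
  n = c(1, 20, 300),
  s = c("A,B", "C,,D", "EF"),
  stringsAsFactors = FALSE
)

# across() equivalent of mutate_all(funs(str_replace(., ",", ""))):
# every column goes through a string function, so numbers become character
all_cols <- df %>%
  mutate(across(everything(), ~ str_replace(.x, ",", "")))
typeof(all_cols$n)  # "character"
all_cols$s          # "AB" "C,D" "EF"  (only the first comma is removed)

# Restricting to character columns leaves the numeric column untouched
char_cols <- df %>%
  mutate(across(where(is.character), ~ str_remove_all(.x, ",")))
typeof(char_cols$n)  # "double"
char_cols$s          # "AB" "CD" "EF"
```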
