How to remove empty spaces in a data frame in R
Question
After concatenating the variables with the unite() function, some rows contain empty spaces that I need to remove before I can analyze the data.
I tried using paste() to remove the empty spaces directly while concatenating, but it didn't work.
Thanks!
Answer 1
Score: 1
Just to start off: please don't post photos of data or code! It's much more useful to share something like the output of dput(head(data, 10)).
One option might be to use str_replace_all(), but it could be slow if your data is really big.
library(dplyr)
library(stringr)

df |>
  mutate(Productos = str_replace_all(Productos, ",{2,}", ",")) |>  # collapse runs of two or more commas into one
  mutate(Productos = str_replace_all(Productos, "^,|,$", ""))      # remove leading/trailing commas
That said, it looks like unite(..., remove = FALSE, na.rm = TRUE) is going to be better. From the examples:
# To remove missing values:
df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
#> # A tibble: 4 × 3
#>   z     x     y
#>   <chr> <chr> <chr>
#> 1 "a_b" a     b
#> 2 "a"   a     NA
#> 3 "b"   NA    b
#> 4 ""    NA    NA
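One caveat worth checking: na.rm = TRUE only drops NA values, so if the blanks in the source columns are empty strings rather than NA, they need to be converted first. A minimal sketch, assuming hypothetical column names p1 to p3 (not from the original data) and that the blanks are stored as "":

```r
library(dplyr)
library(tidyr)

# Hypothetical data: empty strings stand in for the missing products
df <- tibble(
  p1 = c("Cervezas", "Cervezas"),
  p2 = c("Vinos", ""),
  p3 = c("", "Ginebras")
)

result <- df |>
  mutate(across(p1:p3, ~ na_if(.x, ""))) |>  # turn "" into NA so unite() can drop it
  unite("Productos", p1:p3, sep = ",", na.rm = TRUE, remove = FALSE)

result$Productos
#> [1] "Cervezas,Vinos"    "Cervezas,Ginebras"
```

If the columns already contain real NA values, the mutate() step can be skipped.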
Answer 2
Score: 0
If your data look like this:
df <- data.frame(Productos = c("Cervezas,Vinos,,Tequilas,Aguardientes,,,Rones,Tabaqueria,Alimentos,Bebidas",
                               "Cervezas,,Ginebras,Tequilas,Aguardientes,,,Rones,Tabaqueria,,"))
You can collapse runs of two or more commas into a single comma, then remove any leading/trailing commas, in base R using gsub():
gsub("^,|,$", "", gsub(",{2,}", ",", df$Productos))
Output:
[1] "Cervezas,Vinos,Tequilas,Aguardientes,Rones,Tabaqueria,Alimentos,Bebidas"
[2] "Cervezas,Ginebras,Tequilas,Aguardientes,Rones,Tabaqueria"
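To write the cleaned values back into the data frame, the two gsub() calls can be wrapped in a small helper. This is only a sketch: clean_commas is a hypothetical name, and the sample data is made up for illustration.

```r
# Hypothetical helper: collapse repeated commas, then trim commas at either end
clean_commas <- function(x) {
  x <- gsub(",{2,}", ",", x)  # runs of two or more commas become one
  gsub("^,|,$", "", x)        # drop a leading or trailing comma
}

# Made-up sample data for illustration
df <- data.frame(Productos = c("Cervezas,,Ginebras,,", ",Vinos,,,Rones"))

df$Productos <- clean_commas(df$Productos)
df$Productos
#> [1] "Cervezas,Ginebras" "Vinos,Rones"
```

Because gsub() is vectorized, the helper works on the whole column at once without a loop.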