How to remove empty spaces in a data frame in R

Question

After concatenating the variables with the unite function, some rows contain empty spaces that I need to delete in order to analyze the data.

I tried using the paste function to remove the empty spaces directly while concatenating, but it didn't work.

Thanks!

Answer 1

Score: 1

Just to start off - please don't post photos of data or code! It's much more useful to do something like dput(head(data, 10)).

One option might be to use str_replace_all(), but it could be slow if your data is really big.

library(dplyr)
library(stringr)

df |>
  mutate(Productos = str_replace_all(Productos, ",{2,}", ",")) |> # collapse runs of 2+ commas into one
  mutate(Productos = str_replace_all(Productos, "^,|,$", "")) # remove leading/trailing commas

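That said, it looks like unite(..., remove = FALSE, na.rm = TRUE) is going to be better, because it drops missing values before joining instead of cleaning up the separators afterwards. From the examples in the unite() documentation:

library(tidyr)

# example data as used in the unite() documentation
df <- expand_grid(x = c("a", NA), y = c("b", NA))

# To remove missing values:
df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)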
#> # A tibble: 4 × 3
#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 "a_b" a     b    
#> 2 "a"   a     NA   
#> 3 "b"   NA    b    
#> 4 ""    NA    NA   
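
One caveat: na.rm = TRUE only drops NA values, so cells holding an empty string or only whitespace would still leave stray separators. A minimal sketch of one way around that, assuming the gaps are empty or blank strings rather than NA; the column names p1 to p3 and the data are made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: empty or blank strings stand in for the "empty spaces"
df <- tibble(
  p1 = c("Cervezas", "Cervezas", ""),
  p2 = c("", "Ginebras", " "),
  p3 = c("Vinos", "", "Rones")
)

result <- df |>
  # trim whitespace, then turn empty strings into NA so unite() can drop them
  mutate(across(p1:p3, ~ na_if(trimws(.x), ""))) |>
  unite("Productos", p1:p3, sep = ",", na.rm = TRUE, remove = FALSE)

result$Productos
```

Converting to NA before uniting means no regex cleanup is needed afterwards, at the cost of the original columns now holding NA instead of "".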

Answer 2

Score: 0

If your data look like this:

df <- data.frame(Productos = c("Cervezas,Vinos,,Tequilas,Aguardientes,,,Rones,Tabaqueria,Alimentos,Bebidas",
                               "Cervezas,,Ginebras,Tequilas,Aguardientes,,,Rones,Tabaqueria,,"))

You can collapse each run of two or more commas into a single comma, then strip any leading or trailing comma, in base R using gsub:

gsub("^,|,$", "", gsub(",{2,}", ",", df$Productos))

Output:

[1] "Cervezas,Vinos,Tequilas,Aguardientes,Rones,Tabaqueria,Alimentos,Bebidas"
[2] "Cervezas,Ginebras,Tequilas,Aguardientes,Rones,Tabaqueria"
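
If some of the gaps contain whitespace (for example ", ,"), the comma regex above would not catch them. A possible alternative, sketched here in base R, is to split on commas, drop the blank pieces, and paste the rest back together. The helper name clean_commas and the sample vector are hypothetical, and the sketch assumes the input has no NA values:

```r
# Hypothetical helper: split on commas, drop empty/blank pieces, re-join.
# Assumes x is a character vector without NA values.
clean_commas <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)                       # treat whitespace-only pieces as empty
    paste(p[nzchar(p)], collapse = ",")  # keep only non-empty pieces
  }, character(1))
}

productos <- c("Cervezas,Vinos,,Tequilas", "Cervezas,, ,Rones,,", ",,")
cleaned <- clean_commas(productos)
cleaned
```

To use it on the data frame above, something like df$Productos <- clean_commas(df$Productos) should work.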

huangapple
  • Posted on May 11, 2023, 08:37:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76223411.html