Remove special characters in rows in a Data Frame
Question
I have a data frame with a column called indicator whose values start with a code of mixed numbers and letters, followed by the actual observation (content). In the DF below, the codes are 6.2-, S1.1, S3.1- and I1.1.

Actual DF:

    indicator <- c("6.2- Total number customers per month (average)",
                   "S1.1 Total of unique users served per month",
                   "S3.1- Volume of merchandise sold per month",
                   "I1.1 Quantity of bags received per month.")
    amount <- c(12, 45, 44, 67)
    DF <- data.frame(indicator, amount)

    > DF
                                            indicator amount
    1 6.2- Total number customers per month (average)     12
    2     S1.1 Total of unique users served per month     45
    3      S3.1- Volume of merchandise sold per month     44
    4       I1.1 Quantity of bags received per month.     67

How do I remove these leading codes using either stringr or a regex?
Answer 1
Score: 2
You could use sub() as follows:

    DF$indicator <- sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", DF$indicator)
    DF

                                       indicator amount
    1 Total number customers per month (average)     12
    2     Total of unique users served per month     45
    3       Volume of merchandise sold per month     44
    4       Quantity of bags received per month.     67

Here is an explanation of the regex pattern being used:

    ^            from the start of the indicator
    [A-Z]*       match zero or more leading capital letters
    \d           match a digit
    (?:\.\d+)*   followed by a dot and digits, zero or more times
    -?           match an optional trailing dash
    \s+          match one or more whitespace characters
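
One detail of the pattern above: \d matches exactly one digit before the first dot, so a code such as S10.1 would be left in place. The following is only a sketch under that assumption, comparing the original pattern with a variant that uses \d+ instead. The vector x and the "S10.1" value are invented for illustration, and perl = TRUE is passed just to make the regex engine explicit.

```r
# Hypothetical sample: the first two values come from the question,
# the third ("S10.1 ...") is an invented code with a two-digit number.
x <- c("6.2- Total number customers per month (average)",
       "S3.1- Volume of merchandise sold per month",
       "S10.1 Made-up indicator text")

# Original pattern: a single \d before the optional ".digits" groups.
original <- sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", x, perl = TRUE)

# Variant (an assumption, not part of the answer): \d+ accepts
# one or more digits before the first dot.
variant <- sub("^[A-Z]*\\d+(?:\\.\\d+)*-?\\s+", "", x, perl = TRUE)

print(original)  # the invented "S10.1" value is not stripped here
print(variant)   # all three leading codes are stripped
```

If the codes in the real data never have more than one digit before the dot, the original pattern is enough and the variant changes nothing.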
Answer 2
Score: 1
Assuming that the code is always at the beginning and followed by a space, you could remove everything from the start of the string up to and including the first space.

    library(stringr)
    library(magrittr)
    DF$indicator <- DF$indicator %>% str_remove_all("^.*? ")

    > DF
                                       indicator amount
    1 Total number customers per month (average)     12
    2     Total of unique users served per month     45
    3       Volume of merchandise sold per month     44
    4       Quantity of bags received per month.     67

Explanation of the pattern:

    ^    anchors the match to the beginning of the string
    .*   matches any characters (up to the space)
    ?    makes .* lazy, so the match stops at the first space instead of the last one
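
To see what the ? changes, here is a small comparison of the lazy and greedy forms of the same pattern. It is a sketch only: it uses base R sub() with perl = TRUE instead of stringr so that it has no package dependency, and the third value in x is invented to show what happens when a row has no leading code.

```r
# First two values come from the question; the third is an invented
# value without a leading code.
x <- c("S1.1 Total of unique users served per month",
       "I1.1 Quantity of bags received per month.",
       "Total with no code")

# Lazy: .*? stops at the first space.
lazy <- sub("^.*? ", "", x, perl = TRUE)

# Greedy: .* runs to the last space, leaving only the final word.
greedy <- sub("^.* ", "", x, perl = TRUE)

print(lazy)
print(greedy)
```

Note that the lazy pattern removes the first word of any value, so a row without a code (the third element here) loses real text. That is why the approach depends on the stated assumption that every value starts with a code.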