Remove special characters in rows in a Data Frame

Question

I have a data frame with a column called indicator in which each value starts with a code of mixed numbers and letters before the actual content. In the DF below, those codes are

6.2-, S1.1, S3.1- & I1.1

Actual DF

indicator <- c("6.2- Total number customers per month (average)", "S1.1 Total of unique users served per month", "S3.1- Volume of merchandise sold per month",
               "I1.1 Quantity of bags received per month.")
amount <- c(12, 45, 44, 67)

DF <- data.frame(indicator, amount)


> DF
                                        indicator amount
1 6.2- Total number customers per month (average)     12
2     S1.1 Total of unique users served per month     45
3      S3.1- Volume of merchandise sold per month     44
4       I1.1 Quantity of bags received per month.     67

How do I remove the mixed numbers and letters using either stringr or regex?

Answer 1

Score: 2

You could use sub() as follows:

DF$indicator <- sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", DF$indicator)
DF

                                   indicator amount
1 Total number customers per month (average)     12
2     Total of unique users served per month     45
3       Volume of merchandise sold per month     44
4       Quantity of bags received per month.     67

Here is an explanation of the regex pattern being used:

  • ^ from the start of the indicator
  • [A-Z]* match zero or more leading capital letters
  • \d match a digit
  • (?:\.\d+)* followed by dot and digits, zero or more times
  • -? match an optional ending dash
  • \s+ match one or more whitespace characters
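
As a quick check of the pattern explained above, the sketch below applies it to the question's sample vector plus one extra label that has no leading code. The extra label and the helper name `strip_code` are hypothetical, added only for illustration, and `perl = TRUE` is an assumption of this sketch so that `\\d`, `\\s` and the non-capturing group are handled by the PCRE engine.

```r
# Sample labels from the question, plus one hypothetical label without a code.
indicator <- c(
  "6.2- Total number customers per month (average)",
  "S1.1 Total of unique users served per month",
  "S3.1- Volume of merchandise sold per month",
  "I1.1 Quantity of bags received per month.",
  "Total revenue per month"  # hypothetical: no leading code
)

# Hypothetical helper wrapping the sub() call from the answer.
# perl = TRUE is an assumption of this sketch (PCRE syntax).
strip_code <- function(x) {
  sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", x, perl = TRUE)
}

cleaned <- strip_code(indicator)
print(cleaned)
```

Because the pattern requires a digit right after the optional capital letters, a label without a leading code should come back unchanged.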

Answer 2

Score: 1

Assuming that the code is always at the beginning of the indicator and followed by a space, you could do this to remove everything from the beginning up to and including the first space.

library(stringr)
library(magrittr)

DF$indicator <- DF$indicator %>% str_remove_all("^.*? ")
DF
                                   indicator amount
1 Total number customers per month (average)     12
2     Total of unique users served per month     45
3       Volume of merchandise sold per month     44
4       Quantity of bags received per month.     67

  • ^ anchors the match to the beginning of the string
  • .* matches any run of characters
  • ? makes the preceding .* lazy, so the match stops at the first space instead of the last one
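
To see what the lazy quantifier changes, and where this shorter pattern can go wrong, here is a minimal base R sketch. It uses `sub()` with `perl = TRUE` as a stand-in for `str_remove()`; that swap, the greedy variant, and the label without a code are assumptions added for illustration.

```r
x <- "6.2- Total number customers per month (average)"

# Lazy: stops at the first space, removing only the leading code.
lazy <- sub("^.*? ", "", x, perl = TRUE)

# Greedy (for comparison): runs to the last space, leaving only the final word.
greedy <- sub("^.* ", "", x, perl = TRUE)

# Hypothetical label with no leading code: its first word is removed instead.
no_code <- sub("^.*? ", "", "Total revenue per month", perl = TRUE)

print(c(lazy, greedy, no_code))
```

The last case is the trade-off of this approach: it only behaves well when every value really starts with a code. Also, since `^` can match only once per string, `str_remove()` should give the same result as `str_remove_all()` here.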

huangapple
  • Published on March 20, 2023 at 23:47:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75792441.html