March 20, 2023, 23:47:26

Remove special characters in rows in a Data Frame

Question

I have a data frame with a column called indicator whose values have a mix of numbers and letters before the actual content. In the DF below, the mixed prefixes are 6.2-, S1.1, S3.1- and I1.1.

Actual DF:

indicator <- c("6.2- Total number customers per month (average)", "S1.1 Total of unique users served per month", "S3.1- Volume of merchandise sold per month",
               "I1.1 Quantity of bags received per month.")
amount <- c(12, 45, 44, 67)
DF <- data.frame(indicator, amount)
> DF
                                        indicator amount
1 6.2- Total number customers per month (average)     12
2     S1.1 Total of unique users served per month     45
3      S3.1- Volume of merchandise sold per month     44
4       I1.1 Quantity of bags received per month.     67

How do I remove the mixed numbers and letters using either stringr or a regex?

Answer 1

Score: 2

You could use sub() as follows:

DF$indicator <- sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", DF$indicator)
DF
                                   indicator amount
1 Total number customers per month (average)     12
2     Total of unique users served per month     45
3       Volume of merchandise sold per month     44
4       Quantity of bags received per month.     67

Here is an explanation of the regex pattern being used:

^ from the start of the indicator
[A-Z]* match zero or more leading capital letters
\d match a digit
(?:\.\d+)* followed by dot and digits, zero or more times
-? match an optional ending dash
\s+ match one or more whitespace characters
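To see how the pattern behaves on values other than the four in the question, here is a minimal sketch in base R. The helper name strip_code and the extra test strings are hypothetical, and perl = TRUE is an assumption added here (the answer above does not use it) to run the pattern with the PCRE engine.

```r
# Hypothetical helper wrapping the pattern from the answer above.
strip_code <- function(x) {
  # perl = TRUE is an added assumption; the original call omits it
  sub("^[A-Z]*\\d(?:\\.\\d+)*-?\\s+", "", x, perl = TRUE)
}

# A value from the question: the leading code "S3.1- " is removed
with_code <- strip_code("S3.1- Volume of merchandise sold per month")

# No leading code: the ^ anchor means digits later in the text are left alone
no_code <- strip_code("Volume of 2.5 kg bags sold per month")

# Limitation: \d matches exactly one digit before the first dot, so a
# made-up code such as "S12.1" is not matched and the string is unchanged
two_digit <- strip_code("S12.1 Total of unique users served per month")

print(c(with_code, no_code, two_digit))
```

If codes with more than one digit before the dot can occur in the data, replacing \\d with \\d+ in the pattern should cover them.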

Answer 2

Score: 1

Assuming that the code is always at the beginning of indicator and followed by a space, you could do this to remove everything from the beginning up to and including the first space.

library(stringr)
library(magrittr)
DF$indicator <- DF$indicator %>% str_remove_all("^.*? ")

> DF
                                   indicator amount
1 Total number customers per month (average)     12
2     Total of unique users served per month     45
3       Volume of merchandise sold per month     44
4       Quantity of bags received per month.     67

^ anchors the match to the beginning of the string
.* matches any sequence of characters
? makes the preceding .* lazy, so the match stops at the first space instead of the last one
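The effect of the lazy quantifier is easiest to see next to its greedy counterpart. This is a minimal sketch in base R using sub() rather than stringr, so it runs without extra packages. The variable names are illustrative, and perl = TRUE is an assumption added here.

```r
x <- "S1.1 Total of unique users served per month"

# Lazy: .*? matches as little as possible, so only "S1.1 " is removed
lazy <- sub("^.*? ", "", x, perl = TRUE)

# Greedy: .* matches as much as possible, so everything up to the
# last space is removed and only the final word is left
greedy <- sub("^.* ", "", x, perl = TRUE)

# Caveat tied to the answer's assumption: if a value has no leading code,
# its first word is removed anyway
no_code <- sub("^.*? ", "", "Total of unique users", perl = TRUE)

print(c(lazy, greedy, no_code))
```

The last case is why this approach is only safe when every value really starts with a code followed by a space.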
