July 18, 2023 04:42:33

Regex to ignore values that have a decimal point in front of value?

Question

I have a dataset that looks like this:

    > dput(test)
    structure(list(Value = c("B20", "I82.B20", "B20, E88.1"), City = c("NY", 
    "LA", "PA")), class = "data.frame", row.names = c(NA, -3L))

I want to extract the rows whose Value contains the code 'B20', so I wrote the following code:

    B20 <- test[grep(
      "B20",
      test$Value
    ),]

However, I do NOT want to include rows where 'B20' is preceded by a decimal point (such as row 2, I82.B20).

The output should look like this:

    > dput(B20)
    structure(list(Value = c("B20", "B20, E88.1"), City = c("NY", "PA")), row.names = c(1L, 3L), class = "data.frame")
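For reference, here is a minimal check of why the plain grep() in the question is not enough: the substring "B20" occurs in all three values, including "I82.B20", so every row is returned. The block rebuilds test from the dput() output above.

```r
# Rebuild the example data frame from the dput() output in the question
test <- structure(list(Value = c("B20", "I82.B20", "B20, E88.1"),
                       City = c("NY", "LA", "PA")),
                  class = "data.frame", row.names = c(NA, -3L))

# A plain substring match finds "B20" in every value, including "I82.B20"
grep("B20", test$Value)
#> [1] 1 2 3
```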

Answer 1

Score: 1

The obvious solution is to select the rows containing "B20" but exclude any row containing ".B20":

    test[grepl("B20", test$Value) & !grepl("\\.B20", test$Value),]
    #>        Value City
    #> 1        B20   NY
    #> 3 B20, E88.1   PA
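To make the logic of this answer explicit: each grepl() call returns one logical value per row, and & combines the two vectors element by element. The intermediate variable names below (contains_b20, after_period, keep) are only illustrative.

```r
test <- structure(list(Value = c("B20", "I82.B20", "B20, E88.1"),
                       City = c("NY", "LA", "PA")),
                  class = "data.frame", row.names = c(NA, -3L))

# TRUE for every value that contains "B20" anywhere
contains_b20 <- grepl("B20", test$Value)
# TRUE for every value containing a literal period (escaped as \\.)
# immediately followed by "B20"
after_period <- grepl("\\.B20", test$Value)

# Keep rows that contain "B20" but not ".B20"
keep <- contains_b20 & !after_period
keep
#> [1]  TRUE FALSE  TRUE
```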

If it has to be a single regex, though, you can match either the start of the string (^) or any character that isn't a period ([^\.]) by combining the two possibilities as (^|[^\.]). Then match B20 and, for safety, add a word boundary \b so that values such as B201 are not matched. The backslashes have to be escaped inside an R string, so the expression becomes:

    test[grep("(^|[^\\.])B20\\b", test$Value) ,]
    #>        Value City
    #> 1        B20   NY
    #> 3 B20, E88.1   PA
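If Perl-compatible regular expressions are acceptable, a negative lookbehind can express "B20 not preceded by a period" directly, without consuming the preceding character. This is a swapped-in alternative, not part of the original answer, and it assumes perl = TRUE, because lookbehind is not available in R's default regex engine.

```r
test <- structure(list(Value = c("B20", "I82.B20", "B20, E88.1"),
                       City = c("NY", "LA", "PA")),
                  class = "data.frame", row.names = c(NA, -3L))

# (?<!\.)  negative lookbehind: fail if the character before "B20" is a period
# \b...\b  word boundaries on both sides: reject e.g. "B201" or "XB20"
pattern <- "(?<!\\.)\\bB20\\b"
test[grepl(pattern, test$Value, perl = TRUE), ]
#>        Value City
#> 1        B20   NY
#> 3 B20, E88.1   PA
```

Because \b is also placed before B20, this pattern additionally rejects values such as "XB20", which the (^|[^\.]) version accepts; whether that is desirable depends on the data.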

When I come across a regex like this, I have to spend a little time working out what it does and thinking through the edge cases that might confound it, so in real code I might prefer the first option even though it is a bit less "clever".
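As an illustration of the edge cases in question, here is a small comparison of the two approaches on a few made-up values (hypothetical, not from the question's data). They agree on the question's three rows but disagree on "B201", which the word boundary rejects, and on "I82.B20, B20", which the two-grep version drops entirely because it contains ".B20" somewhere.

```r
# Hypothetical edge cases, not taken from the question
x <- c("B20", "I82.B20", "B20, E88.1", "B201", "I82.B20, B20")

# Approach 1: contains "B20" and does not contain ".B20" anywhere
two_greps <- grepl("B20", x) & !grepl("\\.B20", x)
# Approach 2: single regex, "B20" at the start or after a non-period,
# followed by a word boundary
one_regex <- grepl("(^|[^\\.])B20\\b", x)

two_greps
#> [1]  TRUE FALSE  TRUE  TRUE FALSE
one_regex
#> [1]  TRUE FALSE  TRUE FALSE  TRUE
```

Which result is correct for "B201" or "I82.B20, B20" depends on what the codes in Value are meant to represent.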
