Conditional REGEX in R to select text in front of or behind a specific character — "%" in this case

Question

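I have a character vector of addresses that was formed by merging the contents of two different vectors. A "%" separates the data in each observation, left (1) from right (2). The data looks like this:

ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

I want to keep the data on the left side of the "%" even if there is something on the right, and keep the data on the right side only if there is nothing on the left.

So the output should look like this:

123 Main St
555 Folsom Street
59 Hyde Street

I wrote the following conditional regex and used it in gsub, but it is not doing what I thought it should do.

pattrn_pct <- "/(?(?=%)..(%.*$)|(^.*%))/gm"   # looks for %, then selects what is behind the % to drop if there is something in front of the %, or what is after the % if nothing is in front

gsub(pattrn_pct, "", ShippingStreet, perl = TRUE)   # replace the selection with ""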

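Note that the regex syntax in the question may not work in R as written. The following code produces the desired result:

ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

# Remove a leading "%", or else everything from the first "%" to the end
result <- gsub("^%|%.*$", "", ShippingStreet, perl = TRUE)

# Print the result, one address per line
cat(result, sep = "\n")

This prints the expected output.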
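
One likely reason the pattern in the question has no effect: R treats the whole pattern string as the regex, so the "/" delimiters and the trailing "gm" flags borrowed from other regex notations are matched as literal characters. The sketch below illustrates this with a simplified pattern (the test strings are made up for illustration). With perl = TRUE, multiline mode can instead be requested inline via (?m).

```r
# In R, "/" and trailing flags are ordinary pattern characters, not delimiters
slash_vs_plain <- c(
  grepl("/a/g", "a"),      # FALSE: the input contains no literal "/a/g"
  grepl("/a/g", "x/a/g"),  # TRUE: the delimiters are matched literally
  grepl("a", "a")          # TRUE: the bare pattern is what R expects
)
print(slash_vs_plain)

# Flags go inside the pattern (PCRE inline syntax) when perl = TRUE
multiline <- gsub("(?m)^%", "", "%a\n%b", perl = TRUE)
print(multiline)
```

None of the addresses contain a "/", so a pattern that starts with "/" cannot match any of them and gsub returns the input unchanged.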

Answer 1

Score: 1

We can use str_extract() from the stringr package here with the regex pattern [^%]+, which matches the first run of characters that are not "%":

library(stringr)
str_extract(ShippingStreet, "[^%]+")

[1] "123 Main St"       "555 Folsom Street" "59 Hyde Street"

Data:

ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")
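
If adding a package is not an option, the same idea can be sketched in base R with regexpr() and regmatches(). This is an illustrative alternative, not part of the answer above, and the variable name first_part is made up for the example.

```r
ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

# regexpr() locates the first run of non-"%" characters in each element;
# regmatches() then extracts the matched substrings
first_part <- regmatches(ShippingStreet, regexpr("[^%]+", ShippingStreet))
print(first_part)
```

One difference to keep in mind: regmatches() silently drops elements that have no match (for example an empty string), whereas str_extract() returns NA for them, so the result can be shorter than the input.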

Answer 2

Score: 0

Using sub in base R, skip an optional leading "%", capture the first run of non-"%" characters, and replace the whole string with that capture:

sub("^%?([^%]+).*", "\\1", ShippingStreet)
[1] "123 Main St"       "555 Folsom Street" "59 Hyde Street"
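
For readers who prefer to avoid regex altogether, the "left if present, otherwise right" rule can also be written out explicitly. The following is a minimal sketch under the assumption that each element contains at most one "%". The helper name keep_left_or_right and its sep parameter are hypothetical and do not come from the answers above.

```r
# Hypothetical helper: keep the text left of the separator,
# falling back to the text on the right when the left side is empty
keep_left_or_right <- function(x, sep = "%") {
  # Position of the first separator in each element, -1 if there is none
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  left <- ifelse(pos > 0, substr(x, 1, pos - 1), x)
  right <- ifelse(pos > 0, substr(x, pos + 1, nchar(x)), "")
  ifelse(nzchar(left), left, right)
}

ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")
cleaned <- keep_left_or_right(ShippingStreet)
print(cleaned)
```

Unlike the sub() pattern, which leaves a string unchanged when nothing matches, this version returns an empty string for an input that is just "%", and returns the input as-is when there is no separator at all.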

huangapple
  • Posted on January 9, 2023 at 09:52:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052551.html