Conditional REGEX in R to select text in front of or behind a specific character — "%" in this case


Question

I have a character vector of addresses that was formed by merging the contents of two different vectors. A "%" separates the data in each observation, left (1) from right (2). The data looks like this:

    ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                        "59 Hyde Street%")

I want to keep the data on the left side of the % even if there is something on the right, and keep the right side only if there is nothing on the left.

So the output should look like this:

    123 Main St
    555 Folsom Street
    59 Hyde Street

I wrote a conditional regex as follows and used it in gsub, but it is not doing what I thought it should do.

    # Look for %, then select what follows the % (to drop it) if there is
    # something in front of the %, or select up to the % if nothing is in front
    pattrn_pct <- "/(?(?=%)..(%.*$)|(^.*%))/gm"
    # Replace the selection with ""
    gsub(pattrn_pct, "", ShippingStreet, perl = TRUE)

Note that the regex syntax you used may not be valid in R. You could try the following code instead:

    ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                        "59 Hyde Street%")
    # Remove a leading "%", or a later "%" together with everything after it
    result <- gsub("^%|(?<=.)%.*$", "", ShippingStreet, perl = TRUE)
    # Print the result, one address per line
    cat(result, sep = "\n")

This prints the expected result.
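One reason the posted pattern has no effect is worth checking directly: the surrounding `/` characters and the trailing `gm` come from regex-literal notation used by other tools, while R takes the whole string as the pattern. The sketch below tests that under the assumption that the pattern is used exactly as posted; `has_slash` and `unchanged` are illustrative variable names, not part of the original post.

```r
ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

# Pattern exactly as posted, including the "/" delimiters and "gm" flags.
pattrn_pct <- "/(?(?=%)..(%.*$)|(^.*%))/gm"

# R treats "/" and "gm" as literal text, so a match would need a "/" in the
# input. None of the addresses contains one.
has_slash <- grepl("/", ShippingStreet, fixed = TRUE)
print(has_slash)  # FALSE FALSE FALSE

# With nothing to match, gsub() hands back the input untouched.
unchanged <- gsub(pattrn_pct, "", ShippingStreet, perl = TRUE)
print(identical(unchanged, ShippingStreet))  # TRUE
```

Dropping the delimiters and flags makes the pattern eligible to match, but the conditional logic would still need rework to produce the desired output.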

Answer 1

Score: 1

We can use str_extract() here with the regex pattern [^%]+, which matches the first run of characters that are not "%":

    library(stringr)  # str_extract() comes from the stringr package
    str_extract(ShippingStreet, "[^%]+")
    [1] "123 Main St" "555 Folsom Street" "59 Hyde Street"

Data:

    ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                        "59 Hyde Street%")
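For readers who would rather not load stringr, the same "first run of non-% characters" idea can be sketched with base R's regexpr() and regmatches(). This is an alternative to the answer's code, not the answerer's own method, and `first_part` is an illustrative name.

```r
ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

# regexpr() locates the first match of the pattern in each element.
m <- regexpr("[^%]+", ShippingStreet)

# regmatches() pulls out the matched substrings.
first_part <- regmatches(ShippingStreet, m)
print(first_part)
# [1] "123 Main St"       "555 Folsom Street" "59 Hyde Street"
```

One difference to keep in mind: regmatches() silently drops elements that have no match, whereas str_extract() returns NA for them, so the base R result can be shorter than the input.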

Answer 2

Score: 0

Using sub in base R:

    sub("^%?([^%]+).*", "\\1", ShippingStreet)
    [1] "123 Main St" "555 Folsom Street" "59 Hyde Street"
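To make the pieces of that pattern easier to follow, here is a commented sketch that wraps the same sub() call in a small function. The function name `first_address` is hypothetical and was chosen only for illustration.

```r
ShippingStreet <- c("123 Main St%234 Center Street", "%555 Folsom Street",
                    "59 Hyde Street%")

# Hypothetical helper wrapping the base R sub() approach.
first_address <- function(x) {
  # ^%?      optionally skip a leading "%" (the left side is empty)
  # ([^%]+)  capture the first run of characters that are not "%"
  # .*       consume the rest, including any second address
  # "\\1"    replace the whole match with the captured group
  sub("^%?([^%]+).*", "\\1", x)
}

print(first_address(ShippingStreet))
# [1] "123 Main St"       "555 Folsom Street" "59 Hyde Street"

# An element with no address on either side does not match the pattern,
# so sub() returns it unchanged.
print(first_address("%"))
# [1] "%"
```

If such empty entries can occur in the data, they would need separate handling, for example converting them to NA afterwards.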

huangapple
  • Posted on January 9, 2023 at 09:52:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75052551.html