Regular Expression to extract string after seeing "number + one letter + [comma or whitespace]" in Bigquery

Question

I am trying to extract:

Abbey Grove
Abbey Grove
Abbey Road View
Abbey Road
Abbey Terrace
Abbey Wood Road
Abbey Grove

from

23a, Abbey Grove
43a Abbey Grove
Block 509a Abbey Road View
511 Abbey Road
Flat 8a, Abbey Terrace
14 Abbey Wood Road
100 Abbey Grove

in Google BigQuery. The issue is that:

regexp_replace(text, '[^a-zA-Z]', '')

gives me "aabbeywood" with two a's. Essentially I just want to keep all the text after a "numeric" or "numeric plus one letter" string.
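
To see where the extra letter comes from, the same pattern can be tried outside BigQuery. The following is only a sketch in Python; it assumes Python's `re` module treats this simple character class the same way BigQuery's RE2 engine does, and `strip_non_letters` is a made-up helper name.

```python
import re

# Sample addresses from the question.
addresses = [
    "23a, Abbey Grove",
    "43a Abbey Grove",
    "Block 509a Abbey Road View",
    "511 Abbey Road",
    "Flat 8a, Abbey Terrace",
    "14 Abbey Wood Road",
    "100 Abbey Grove",
]


def strip_non_letters(text: str) -> str:
    """Delete every character that is not an ASCII letter."""
    return re.sub(r"[^a-zA-Z]", "", text)


for address in addresses:
    print(f"{address!r} -> {strip_non_letters(address)!r}")
```

The first address comes out as `aAbbeyGrove`: the pattern only deletes non-letters, so the unit letter (and a leading word such as "Block") survives while the spaces between words are lost.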

Answer 1

Score: 1

It's not easy because I don't know all your constraints (e.g. can street names contain numbers? Can there be words other than "Block" before the number?). Here is a regex that works for the given examples (the street name is in capture group 3):

^(Block ){0,1}([0-9]+[A-Za-z]{0,1}[,]{0,1} )([a-zA-Z ]+)

See this link for an example.
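
As a rough check of this approach, here is a small Python sketch that pulls group 3 out of each sample address. It is not BigQuery code: `street_name` is a hypothetical helper, and the pattern is written without `|` inside the character classes, because a `|` inside `[...]` is a literal pipe rather than an alternation.

```python
import re

# Group 1: optional "Block " prefix
# Group 2: house number, optional unit letter, optional comma, one space
# Group 3: street name (letters and spaces only)
STREET_RE = re.compile(r"^(Block ){0,1}([0-9]+[A-Za-z]{0,1}[,]{0,1} )([a-zA-Z ]+)")


def street_name(address: str):
    """Return capture group 3, or None when the pattern does not match."""
    match = STREET_RE.match(address)
    return match.group(3) if match else None


for address in [
    "23a, Abbey Grove",
    "43a Abbey Grove",
    "Block 509a Abbey Road View",
    "511 Abbey Road",
    "Flat 8a, Abbey Terrace",
    "14 Abbey Wood Road",
    "100 Abbey Grove",
]:
    print(f"{address!r} -> {street_name(address)!r}")
```

Every address yields its street name except "Flat 8a, Abbey Terrace", which returns `None` because only "Block" is allowed before the number. As far as I know, BigQuery's `REGEXP_EXTRACT` rejects patterns with more than one capturing group, so there the first two groups would probably need to be written as non-capturing `(?:...)` groups.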

Answer 2

Score: 1

Please check this; maybe it helps you.

> 1) Create a function

-- T-SQL (SQL Server) scalar function: deletes every character
-- that is not a letter or a space.
CREATE FUNCTION dbo.RemoveChars(@Input VARCHAR(1000))
RETURNS VARCHAR(1000)
BEGIN
  DECLARE @Pos INT
  -- Position of the first character that is not a letter or a space (0 if none)
  SET @Pos = PATINDEX('%[^a-z A-Z]%', @Input)
  WHILE @Pos > 0
   BEGIN
    -- Delete that one character, then look for the next one
    SET @Input = STUFF(@Input, @Pos, 1, '')
    SET @Pos = PATINDEX('%[^a-z A-Z]%', @Input)
   END
  RETURN @Input
END
GO

After creating the function, run the query below:

DECLARE @Tabel TABLE(
	Text VARCHAR(250)
)

INSERT INTO @Tabel 
VALUES('23a, Abbey Grove'),
('43a Abbey Grove'),
('Block 509a Abbey Road View'),
('511 Abbey Road'),
('Block 8a, Abbey Terrace'),
('14 Abbey Wood Road'),
('100 Abbey Grove')

SELECT dbo.RemoveChars(Text) AS Text FROM @Tabel
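
For readers without a SQL Server instance, the loop in `dbo.RemoveChars` can be approximated in Python. This is only a sketch: `remove_chars` is a hypothetical stand-in for the T-SQL function, and it assumes the `PATINDEX` class `[^a-z A-Z]` means "anything that is not an ASCII letter or a space".

```python
import re

# Anything that is not an ASCII letter or a space.
NOT_LETTER_OR_SPACE = re.compile(r"[^a-zA-Z ]")


def remove_chars(text: str) -> str:
    """Delete offending characters one at a time, like the PATINDEX/STUFF loop."""
    match = NOT_LETTER_OR_SPACE.search(text)
    while match:
        # Drop the single character at the match position, then search again.
        text = text[:match.start()] + text[match.start() + 1:]
        match = NOT_LETTER_OR_SPACE.search(text)
    return text


for address in ["23a, Abbey Grove", "Block 509a Abbey Road View", "511 Abbey Road"]:
    print(f"{address!r} -> {remove_chars(address)!r}")
```

Under these assumptions "23a, Abbey Grove" becomes "a Abbey Grove" and "511 Abbey Road" becomes " Abbey Road" with a leading space, so the unit letter and any leading word such as "Block" are still present in the result.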

Answer 3

Score: 1

SELECT regexp_replace(t, '.*[0-9]+[a-zA-Z]?[^a-zA-Z]*', '')
FROM UNNEST(['23a, Abbey Grove', '43a Abbey Grove', 'Block 509a Abbey Road View', '511 Abbey Road', 'Flat 8a, Abbey Terrace', '14 Abbey Wood Road', '100 Abbey Grove']) t

I tried to reproduce the problem with your data. For this specific data it worked in BigQuery.

This regex can be translated as:

  1. Search for any characters zero or more times
  2. Search for numbers one or more times
  3. Search for zero or one letters between a and z (lower or upper)
  4. Search for any character that is not a letter zero or more times
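
The four steps can also be checked outside BigQuery. Below is a minimal Python sketch of the same replacement; it assumes Python's `re` module and BigQuery's RE2 engine agree on this pattern, and `strip_house_number` is a hypothetical helper name.

```python
import re

# Step 1: .*            any characters, zero or more times (greedy)
# Step 2: [0-9]+        one or more digits
# Step 3: [a-zA-Z]?     an optional single unit letter
# Step 4: [^a-zA-Z]*    any non-letters (comma, spaces) before the street name
PREFIX_RE = re.compile(r".*[0-9]+[a-zA-Z]?[^a-zA-Z]*")


def strip_house_number(address: str) -> str:
    """Remove everything up to and including the house number part."""
    return PREFIX_RE.sub("", address)


for address in [
    "23a, Abbey Grove",
    "43a Abbey Grove",
    "Block 509a Abbey Road View",
    "511 Abbey Road",
    "Flat 8a, Abbey Terrace",
    "14 Abbey Wood Road",
    "100 Abbey Grove",
]:
    print(f"{address!r} -> {strip_house_number(address)!r}")
```

All seven sample addresses reduce to their street names, including the "Flat" case. Because the leading `.*` is greedy, the match runs up to the last group of digits in the string, so a street name that itself contains a number would be cut off after that number.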

If you have other cases where this regex doesn't apply, please let me know.
I hope it helps.

huangapple
  • Posted on January 6, 2020 at 18:16:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59610248.html