Regular expression to extract the string after "number + one letter + [comma or whitespace]" in BigQuery
Question
I am trying to extract:
Abbey Grove
Abbey Grove
Abbey Road View
Abbey Road
Abbey Terrace
Abbey Wood Road
Abbey Grove
from
23a, Abbey Grove
43a Abbey Grove
Block 509a Abbey Road View
511 Abbey Road
Flat 8a, Abbey Terrace
14 Abbey Wood Road
100 Abbey Grove
in Google BigQuery. The issue is that:
regexp_replace(text, '[^a-zA-Z]', '')
gives me "aabbeywood" with two a's. Essentially I just want to keep all the text after a "numeric" or "numeric plus one letter" string.
Answer 1
Score: 1
It's not easy because I don't know all your constraints (e.g. can street names contain numbers? Can words other than "Block" or "Flat" appear before the number?). Here is a regex that works for the given examples (the street name is in group 3):
^(Block |Flat ){0,1}([0-9]+[A-Za-z]{0,1}[,]{0,1} )([A-Za-z ]+)
See this link for an example.
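To try this kind of pattern outside BigQuery, here is a minimal sketch using Python's re module. The pattern is an assumed variant of the one above, not the answer's exact regex: "Flat" is accepted as an optional prefix next to "Block", and only the street name is captured. The helper name street_name is hypothetical. A single capturing group is used because BigQuery's REGEXP_EXTRACT accepts at most one, so the same pattern should carry over.

```python
import re

# Assumed variant of the pattern above: optional "Block " or "Flat " prefix,
# house number with an optional letter and optional comma, then the street name.
STREET_RE = re.compile(r"^(?:(?:Block|Flat) )?[0-9]+[A-Za-z]?,? ([A-Za-z ]+)")


def street_name(address):
    """Return the street name, or None if the address does not fit the pattern."""
    match = STREET_RE.match(address)
    return match.group(1) if match else None


if __name__ == "__main__":
    addresses = [
        "23a, Abbey Grove",
        "43a Abbey Grove",
        "Block 509a Abbey Road View",
        "511 Abbey Road",
        "Flat 8a, Abbey Terrace",
        "14 Abbey Wood Road",
        "100 Abbey Grove",
    ]
    for address in addresses:
        print(street_name(address))
```

Addresses with any other leading word, or with digits inside the street name, fall outside this sketch and return None or a truncated name.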
Answer 2
Score: 1
Please check this; it may help you.
> 1) Create a function
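-- SQL Server (T-SQL): removes every character that is not a letter or a space
CREATE FUNCTION dbo.RemoveChars(@Input VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
    DECLARE @Pos INT
    -- Position of the first character that is not a letter or a space (0 if none)
    SET @Pos = PATINDEX('%[^a-z A-Z]%', @Input)
    WHILE @Pos > 0
    BEGIN
        -- Delete that character, then look for the next one
        SET @Input = STUFF(@Input, @Pos, 1, '')
        SET @Pos = PATINDEX('%[^a-z A-Z]%', @Input)
    END
    RETURN @Input
END
GO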
After creating the function, run the query below:
DECLARE @Tabel TABLE (
    Text VARCHAR(250)
)
INSERT INTO @Tabel
VALUES ('23a, Abbey Grove'),
       ('43a Abbey Grove'),
       ('Block 509a Abbey Road View'),
       ('511 Abbey Road'),
       ('Flat 8a, Abbey Terrace'),
       ('14 Abbey Wood Road'),
       ('100 Abbey Grove')
SELECT dbo.RemoveChars(Text) AS Text FROM @Tabel
Answer 3
Score: 1
SELECT regexp_replace(t, '.*[0-9]+[a-zA-Z]?[^a-zA-Z]*', '')
FROM UNNEST([
  '23a, Abbey Grove',
  '43a Abbey Grove',
  'Block 509a Abbey Road View',
  '511 Abbey Road',
  'Flat 8a, Abbey Terrace',
  '14 Abbey Wood Road',
  '100 Abbey Grove'
]) t
I tried to reproduce the problem with your data. For this specific data, the query above worked in BigQuery.
This regex can be read as:
- .* matches any characters, zero or more times
- [0-9]+ matches one or more digits
- [a-zA-Z]? matches zero or one letter between a and z (lower or upper case)
- [^a-zA-Z]* matches any characters that are not letters, zero or more times
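The breakdown above can be checked locally with a short sketch in Python's re module; for this particular pattern the syntax is the same as in BigQuery's RE2 engine. The helper name strip_house_number is hypothetical, and the sample data is the list from the question.

```python
import re

PATTERN = re.compile(
    r".*"          # any characters, greedy: consumes everything before the last digit
    r"[0-9]+"      # one or more digits
    r"[a-zA-Z]?"   # optional single letter, e.g. the "a" in "23a"
    r"[^a-zA-Z]*"  # any non-letters (comma, spaces) before the street name
)


def strip_house_number(address):
    """Remove everything up to and including the house number and its separator."""
    return PATTERN.sub("", address)


if __name__ == "__main__":
    addresses = [
        "23a, Abbey Grove",
        "43a Abbey Grove",
        "Block 509a Abbey Road View",
        "511 Abbey Road",
        "Flat 8a, Abbey Terrace",
        "14 Abbey Wood Road",
        "100 Abbey Grove",
    ]
    for address in addresses:
        print(strip_house_number(address))
```

Because the leading .* is greedy, the match always ends after the last run of digits in the string, so a street name that itself contains digits would be truncated by this approach.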
If you have other cases where this regex doesn't apply, please let me know.
I hope it helps.