Catastrophic backtracking error with any single character or number?

Question

First of all, I know the title is not as objective as it should be. I don't understand why the error below occurs with the Python "flavor" on the regex101 website.

Just to explain what I'm trying to do, I have to match any number after "item", followed by everything until "consumo estimado".

Regex:

^item\s*(\d{0,})(.*?)consumo 

Example text:

> ITEM 1 – AGULHA DE PUNÇÃO
Agulha de punção 18 ga x 70 mm
Consumo Estimado Anual: 284
Ampla Participação

> ITEM 2 - CATETER ANGIOGRAFICO PIGTAIL
Cateter angiográfico diagnóstico pigtail 5f x 100 cm
Consumo Estimado Anual: 210
Ampla Participação

> ITEM 3 – Próteses Vasculares Dracon Reta 80 Cm
PROTESES VASCULARES ANELADA - Enxerto vascular reto constituído
em politetrafluoretileno (PTFE) extrudado e expandido construído com
suporte externo anelado que aumentam a resistência mecânica.
Tamanho
aproximado 8mm (diâmetro) x 70 -80 cm (comprimento)
Consumo Estimado Anual: 34
Ampla Participação

But once the pattern contains the word "consumo" followed by a space, I can't add anything else: any further character results in "catastrophic backtracking".

Example regexes that trigger the error:

^item\s*(\d{0,})(.*?)consumo e

^item\s*(\d{0,})(.*?)consumo 1

The solution was to use .*? to capture everything between "consumo" and "estimado", which worked properly.

^item\s*(\d{0,})(.*?)consumo.*?estimado

Why is this error occurring? I couldn't find any explanation for it.

I already have the solution for the problem, but I just wanna know why the error happened.

https://regex101.com/r/uqm7ra/1

Edit 1:
As suggested, I have added the link to the current saved regex with the problem.

Edit 2:
As suggested, I have also tried to follow the "meta" advice on asking questions here on Stack Overflow. Thanks for the advice!
I hope the question is better now.

Answer 1

Score: 0

\d{0,} looks iffy: when the rest of the pattern fails to match, the regex engine will retry with fewer and fewer digits, which can be catastrophic. Anchor it with (\D.*?)?consumo to prevent that.
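
To make the "fewer and fewer digits" retry concrete, here is a minimal Python sketch. The one-line input and the altered pattern tail (`2 consumo`) are made up for illustration and do not come from the question; they are chosen so that the tail can only match if the digit group gives back its last digit.

```python
import re

# Hypothetical input (an assumption, not the question's sample text).
text = "item 12 consumo"

# Same shape as the question's pattern, but the tail "2 consumo" is
# invented so that it only fits if (\d{0,}) releases the digit "2".
pattern = re.compile(r"^item\s*(\d{0,})(.*?)2 consumo")

m = pattern.search(text)

# (\d{0,}) first grabs "12"; the tail then fails everywhere, so the
# engine backtracks and retries with just "1".
print(repr(m.group(1)), repr(m.group(2)))  # '1' ''
```

On a short string this retry is harmless. The concern raised above is that on a long input every such retry rescans the remaining text.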

Also, if you want a number, you mean {1,} (or the more idiomatic and brief +; similarly, {0,} is customarily written *).

^item\s*(\d+)(\D.*?)?consumo
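
As a rough check, the pattern above can be run against the question's sample text with Python's `re` module. The flags are an assumption: the regex101 settings are not shown in the question, but the lowercase pattern has to match "ITEM" (`re.IGNORECASE`), `^` has to match at each line start (`re.MULTILINE`), and `.` has to cross line breaks to reach "Consumo" (`re.DOTALL`).

```python
import re

# Sample text copied from the question.
SAMPLE = """\
ITEM 1 – AGULHA DE PUNÇÃO
Agulha de punção 18 ga x 70 mm
Consumo Estimado Anual: 284
Ampla Participação

ITEM 2 - CATETER ANGIOGRAFICO PIGTAIL
Cateter angiográfico diagnóstico pigtail 5f x 100 cm
Consumo Estimado Anual: 210
Ampla Participação

ITEM 3 – Próteses Vasculares Dracon Reta 80 Cm
PROTESES VASCULARES ANELADA - Enxerto vascular reto constituído
em politetrafluoretileno (PTFE) extrudado e expandido construído com
suporte externo anelado que aumentam a resistência mecânica.
Tamanho
aproximado 8mm (diâmetro) x 70 -80 cm (comprimento)
Consumo Estimado Anual: 34
Ampla Participação
"""

# Flags are assumed, see the note above the code.
FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL
ANSWER_PATTERN = re.compile(r"^item\s*(\d+)(\D.*?)?consumo", FLAGS)

# Group 1 is the item number; group 2 is the text up to "Consumo".
numbers = [m.group(1) for m in ANSWER_PATTERN.finditer(SAMPLE)]
print(numbers)  # ['1', '2', '3']
```

This only shows that the pattern matches the three sample items under those assumed flags; it does not measure how many steps regex101 would report.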
