2023-02-24 11:58:39

How do I store a large number of regex and find the regex that has a match for a given string?

Question

We usually use a regex to match against strings. I want to do the reverse: I have a large number of regexes and, given a string, I need to identify which of them match it. How do I do this?

I was considering storing all the regexes in Elasticsearch and then querying it with the string, but I can't find any documentation that says whether this is possible.

I could store all the regexes in a DB, fetch the ones I want to check, and then test each one for a match, but is there a better way to do it?

Answer 1

Score: 1

It's possible to do this using the percolator field type.

You index each of your regexp queries as a document, then test which of the stored queries match the document you pass in.

Create the index with a percolator field type:

PUT regex
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}

Index two regular expressions, for instance:

PUT /regex/_doc/1
{
  "query": {
    "regexp": {
      "message": {
        "value": "big.*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}

PUT /regex/_doc/2
{
  "query": {
    "regexp": {
      "message": {
        "value": ".*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}
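
Since the question mentions a large number of regexes, issuing one PUT per pattern may get tedious. Below is a minimal sketch, in Python, that builds a request body for the Elasticsearch `_bulk` API from a dictionary of patterns. The function name `build_bulk_body` and its parameters are made up for this illustration; the index name `regex` and the field name `message` are taken from the mapping above. It only builds the body and does not talk to a cluster.

```python
import json


def build_bulk_body(patterns, index="regex", field="message"):
    """Return an NDJSON _bulk body that indexes one regexp query per pattern.

    `patterns` maps a document id to a regular expression string.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    lines = []
    for doc_id, pattern in patterns.items():
        # Action line: tells _bulk which index and id the next line belongs to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: same shape as the PUT /regex/_doc/<id> bodies above.
        lines.append(json.dumps({
            "query": {
                "regexp": {
                    field: {
                        "value": pattern,
                        "flags": "ALL",
                        "case_insensitive": True,
                    }
                }
            }
        }))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body({"1": "big.*fox", "2": ".*fox"})
print(body)
```

The resulting text would be sent as the body of `POST /_bulk` with the header `Content-Type: application/x-ndjson`.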

Then test which regular expression would match your input.

Percolating "big brown fox" would match both regular expressions above:

POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown fox"
      }
    }
  }
}
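
If you want to run the same percolate search from code, here is a rough Python sketch using only the standard library. It assumes an unsecured cluster at `http://localhost:9200`, which is an assumption of this example and not something stated above; `build_percolate_request` and `matched_ids` are hypothetical helper names. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_percolate_request(message, base_url="http://localhost:9200", index="regex"):
    """Build (but do not send) the percolate search request for one input string."""
    body = {
        "query": {
            "percolate": {
                "field": "query",                   # the percolator field from the mapping
                "document": {"message": message},   # the string to test
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def matched_ids(response_body):
    """Return the ids of the stored queries listed in a parsed search response."""
    return [hit["_id"] for hit in response_body["hits"]["hits"]]


req = build_percolate_request("big brown fox")
print(req.get_method(), req.full_url)

# Sending it needs a reachable cluster (base_url above is an assumption):
#   with urllib.request.urlopen(req) as resp:
#       print(matched_ids(json.load(resp)))
```

Keep in mind that a search returns only the top 10 hits by default, so with many stored regexes you would probably want to add a `size` value to the request body.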

Percolating "big brown bear" would match none of the above:

POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown bear"
      }
    }
  }
}
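
To sanity-check the expected results without a cluster, the two patterns can be tried locally. This is only a sketch of the brute-force approach from the question, not how the percolator works internally. It uses Python's `re` module, whose syntax differs from the Lucene regexp syntax Elasticsearch uses, so treat it as an approximation that happens to agree for these two simple patterns. The names `PATTERNS`, `COMPILED` and `matching_ids` are made up for the example.

```python
import re

# The same two patterns as documents 1 and 2 above.
PATTERNS = {"1": "big.*fox", "2": ".*fox"}

# Compile once so each lookup only pays for matching.
# re.IGNORECASE stands in for "case_insensitive": true.
COMPILED = {doc_id: re.compile(p, re.IGNORECASE) for doc_id, p in PATTERNS.items()}


def matching_ids(text):
    """Return the ids of every pattern that matches the whole of `text`."""
    # fullmatch mirrors the regexp query, which has to match the entire
    # keyword value rather than a substring of it.
    return [doc_id for doc_id, rx in COMPILED.items() if rx.fullmatch(text)]


print(matching_ids("big brown fox"))   # ['1', '2']
print(matching_ids("big brown bear"))  # []
```

This loop tests every pattern for every input, which is the cost the percolator approach is meant to help you avoid as the number of stored regexes grows.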
