How do I store a large number of regex and find the regex that has a match for a given string?
Question
We generally use a regex to match against strings. I want to do it the other way around: I have a large number of regexes and, given a string, I need to identify which of them match it. How do I do this?
I was considering storing all the regexes in Elasticsearch and then querying it with the string, but I have not been able to find any documentation that says whether this is possible.
I could store all the regexes in a DB, fetch the ones I want to check, and then test each one against the string, but is there a better way to do it?
Answer 1
Score: 1
It's possible to do this using the percolator field type.
You can basically index all your regexp queries and then test which of them would match your document.
Create the index with a percolator field type:
PUT regex
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}
Index two regular expressions, for instance:
PUT /regex/_doc/1
{
  "query": {
    "regexp": {
      "message": {
        "value": "big.*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}
PUT /regex/_doc/2
{
  "query": {
    "regexp": {
      "message": {
        "value": ".*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}
Then test which regular expressions would match your input.
Percolating "big brown fox" would match both regular expressions above:
POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown fox"
      }
    }
  }
}
Percolating "big brown bear" would match none of the above:
POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown bear"
      }
    }
  }
}
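For comparison, here is a rough in-process sketch of what the two percolate requests above are expected to return. It is not how Elasticsearch implements percolation, and Python's re syntax is not identical to Lucene's regexp syntax; it only mirrors the behaviour for these two simple patterns. The names STORED_PATTERNS and matching_pattern_ids are made up for the illustration. Because "message" is a keyword field and Lucene patterns are anchored to the whole value, the sketch uses re.fullmatch rather than re.search, and re.IGNORECASE stands in for "case_insensitive": true.

```python
import re

# Hypothetical in-memory stand-in for the "regex" index:
# document id -> pattern, mirroring documents 1 and 2 above.
STORED_PATTERNS = {
    "1": "big.*fox",
    "2": ".*fox",
}

# Compile each pattern once; IGNORECASE approximates "case_insensitive": true.
COMPILED = {
    doc_id: re.compile(pattern, re.IGNORECASE)
    for doc_id, pattern in STORED_PATTERNS.items()
}


def matching_pattern_ids(text: str) -> list[str]:
    """Return the ids of all stored patterns that match the whole input."""
    # fullmatch anchors the pattern at both ends, like a regexp query
    # on a keyword field.
    return sorted(
        doc_id for doc_id, regex in COMPILED.items() if regex.fullmatch(text)
    )


print(matching_pattern_ids("big brown fox"))   # ['1', '2']
print(matching_pattern_ids("big brown bear"))  # []
```

A linear scan like this is the "fetch the regexes and check each one" approach from the question. It is fine for a modest number of patterns; the percolator is the option to reach for when the patterns already live in Elasticsearch or there are too many to scan on every request.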