How do I store a large number of regex and find the regex that has a match for a given string?

Question

We generally use a regex to match strings. I want to do it the other way around: I have a large number of regexes and, given a string, I need to identify which of them match it. How do I do this?

I was considering storing all the regexes in Elasticsearch and then querying it with the string, but I can't find any documentation that says whether this is possible.

I could store all the regexes in a DB, fetch the ones I want to check, and test each one against the string, but is there a better way to do it?
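
For reference, the database approach described above amounts to a linear scan over the stored patterns. A minimal sketch, assuming the patterns have already been fetched into a dict (the ids and patterns here are hypothetical stand-ins for database rows, and `compile_patterns` and `matching_ids` are illustrative names):

```python
import re

# Hypothetical patterns standing in for rows fetched from a database.
STORED_PATTERNS = {
    "1": r"big.*fox",
    "2": r".*fox",
}

def compile_patterns(patterns):
    """Compile each stored pattern once so repeated lookups skip re-parsing."""
    return {pid: re.compile(p, re.IGNORECASE) for pid, p in patterns.items()}

def matching_ids(compiled, text):
    """Return the ids of all patterns that match somewhere in `text`."""
    return [pid for pid, rx in compiled.items() if rx.search(text)]

compiled = compile_patterns(STORED_PATTERNS)
print(matching_ids(compiled, "big brown fox"))   # ['1', '2']
print(matching_ids(compiled, "big brown bear"))  # []
```

Every lookup tests every pattern, so the cost grows linearly with the number of stored regexes, which is the reason for asking about a better way.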

Answer 1

Score: 1


It's possible to do this using the percolator field type.

You index each regexp query as a document and then test which of the stored queries match a given input document.

Create the index with a percolator field type:

PUT regex
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}

Index two regular expressions, for instance:

PUT /regex/_doc/1
{
  "query": {
    "regexp": {
      "message": {
        "value": "big.*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}

PUT /regex/_doc/2
{
  "query": {
    "regexp": {
      "message": {
        "value": ".*fox",
        "flags": "ALL",
        "case_insensitive": true
      }
    }
  }
}
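
With a large number of regular expressions, indexing them one request at a time gets tedious. The following is a rough sketch of how the same documents could be generated for the `_bulk` API instead. It only builds the NDJSON payload and does not send it. The helper names `percolator_doc` and `bulk_payload` are made up for illustration, and the sequential ids are an assumption:

```python
import json

def percolator_doc(pattern):
    """Build the document body for one stored regexp query (same shape as the requests above)."""
    return {
        "query": {
            "regexp": {
                "message": {
                    "value": pattern,
                    "flags": "ALL",
                    "case_insensitive": True,
                }
            }
        }
    }

def bulk_payload(patterns, index="regex"):
    """Return an NDJSON string: one action line plus one source line per pattern."""
    lines = []
    for doc_id, pattern in enumerate(patterns, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(percolator_doc(pattern)))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

payload = bulk_payload(["big.*fox", ".*fox"])
print(payload)
```

The resulting string would be sent as the body of `POST _bulk` with the `application/x-ndjson` content type.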

Then test which regular expression would match your input.

Percolating "big brown fox" matches both regular expressions above:

POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown fox"
      }
    }
  }
}

Percolating "big brown bear" matches neither of them:

POST regex/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "message": "big brown bear"
      }
    }
  }
}
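
One detail worth keeping in mind: a regexp query has to match the entire term, and since message is mapped as keyword, the term is the whole input string. The sketch below approximates that behaviour locally in Python to show which results to expect. It is not how Elasticsearch evaluates the queries. `re.fullmatch` stands in for the whole-value matching, `re.IGNORECASE` stands in for "case_insensitive": true, and Lucene's regex syntax differs from Python's in general, so this only holds for simple patterns like these two:

```python
import re

# The two patterns indexed above, keyed by document id.
PATTERNS = {"1": "big.*fox", "2": ".*fox"}

def percolate_locally(text):
    """Approximate which stored regexp queries would match `text`.

    re.fullmatch requires the pattern to cover the whole string, similar to
    a regexp query on a keyword field.
    """
    return [
        doc_id
        for doc_id, pattern in PATTERNS.items()
        if re.fullmatch(pattern, text, flags=re.IGNORECASE)
    ]

print(percolate_locally("big brown fox"))        # ['1', '2']
print(percolate_locally("big brown bear"))       # []
print(percolate_locally("big brown fox jumps"))  # [] (nothing covers the trailing " jumps")
```

The last call shows the consequence of whole-value matching: a pattern that should match anywhere inside the input needs a leading and trailing `.*`.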

huangapple
  • Posted on February 24, 2023, 11:58:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75552490.html