英文:
Partial text-formatted date match with Elasticsearch regexp
问题
I formatted a date field dob
as text dob.strftime("%m/%d/%Y")
and stored these dates on Elasticsearch 8.7.1 ("lucene_version": "9.5.0"
) so I could utilize regexp
to do partial date matching.
Suppose this date is stored on Elasticsearch: 06/01/2023
, I noticed that I was only able to get this result back when using these regexp queries:
06.*
.*01.*
.*2023
However, using /
or \/
or \\/
in the query couldn't get back any result. Double checked on Elasticsearch's doc, /
is NOT a reserved character.
I have a few questions, help would be much appreciated:
- Why is
/
not working as a part of theregexp
query? - How to correctly formulate the
regexp
query? I wish I could find matches after typing a term that's matching any of the following format:
- M/d
- M/d/YY
- M/d/YYYY
- M/dd
- M/dd/YY
- M/dd/YYYY
- M/YY
- M/YYYY
- MM/d
- MM/d/YY
- MM/d/YYYY
- MM/dd
- MM/dd/YY
- MM/dd/YYYY
- MM/YY
- MM/YYYY
- Does any other types of search work better than
regexp
?
英文:
I formatted a date field dob
as text dob.strftime("%m/%d/%Y")
and stored these dates on Elasticsearch 8.7.1 ("lucene_version": "9.5.0"
) so I could utilize regexp
to do partial date matching.
Suppose this date is stored on Elasticsearch: 06/01/2023
, I noticed that I was only able to get this result back when using these regexp queries:
06.*
.*01.*
.*2023
However, using /
or \/
or \\/
in the query couldn't get back any result. Double checked on Elasticsearch's doc, /
is NOT a reserved character.
I have a few questions, help would be much appreciated:
- Why is
/
not working as a part of theregexp
query? - How to correctly formulate the
regexp
query? I wish I could find matches after typing a term that's matching any of the following format:
- M/d
- M/d/YY
- M/d/YYYY
- M/dd
- M/dd/YY
- M/dd/YYYY
- M/YY
- M/YYYY
- MM/d
- MM/d/YY
- MM/d/YYYY
- MM/dd
- MM/dd/YY
- MM/dd/YYYY
- MM/YY
- MM/YYYY
- Does any other types of search work better than
regexp
?
答案1
得分: 0
根据这份官方文档,/
确实是一个保留字符。当在请求体中使用JSON时,需要两个前置反斜杠(\),因为反斜杠在JSON字符串中是一个保留的转义字符。
regexp
查询在文本字段和关键字字段上的工作方式不同。Elasticsearch在应用正则表达式之前会分析字段。文本字段会被分词为单独的单词,因此使用/
无法找到任何匹配项。
相反,整个关键字字段字符串被视为单个且非分析的字符串(参见关键字分析器)。在我改用关键字字段后,可以使用/
和正则表达式进行搜索:
dob = fields.TextField(fields={"raw": fields.KeywordField()})
"mappings": {
"properties": {
"dob": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
{
"query": {
"regexp": {
"dob.raw": ".*6\\/.*2023.*"
}
}
}
英文:
According to this official doc, /
is indeed a reserved character. When using JSON for the request body, two preceding backslashes (\) are required since the backslash is a reserved escaping character in JSON strings.
regexp
query works differently for text and keyword fields. Elasticsearch analyzes fields before applying regex. Text fields are tokenized into individual words so using /
couldn't find any match.
Instead, the entire keyword field string is treated as a single and non-analyzed string (see Keyword analyzer). Searching with /
and regexp worked after I used a keyword field instead:
dob = fields.TextField(fields={"raw": fields.KeywordField()})
"mappings": {
"properties": {
"dob": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
{
"query": {
"regexp": {
"dob.raw": ".*6\\/.*2023.*"
}
}
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论