Use Elasticsearch text type field
Question
Data detail:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 3.7750573,
    "hits": [
      {
        "_index": "myindex",
        "_id": "1650421750907600896",
        "_score": 3.7750573,
        "_source": {
          "areaCodeList": "350112201201,0,350112201202"
        }
      }
    ]
  }
}
areaCodeList is a text field that uses the ik tokenizer:
POST /myindex/_analyze
{
  "field": "areaCodeList",
  "text": "350112201201,0,350112201202"
}
{
  "tokens": [
    {
      "token": "350112201201,0,350112201202",
      "start_offset": 0,
      "end_offset": 27,
      "type": "ARABIC",
      "position": 0
    },
    {
      "token": "350112201201",
      "start_offset": 0,
      "end_offset": 12,
      "type": "LETTER",
      "position": 1
    },
    {
      "token": "0",
      "start_offset": 13,
      "end_offset": 14,
      "type": "LETTER",
      "position": 2
    },
    {
      "token": "350112201202",
      "start_offset": 15,
      "end_offset": 27,
      "type": "LETTER",
      "position": 3
    }
  ]
}
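The index mapping itself is not shown here; a text field analyzed with the ik plugin is typically defined with a mapping along the lines of the sketch below (the analyzer name ik_max_word is an assumption and may differ from the actual setup):
PUT /myindex
{
  "mappings": {
    "properties": {
      "areaCodeList": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}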
Finally, I use the following query, but the result is empty:
GET myindex/_search
{
  "query": {
    "match": {
      "areaCodeList": "350112201201"
    }
  },
  "_source": ["areaCodeList"]
}
How can I match comma-separated data?
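For reference, the analyzers the field is actually configured with can be checked through the mapping API (a generic check; only the index name myindex comes from the post):
GET myindex/_mapping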
Answer 1
Score: 0
You can use the pattern analyzer. It splits the text on all non-word characters.
> The pattern analyzer uses a regular expression to split the text into
> terms. The regular expression should match the token separators not
> the tokens themselves. The regular expression defaults to \W+ (or all
> non-word characters).
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html
POST _analyze
{
  "tokenizer": "pattern",
  "text": "350112201201,0,350112201202"
}
PUT test_code_list
{
  "mappings": {
    "properties": {
      "areaCodeList": {
        "type": "text",
        "analyzer": "pattern"
      }
    }
  }
}
PUT test_code_list/_doc/1
{
  "areaCodeList": "350112201201,0,350112201202"
}
GET test_code_list/_search
{
  "query": {
    "match": {
      "areaCodeList": "350112201201"
    }
  }
}
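One related caveat: analyzers are applied at index time, so documents that already exist in myindex keep their old tokens and would need to be reindexed into the newly mapped index. One possible way is the _reindex API (the index names below simply reuse the ones from this answer):
POST _reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "test_code_list" }
}
After reindexing, the match query runs against the pattern-analyzed tokens and should find each comma-separated code individually.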