2023年7月10日 15:21:09go评论97阅读模式

英文:

Use elasticsearch text type filed

问题

以下是翻译好的内容，只包括代码部分：

数据详情：

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 3.7750573,
    "hits": [
      {
        "_index": "myindex",
        "_id": "1650421750907600896",
        "_score": 3.7750573,
        "_source": {
          "areaCodeList": "350112201201,0,350112201202"
        }
      }
    ]
  }
}

areaCodeList 是一个使用 ik 分词器的文本字段：

POST /myindex/_analyze
{
  "field": "areaCodeList",
  "text": "350112201201,0,350112201202"
}

{
  "tokens": [
    {
      "token": "350112201201,0,350112201202",
      "start_offset": 0,
      "end_offset": 27,
      "type": "ARABIC",
      "position": 0
    },
    {
      "token": "350112201201",
      "start_offset": 0,
      "end_offset": 12,
      "type": "LETTER",
      "position": 1
    },
    {
      "token": "0",
      "start_offset": 13,
      "end_offset": 14,
      "type": "LETTER",
      "position": 2
    },
    {
      "token": "350112201202",
      "start_offset": 15,
      "end_offset": 27,
      "type": "LETTER",
      "position": 3
    }
  ]
}

最后，我使用以下查询语句，但结果为空：

GET myindex/_search
{
  "query": {
    "match": {
      "areaCodeList": "350112201201"
    }
  },
  "_source": ["areaCodeList"]
}

如何匹配逗号分隔的数据？

英文:

Data detail:

{
  &quot;took&quot;: 0,
  &quot;timed_out&quot;: false,
  &quot;_shards&quot;: {
    &quot;total&quot;: 1,
    &quot;successful&quot;: 1,
    &quot;skipped&quot;: 0,
    &quot;failed&quot;: 0
  },
  &quot;hits&quot;: {
    &quot;total&quot;: {
      &quot;value&quot;: 1,
      &quot;relation&quot;: &quot;eq&quot;
    },
    &quot;max_score&quot;: 3.7750573,
    &quot;hits&quot;: [
      {
        &quot;_index&quot;: &quot;myindex&quot;,
        &quot;_id&quot;: &quot;1650421750907600896&quot;,
        &quot;_score&quot;: 3.7750573,
        &quot;_source&quot;: {
          &quot;areaCodeList&quot;: &quot;350112201201,0,350112201202&quot;
        }
      }
    ]
  }
}

areaCodeList is a text field that uses the ik tokenizer：

POST /myindex/_analyze
{
  &quot;field&quot;: &quot;areaCodeList&quot;,
  &quot;text&quot;: &quot;350112201201,0,350112201202&quot;
}

{
  &quot;tokens&quot;: [
    {
      &quot;token&quot;: &quot;350112201201,0,350112201202&quot;,
      &quot;start_offset&quot;: 0,
      &quot;end_offset&quot;: 27,
      &quot;type&quot;: &quot;ARABIC&quot;,
      &quot;position&quot;: 0
    },
    {
      &quot;token&quot;: &quot;350112201201&quot;,
      &quot;start_offset&quot;: 0,
      &quot;end_offset&quot;: 12,
      &quot;type&quot;: &quot;LETTER&quot;,
      &quot;position&quot;: 1
    },
    {
      &quot;token&quot;: &quot;0&quot;,
      &quot;start_offset&quot;: 13,
      &quot;end_offset&quot;: 14,
      &quot;type&quot;: &quot;LETTER&quot;,
      &quot;position&quot;: 2
    },
    {
      &quot;token&quot;: &quot;350112201202&quot;,
      &quot;start_offset&quot;: 15,
      &quot;end_offset&quot;: 27,
      &quot;type&quot;: &quot;LETTER&quot;,
      &quot;position&quot;: 3
    }
  ]
}

Finally, i use the following query statement, but the result is empty:

GET myindex/_search
{
  &quot;query&quot;: {
    &quot;match&quot;: {
      &quot;areaCodeList&quot;: &quot;350112201201&quot;
    }
  },
  &quot;_source&quot;: [&quot;areaCodeList&quot;]
}

How to match comma-separated data?

答案1

得分: 0

你可以使用模式分析器。它通过所有非单词字符对文本进行标记化。

模式分析器使用正则表达式将文本分割成词项。正则表达式应该匹配标记分隔符而不是词项本身。正则表达式的默认值为\W+（或所有非单词字符）。
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html

POST _analyze
{
"tokenizer": "pattern",
"text": "350112201201,0,350112201202"
}

PUT test_code_list
{
"mappings": {
"properties": {
"areaCodeList": {
"type": "text",
"analyzer": "pattern"
}
}
}
}

PUT test_code_list/_doc/1
{
"areaCodeList": "350112201201,0,350112201202"
}

GET test_code_list/_search
{
"query": {
"match": {
"areaCodeList": "350112201201"
}
}
}

英文:

You can use the pattern analyzer. It tokenizes the text by all non-word characters.

> The pattern analyzer uses a regular expression to split the text into
> terms. The regular expression should match the token separators not
> the tokens themselves. The regular expression defaults to \W+ (or all
> non-word characters).
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html

POST _analyze
{
  &quot;tokenizer&quot;: &quot;pattern&quot;,
    &quot;text&quot;: &quot;350112201201,0,350112201202&quot;
}
PUT test_code_list
{
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;areaCodeList&quot;: {
        &quot;type&quot;: &quot;text&quot;,
        &quot;analyzer&quot;: &quot;pattern&quot;
      }
    }
  }
}
PUT test_code_list/_doc/1
{
  &quot;areaCodeList&quot;: &quot;350112201201,0,350112201202&quot;
}
GET test_code_list/_search
{
  &quot;query&quot;: {
    &quot;match&quot;: {
      &quot;areaCodeList&quot;: &quot;350112201201&quot;
    }
  }
}

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用Elasticsearch的文本类型字段。

问题

答案1

并发文件解析和插入到Elastic Search

收集不同服务器的日志到中央服务器（Elasticsearch 和 Kibana）。

可以在使用克隆API后删除源索引吗？

将curl命令与表单文件转换为Python请求

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。