Do I need to change my mapping to search with special characters in Elasticsearch?

Question


I have a cluster running, and I have run into a problem with special characters in my search queries. I did not set up a mapping for the index: the mapping is dynamic and the analyzer is the standard one. I got the information about the analyzer from the Analyze API.

GET /<index>/_analyze
{
  "text": "some-data"
}

This gave me the following output:

{
  "tokens": [
    {
      "token": "some",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "data",
      "start_offset": 5,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

My current mapping:

"message": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
From this output I concluded that the analyzer is the standard one and that the special character is not indexed. When I include a special character in my search query I do not get the desired results, and when I specify another analyzer in the search query I get no results.

My question: is there any way to include special characters in my search query other than changing my mapping?

EDIT:
Let me give an example.

Suppose I index four phrases in Elasticsearch:

> data with some-data

> data with some data

> data

> some

Now, when I run a search query like this:

GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "some-data"
          }
        }
      ]
    }
  }
}

I get all 4 sentences as results, which is not the desired output.

The output I want is

> data with some-data

This is the only result that should come back as a hit, but instead I get all the sentences as hits.

I also ran a term query:

GET index/_search
{
  "query": {
    "term": {
      "message.keyword": {
        "value": "some-data"
      }
    }
  }
}

This returned no results.

The closest I came was with this query:

GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "some-data",
            "type": "phrase"
          }
        }
      ]
    }
  }
}

This returned 2 results:

data with some data

data with some-data

Is there a query I can run so that I receive only

data with some-data

as the result?

Regards

Answer 1

Score: 1


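One option is to use term-level queries (exact-term matching) against the message.keyword field. However, you lose the power of full-text queries. Note also that a term query on a keyword field matches only when the entire stored value equals the search term, which is why the term query in the question returned nothing. To use special characters in full-text queries, e.g. a match query, you need to change your mapping (you can use the whitespace analyzer) and then reindex (the Reindex API is the suggested way).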
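As a rough sketch of that mapping change, assuming Elasticsearch 7 or later (typeless mappings) and using the hypothetical index names my-index (the existing index) and my-index-v2 (the new one), the steps might look like this. The whitespace analyzer splits only on whitespace, so "some-data" stays a single token; it also does not lowercase and keeps punctuation attached to words, which may or may not suit your data.

```console
# Create a new index whose message field uses the whitespace analyzer
# (index names here are placeholders, not from the question)
PUT /my-index-v2
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "whitespace",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

# Copy the existing documents into the new index
POST /_reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-v2" }
}

# The query text is analyzed with the same analyzer, so it is
# searched as the single term "some-data"
GET /my-index-v2/_search
{
  "query": {
    "match": {
      "message": "some-data"
    }
  }
}
```

With the four example phrases from the question, this match query should hit only "data with some-data", because none of the other documents contain the token "some-data".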

UPDATE:

If you can't use another analyzer, you can use a wildcard query on the keyword field. However, you should expect some performance cost. From the documentation:

> Avoid beginning patterns with * or ?. This can increase the iterations
> needed to find matching terms and slow search performance.

Like this:

{
  "query": {
    "wildcard": {
      "message.keyword": {
        "value": "*some-data*"
      }
    }
  }
}
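If you do go the analyzer route instead, it may be worth checking how a candidate analyzer tokenizes your text before committing to a reindex. A minimal sketch, using the same Analyze API as in the question but with the built-in whitespace analyzer named explicitly (no index is needed for this call):

```console
# Ask Elasticsearch how the whitespace analyzer would tokenize the phrase
GET /_analyze
{
  "analyzer": "whitespace",
  "text": "data with some-data"
}
```

This should return three tokens, "data", "with" and "some-data", with the hyphen preserved, whereas the standard analyzer splits "some-data" into "some" and "data" as shown in the question.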

Posted by huangapple on April 13, 2023, 19:17:56
Source: https://go.coder-hub.com/76004801.html