Do I need to change my mapping to search with special characters in Elasticsearch?
Question
I have a cluster running and I have run into a problem with special characters in my search queries. I did not set up a mapping for the index: the mapping is dynamic and the analyzer is the standard one. I got the information about the analyzer from the Analyze API:
>
> GET /<index>/_analyze
> {
> "text": "some-data"
> }
This gave me the following output:
{
"tokens": [
{
"token": "some",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "data",
"start_offset": 5,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
}
]
}
My current mapping:
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
From this output I concluded that the analyzer is the standard one and that the special character is not indexed. When I include a special character in my search query I do not get the desired results, and when I try a different analyzer in the search query I get no results at all.
My question: is there a way to include special characters in my search query without changing my mapping?
EDIT:
Let me give you an example.
Suppose I index 4 phrases in Elasticsearch:
> data with some-data
> data with some data
> data
> some
Now, when I run a search query like this:
GET index/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "some-data"
}
}
]
}
}
}
I get all 4 sentences as results, which is not the desired output.
The output I want is:
> data with some-data
This is the only result that should come back as a hit, but instead every sentence is returned.
I also tried a term query:
GET index/_search
{
"query": {
"term": {
"message.keyword": {
"value": "some-data"
}
}
}
}
This returned no results.
The closest I came was with this query:
GET index/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "some-data",
"type": "phrase"
}
}
]
}
}
}
This returned 2 results:
> data with some data
> data with some-data
Is there a query I can run so that I only receive
> data with some-data
as the result?
Regards
Answer 1
Score: 1
One option is a term-level query (exact match) on the message.keyword field. Note that this matches only when the entire field value equals the search string, which is why the term query for some-data returned nothing. You also lose the power of full-text queries. To match special characters in full-text queries such as match, you need to change the mapping (for example, to use the whitespace analyzer) and then reindex (the Reindex API is one way to do that).
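To see why the analyzer matters here, the following is a rough Python sketch, not Elasticsearch itself. The two tokenizer functions are simplified stand-ins for the standard and whitespace analyzers (the real standard analyzer follows Unicode segmentation rules), and `match_any` only imitates a match query with the default OR operator. The index name `index-v2` and the request bodies at the end are illustrative assumptions in the typeless mapping format of Elasticsearch 7 and later.

```python
import json
import re

# The four phrases from the question.
DOCS = ["data with some-data", "data with some data", "data", "some"]


def standard_like(text):
    # Approximation of the standard analyzer: split on anything that is
    # not a letter or digit, then lowercase. The hyphen is dropped.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def whitespace_like(text):
    # Approximation of the whitespace analyzer: split on whitespace only.
    # Note that it does not lowercase, so matching becomes case-sensitive.
    return text.split()


def match_any(query, docs, analyze):
    # A document hits if it shares at least one token with the query,
    # which is how a match query behaves with the default OR operator.
    wanted = set(analyze(query))
    return [d for d in docs if wanted & set(analyze(d))]


standard_hits = match_any("some-data", DOCS, standard_like)
whitespace_hits = match_any("some-data", DOCS, whitespace_like)
print(standard_hits)    # all four phrases
print(whitespace_hits)  # only "data with some-data"

# Hypothetical body for creating a new index ("index-v2" is an assumed name)
# whose message field uses the built-in whitespace analyzer.
NEW_INDEX_BODY = {
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                "analyzer": "whitespace",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    }
}

# Hypothetical body for POST /_reindex, copying the old index into the new one.
REINDEX_BODY = {"source": {"index": "index"}, "dest": {"index": "index-v2"}}

print(json.dumps(NEW_INDEX_BODY, indent=2))
print(json.dumps(REINDEX_BODY, indent=2))
```

With the standard-like tokenizer the query becomes the two tokens some and data, so every phrase shares at least one token with it. With the whitespace-like tokenizer some-data stays a single token and only the first phrase contains it.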
UPDATE:
If you can't use another analyzer, you can run a wildcard query on the keyword field. However, expect some performance cost; the documentation warns:
> Avoid beginning patterns with * or ?. This can increase the iterations
> needed to find matching terms and slow search performance.
Like this:
{
"query": {
"wildcard": {
"message.keyword": {
"value": "*some-data*"
}
}
}
}
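As a rough illustration of what this pattern does, the Python sketch below applies it to the four phrases from the question, treating each phrase as the whole value stored in message.keyword. It uses `fnmatchcase` from the standard library as a stand-in for the wildcard query; this is an assumption for illustration only, since `fnmatchcase` also understands `[...]` sets, which the Elasticsearch wildcard query does not.

```python
from fnmatch import fnmatchcase

# Whole field values, as a keyword field would store them (not tokenized).
KEYWORD_VALUES = ["data with some-data", "data with some data", "data", "some"]


def wildcard_hits(pattern, values):
    # Case-sensitive match of the pattern against each entire value:
    # * stands for any run of characters, ? for a single character.
    return [v for v in values if fnmatchcase(v, pattern)]


hits = wildcard_hits("*some-data*", KEYWORD_VALUES)
exact_hits = wildcard_hits("some-data", KEYWORD_VALUES)
print(hits)        # only the value containing the hyphenated text
print(exact_hits)  # empty: no stored value is exactly "some-data"
```

Because the pattern matches any value containing the substring, a value such as awesome-database would also be a hit. Values longer than the ignore_above limit of 256 characters are not indexed in the keyword field, so this query cannot find them.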