Mongo Atlas search index for both partial matches and case-insensitive queries

Question

Using Mongo Atlas Search, I already have a setup that supports searching with partially matched queries.

I created this index (without dynamic field mapping), called "search_organizations_name":

{
   "name": {
       "type": "string",
       "analyzer": "lucene.keyword",
       "searchAnalyzer": "lucene.keyword"
   }
}

And used it in code like this (simplified and anonymized):

func (r *Repo) Search(ctx context.Context, query string) ([]Organization, error) {
	querySplit := strings.Split(query, " ")

	// Wrap each term in wildcards so it can match anywhere in the name.
	for i := range querySplit {
		querySplit[i] = fmt.Sprintf("*%s*", querySplit[i])
	}

	// Define pipeline stages.
	searchStage := bson.D{
		{"$search", bson.D{
			{"index", "search_organizations_name"},
			{"wildcard", bson.D{
				{"path", "name"},
				{"query", querySplit},
			}},
		}},
	}

	// Run pipeline.
	cursor, err := r.organizationsCollection().
		Aggregate(ctx, mongo.Pipeline{searchStage})
	if err != nil {
		// handling err
	}

	var orgs []Organization
	if err = cursor.All(ctx, &orgs); err != nil {
		return nil, errors.Wrap(err, "parsing organizations to return")
	}

	return orgs, nil
}

This works fine, but the search is case-sensitive, which is not ideal. Researching the topic turned up the following:

  • A suggestion to use collation, but according to the docs, search indexes don't seem to support it.
  • A suggestion to use lucene.standard because it is case-insensitive, but it doesn't support partial matches, i.e. the query "org" wouldn't match the word "organisation".

I need the search to be able to work with both case-insensitive queries and partial matches.

Am I looking in the wrong direction or asking for too much?

Answer 1

Score: 1

A possible solution for your use case is the autocomplete field type with nGram tokenization. It allows matches that are both partial and case-insensitive.

The mapping for that can be:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "tokenization": "nGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}
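
To see why nGram tokenization helps here, the following is a conceptual sketch only, not Atlas Search's actual implementation. It assumes the indexed value is lowercased and split into every substring between a minimum and maximum length; the helper names nGrams and matches and the bounds 2 and 15 are hypothetical choices for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// nGrams returns the set of lowercase substrings of s whose length
// (in runes) is between minLen and maxLen inclusive.
// Hypothetical helper: it only imitates the idea behind nGram tokenization.
func nGrams(s string, minLen, maxLen int) map[string]bool {
	runes := []rune(strings.ToLower(s))
	grams := make(map[string]bool)
	for start := 0; start < len(runes); start++ {
		for n := minLen; n <= maxLen && start+n <= len(runes); n++ {
			grams[string(runes[start:start+n])] = true
		}
	}
	return grams
}

// matches reports whether the lowercased query equals one of the indexed n-grams.
func matches(indexed map[string]bool, query string) bool {
	return indexed[strings.ToLower(query)]
}

func main() {
	indexed := nGrams("Organisation", 2, 15)
	fmt.Println(matches(indexed, "org"))  // true: partial match at the start
	fmt.Println(matches(indexed, "ISAT")) // true: partial match in the middle, different case
	fmt.Println(matches(indexed, "xyz"))  // false
}
```

Because both the indexed substrings and the query are lowercased in this sketch, "org" and "ISAT" match "Organisation", which is the combination of behaviours the question asks for.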

The search query would then look something like this:

{
   "$search":{
      "autocomplete":{
         "query": querySplit,
         "path":"name"
      },
      "index":"search_organizations_name"
   }
}
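
In Go, the stage above could be assembled roughly as follows. This is a minimal sketch under stated assumptions: it uses plain maps and encoding/json so that it runs without the MongoDB driver, and the function name autocompleteStage is hypothetical. With the official driver, the same shape would be written with bson.D, as in the question. Note that the terms are passed as-is, without the "*term*" wrapping, since that syntax belongs to the wildcard operator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// autocompleteStage builds a $search stage that runs the autocomplete
// operator on the given path. Hypothetical helper for illustration only.
func autocompleteStage(index, path, query string) map[string]interface{} {
	// strings.Fields splits on any whitespace and drops empty terms.
	terms := strings.Fields(query)
	return map[string]interface{}{
		"$search": map[string]interface{}{
			"index": index,
			"autocomplete": map[string]interface{}{
				"path":  path,
				"query": terms,
			},
		},
	}
}

func main() {
	stage := autocompleteStage("search_organizations_name", "name", "acme  org")
	out, err := json.MarshalIndent(stage, "", "  ")
	if err != nil {
		panic(err)
	}
	// Print the stage as JSON to compare it with the query shown above.
	fmt.Println(string(out))
}
```

Using strings.Fields instead of strings.Split(query, " ") is a design choice in this sketch: it avoids sending empty terms when the user types consecutive spaces.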

huangapple
  • Posted on August 24, 2022, 00:44:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73462381.html