2022-08-24 00:44:07 · go

Mongo Atlas search index for both partial matches and case-insensitive queries

Question

Using Mongo Atlas Search, I already have a setup that supports searching with partially matched queries.

I created this index (without dynamic field mapping), called "search_organizations_name":

{
    "name": {
        "type": "string",
        "analyzer": "lucene.keyword",
        "searchAnalyzer": "lucene.keyword"
    }
}

And used it in code like this (simplified and anonymized):

import (
	"context"
	"fmt"
	"strings"

	"github.com/pkg/errors" // assumed source of errors.Wrap
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

func (r *Repo) Search(ctx context.Context, query string) ([]Organization, error) {
	querySplit := strings.Split(query, " ")

	// Wrap each term in wildcards so it can match anywhere in the name.
	for i := range querySplit {
		querySplit[i] = fmt.Sprintf("*%s*", querySplit[i])
	}

	// Define pipeline stages.
	searchStage := bson.D{
		{"$search", bson.D{
			{"index", "search_organizations_name"},
			{"wildcard", bson.D{
				{"path", "name"},
				{"query", querySplit},
			}},
		}},
	}

	// Run pipeline.
	cursor, err := r.organizationsCollection().
		Aggregate(ctx, mongo.Pipeline{searchStage})
	if err != nil {
		// handling err
		return nil, err
	}

	var orgs []Organization
	if err = cursor.All(ctx, &orgs); err != nil {
		return nil, errors.Wrap(err, "parsing organizations to return")
	}

	return orgs, nil
}

This works fine, but the search is case-sensitive, which is not ideal. Researching the topic turned up the following:

I found a suggestion to use collation, but according to the docs, search indexes don't seem to support it.
I found a suggestion to use lucene.standard because it is case-insensitive, but it doesn't support partial matches, i.e. the query "org" wouldn't match the word "organisation".

I need the search to be able to work with both case-insensitive queries and partial matches.

Am I looking in the wrong direction or asking for too much?

Answer 1

Score: 1

A possible solution for your use case is the autocomplete operator with nGram tokenization. It allows both partial and case-insensitive matches.

The mapping for that can be:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "tokenization": "nGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}
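
To see why nGram tokenization makes a query like "org" match "Organisation", here is a rough illustration in Go. It is not how Atlas Search is implemented; it only assumes, as the answer implies, that the indexed value is lowercased and broken into substrings, and that the query is lowercased before lookup. The helper names `nGrams` and `matches` and the bounds 2 and 15 are made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// nGrams returns every substring of s whose length in runes is between
// minGrams and maxGrams, after lowercasing s.
func nGrams(s string, minGrams, maxGrams int) []string {
	runes := []rune(strings.ToLower(s))
	var grams []string
	for start := 0; start < len(runes); start++ {
		for size := minGrams; size <= maxGrams && start+size <= len(runes); size++ {
			grams = append(grams, string(runes[start:start+size]))
		}
	}
	return grams
}

// matches reports whether the lowercased query equals one of the indexed grams.
func matches(indexed []string, query string) bool {
	q := strings.ToLower(query)
	for _, g := range indexed {
		if g == q {
			return true
		}
	}
	return false
}

func main() {
	grams := nGrams("Organisation", 2, 15)
	fmt.Println(matches(grams, "org"))  // partial match at the start
	fmt.Println(matches(grams, "NISA")) // partial match in the middle, different case
	fmt.Println(matches(grams, "xyz"))  // no such substring
}
```

Because every substring is stored at index time, a partial term becomes an exact lookup at query time, which is why no `*` wildcards are needed with this approach.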

The search query would then look something like this:

{
   "$search":{
      "autocomplete":{
         "query": querySplit,
         "path":"name"
      },
      "index":"search_organizations_name"
   }
}
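
As a sketch of how the stage above might be assembled in Go, the following builds the same document shape from a raw query string. To stay self-contained it uses plain maps and prints JSON rather than using the MongoDB driver, and `buildAutocompleteStage` is a hypothetical helper, not part of any library. It passes the terms as they are, on the assumption that the `*` wrapping from the question's wildcard query is not wanted with autocomplete.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// buildAutocompleteStage is a hypothetical helper that returns a $search
// stage using the autocomplete operator. The query is split on whitespace
// and the terms are passed unmodified (no "*" wildcards).
func buildAutocompleteStage(index, path, query string) map[string]any {
	terms := strings.Fields(query) // drops empty terms from repeated spaces
	return map[string]any{
		"$search": map[string]any{
			"index": index,
			"autocomplete": map[string]any{
				"path":  path,
				"query": terms,
			},
		},
	}
}

func main() {
	stage := buildAutocompleteStage("search_organizations_name", "name", "acme  org")
	out, err := json.MarshalIndent(stage, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With the Go driver, the same structure could be written as nested bson.D values, as in the question's code, and passed to Aggregate inside a mongo.Pipeline. A caller would probably want to reject an empty query before building the stage, since `strings.Fields` returns no terms in that case.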
