Mongo Atlas search index for both partial matches and case-insensitive queries
Question
Using Mongo Atlas Search, I already have a setup that allows searching with partially matched queries.
I created this index (without dynamic field mapping), called "search_organizations_name":
{
  "name": {
    "type": "string",
    "analyzer": "lucene.keyword",
    "searchAnalyzer": "lucene.keyword"
  }
}
And used it in code like this (simplified and anonymized):
func (r *Repo) Search(ctx context.Context, query string) ([]Organization, error) {
	querySplit := strings.Split(query, " ")

	// Wrap each term in wildcards so it can match anywhere in the name.
	for i := range querySplit {
		querySplit[i] = fmt.Sprintf("*%s*", querySplit[i])
	}

	// Define pipeline stages.
	searchStage := bson.D{
		{"$search", bson.D{
			{"index", "search_organizations_name"},
			{"wildcard", bson.D{
				{"path", "name"},
				{"query", querySplit},
			}},
		}},
	}

	// Run pipeline.
	cursor, err := r.organizationsCollection().
		Aggregate(ctx, mongo.Pipeline{searchStage})
	if err != nil {
		return nil, err // handling err
	}

	var orgs []Organization
	if err = cursor.All(ctx, &orgs); err != nil {
		return nil, errors.Wrap(err, "parsing organizations to return")
	}
	return orgs, nil
}
This works fine, but the search is case-sensitive, which is not ideal. Researching the topic turned up the following:
- A suggestion to use collation, but according to the docs, search indexes don't seem to support it.
- A suggestion to use lucene.standard because it is case-insensitive, but it doesn't support partial matches, i.e. the query "org" wouldn't match the word "organisation".
I need the search to handle both case-insensitive queries and partial matches.
Am I looking in the wrong direction or asking for too much?
Answer 1
Score: 1
A possible solution for your use case is the autocomplete operator with nGram tokenization. It allows both partial and case-insensitive matches.
The mapping for that can be:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "string"
        },
        {
          "tokenization": "nGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}
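To see why nGram tokenization can give both partial and case-insensitive matches, here is a rough, dependency-free sketch of the idea. It is not how Atlas Search is implemented: the `nGrams` helper is hypothetical, and the gram lengths 2 and 15 are assumed values chosen for the illustration. The indexed value is lowercased and broken into every substring within the length bounds, so a lowercased query fragment only has to equal one of those substrings.

```go
package main

import (
	"fmt"
	"strings"
)

// nGrams returns every lowercase substring of s whose length in runes lies
// between minGrams and maxGrams. Hypothetical helper for illustration only;
// it is not Atlas Search's actual tokenizer.
func nGrams(s string, minGrams, maxGrams int) map[string]bool {
	runes := []rune(strings.ToLower(s))
	grams := make(map[string]bool)
	for start := 0; start < len(runes); start++ {
		for n := minGrams; n <= maxGrams && start+n <= len(runes); n++ {
			grams[string(runes[start:start+n])] = true
		}
	}
	return grams
}

func main() {
	// Gram lengths 2 and 15 are assumptions made for this sketch.
	indexed := nGrams("Organisation", 2, 15)

	// A query fragment matches when its lowercase form is one of the grams.
	for _, q := range []string{"org", "ORG", "sat", "xyz"} {
		fmt.Printf("%q matches: %v\n", q, indexed[strings.ToLower(q)])
	}
}
```

The trade-off is index size: each name is stored as many grams instead of a single token, which is what makes the substring lookup cheap at query time.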
The search query would then look something like this:
{
  "$search": {
    "autocomplete": {
      "query": querySplit,
      "path": "name"
    },
    "index": "search_organizations_name"
  }
}
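For the Go code in the question, the same stage could be built roughly as follows. This is a minimal sketch, not a drop-in replacement: it uses plain maps and `encoding/json` so that it runs without the MongoDB driver, and `autocompleteStage` is a hypothetical helper name. With the driver, the same shape would be written with `bson.D` as in the question. The terms are passed without the surrounding `*` characters, on the assumption that the autocomplete operator does not need wildcard syntax.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// autocompleteStage builds a $search stage that uses the autocomplete
// operator on the given path. Hypothetical helper; plain maps stand in for
// the driver's bson.D to keep the sketch dependency-free.
func autocompleteStage(index, path string, terms []string) map[string]any {
	return map[string]any{
		"$search": map[string]any{
			"index": index,
			"autocomplete": map[string]any{
				"query": terms,
				"path":  path,
			},
		},
	}
}

func main() {
	// strings.Fields splits on any whitespace and drops empty terms,
	// unlike strings.Split(query, " ").
	terms := strings.Fields("acme  org")

	stage := autocompleteStage("search_organizations_name", "name", terms)

	out, err := json.MarshalIndent(stage, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Printing the stage as JSON makes it easy to compare against the query document above before wiring it into the aggregation pipeline.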