How to correctly set a diacritic-insensitive $text index for the Spanish language
Question
I'm struggling to find out how to correctly set a diacritic insensitive text index for my collection of persons. It’s a normal collection without collation.
The MongoDB version is 5.0.15.
I need a text index (not using MongoDB Atlas) for the name and familyName fields. I created an index with this config:
{
"v": 2,
"key": {
"_fts": "text",
"_ftsx": 1
},
"name": "personsFullname",
"weights": {
"familyName": 1,
"name": 1
},
"default_language": "es",
"language_override": "language",
"textIndexVersion": 3
}
The problem is that, even though the MongoDB manual says text search is diacritic insensitive from version 3 onward, it doesn't work that way for me. Or at least I'm not sure whether that version refers to the property "v": 2 or to "textIndexVersion": 3.
Suppose I have these 3 records:
[
{
"_id": "aaaaaaa",
"name": "Roberto ",
"familyName": "Torres García "
},
{
"_id": "bbbbbbb",
"name": "Ruben A",
"familyName": "Parras García"
},
{
"_id": "ccccc",
"name": "Karla",
"familyName": "Rosas García"
}
]
If I search for García (with the diacritic on the i):
db.getCollection("personsData").find({ "$text": { "$search": "García" } })
It finds the 3 records.
But if I search for Garcia (without the diacritic on the i):
db.getCollection("personsData").find({ "$text": { "$search": "Garcia" } })
It finds no records.
What am I missing here?
Any help or hint is much appreciated.
Thank you in advance.
Answer 1
Score: 1
Following @rickhg12hs's advice:
- If I test locally on MongoDB 6.0.5, it works as long as I don't set default_language: "spanish":
db.consultas.createIndex(
{ diagnostico: "text" },
);
On the online mongoPlayground, as @rickhg12hs points out, it works the same way:
The playground link: https://mongoplayground.net/p/P6TVAR8T1oU
And if you want to reproduce the example in a local instance (I'm running 6.0.5 in Docker):
use("clinica");
db.consultas.insertMany([
{
nombre: "Juan Perez",
especialidad: "general",
diagnostico: "Dolor abdominal, Fiebre alta, tos, posible caso de COVID",
},
{
nombre: "María Pelaez",
especialidad: "general",
diagnostico: "Tensión alta, posible episodio de ataque de ansiedad",
},
{
nombre: "Javier Garcia",
especialidad: "cardiología",
diagnostico: "Arritmias, acompañado de tensión alta, enfermería",
},
{
nombre: "Manuel Gómez",
especialidad: "general",
diagnostico: "Fiebre alta, tos y mucosidades, enfermería",
},
]);
Create the index:
db.consultas.createIndex(
{ diagnostico: "text" },
);
Then run the query (you can try both enfermería and enfermeria; either one returns results):
db.consultas.find({ $text: { $search: "enfermeria" } });
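As a rough illustration of what "diacritic insensitive" means for this query, the sketch below folds accents in plain JavaScript before comparing terms. It is not MongoDB's internal implementation; foldDiacritics and sameTerm are hypothetical helper names used only here, and this simple folding also turns ñ into n.

```javascript
// Illustration only: NOT how MongoDB implements its text index.
// foldDiacritics and sameTerm are hypothetical helper names.
function foldDiacritics(text) {
  // NFD splits "í" into "i" + U+0301 (combining acute accent);
  // the regex then drops every combining mark.
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Compare two terms ignoring both diacritics and case.
function sameTerm(a, b) {
  return foldDiacritics(a).toLowerCase() === foldDiacritics(b).toLowerCase();
}

console.log(foldDiacritics("enfermería")); // → "enfermeria"
console.log(sameTerm("García", "Garcia")); // → true
```

This is the behaviour the queries above rely on: the accented and unaccented spellings are treated as the same term.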
I didn't need the more elaborate version that I read about in other posts.
This seems not to be needed on version 6:
db.consultas.createIndex(
{ diagnostico: "text" },
{
default_language: "es",
textIndexVersion: 3,
}
);
And in the query, indicate that diacritics should be ignored:
db.consultas.find({
$text: {
$search: "enfermeria",
$diacriticSensitive: false,
},
});
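On the doubt in the question about "v": 2 versus "textIndexVersion": 3: as far as I know, v is the general index format version, while textIndexVersion is the one the manual's diacritic note refers to. A minimal sketch for reading both values out of an index listing follows. describeTextIndexes is a hypothetical helper, not part of mongosh; the sample array copies the index definition from the question, and the _id_ entry is an assumed default index added for realism.

```javascript
// Sample shaped like the output of getIndexes(); the text index entry is
// copied from the question, the _id_ entry is an assumed default index.
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { _fts: "text", _ftsx: 1 },
    name: "personsFullname",
    weights: { familyName: 1, name: 1 },
    default_language: "es",
    language_override: "language",
    textIndexVersion: 3,
  },
];

// Hypothetical helper: keep only text indexes and name the two versions apart.
function describeTextIndexes(indexList) {
  return indexList
    .filter((ix) => ix.key && ix.key._fts === "text")
    .map((ix) => ({
      name: ix.name,
      indexFormatVersion: ix.v, // the "v" field
      textIndexVersion: ix.textIndexVersion, // the text-search version
      defaultLanguage: ix.default_language,
      fields: Object.keys(ix.weights || {}),
    }));
}

console.log(describeTextIndexes(indexes));
```

In mongosh the same helper could be fed the result of db.getCollection("personsData").getIndexes() instead of the sample array.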