How to correctly set a diacritic-insensitive $text index for Spanish

Question

I'm struggling to find out how to correctly set a diacritic insensitive text index for my collection of persons. It’s a normal collection without collation.

The MongoDB version is 5.0.15.

I need a text index (not using MongoDB Atlas) on the name and familyName fields. I created an index with this config:

{
  "v": 2,
  "key": {
    "_fts": "text",
    "_ftsx": 1
  },
  "name": "personsFullname",
  "weights": {
    "familyName": 1,
    "name": 1
  },
  "default_language": "es",
  "language_override": "language",
  "textIndexVersion": 3
}

The problem is that, even though the MongoDB manual says text search is diacritic insensitive from version 3 onward, it doesn't work that way. Or at least I'm not sure whether that version refers to the property "v": 2 or to "textIndexVersion": 3.
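
As far as the MongoDB manual describes it, "v" is the general index format version, while "textIndexVersion" is the version of the text index itself, and the diacritic-insensitivity note applies to the latter. The sketch below only reads the index definition quoted above and labels the two fields; describeVersions is a hypothetical helper, not a MongoDB API, and it says nothing about how the server actually behaves.

```javascript
// Index definition copied from the question.
const indexSpec = {
  v: 2,
  key: { _fts: "text", _ftsx: 1 },
  name: "personsFullname",
  weights: { familyName: 1, name: 1 },
  default_language: "es",
  language_override: "language",
  textIndexVersion: 3,
};

// Hypothetical helper (not part of MongoDB): separates the two version fields.
function describeVersions(spec) {
  return {
    // "v" is the index format version shared by all index types.
    indexFormatVersion: spec.v,
    // "textIndexVersion" is specific to text indexes.
    textIndexVersion: spec.textIndexVersion,
    // The manual documents diacritic insensitivity for text index version 3.
    documentedDiacriticInsensitive: spec.textIndexVersion >= 3,
  };
}

console.log(describeVersions(indexSpec));
```

In a live shell the same fields can be read from the output of db.getCollection("personsData").getIndexes().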

Suppose I have these 3 records:

[
  {
    "_id": "aaaaaaa",
    "name": "Roberto ",
    "familyName": "Torres García "
  },
  {
    "_id": "bbbbbbb",
    "name": "Ruben A",
    "familyName": "Parras García"
  },
  {
    "_id": "ccccc",
    "name": "Karla",
    "familyName": "Rosas García"
  }
]

If I search for García (with the accent on the í):

db.getCollection("personsData").find({ "$text": { "$search": "García" } })

It finds the 3 records.

But if I search for Garcia (without the accent on the i):

db.getCollection("personsData").find({ "$text": { "$search": "Garcia" } })

It finds no records.
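
For reference, "diacritic insensitive" means that both spellings should reduce to the same search term. The plain JavaScript sketch below illustrates that expectation with Unicode normalization; foldDiacritics is a hypothetical helper written for this illustration and is not how MongoDB implements text search internally.

```javascript
// Hypothetical helper: strips combining accent marks and lowercases the text.
function foldDiacritics(text) {
  return text
    .normalize("NFD")                // split "í" into "i" + combining acute accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}

// Both spellings from the question fold to the same term.
console.log(foldDiacritics("García") === foldDiacritics("Garcia"));
```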

What am I missing here?

Any help or hint is much appreciated.

Thank you in advance.

Answer 1

Score: 1

Following @rickhg12hs's advice:

  • If I test locally on MongoDB 6.0.5, it works as long as I don't set default_language: "spanish":
db.consultas.createIndex(
  { diagnostico: "text" },
);

On the online Mongo Playground, as @rickhg12hs points out, it works the same way:

The playground link: https://mongoplayground.net/p/P6TVAR8T1oU

If you want to reproduce the example on a local instance (I'm using MongoDB 6.0.5 in Docker):

use("clinica");

db.consultas.insertMany([
  {
    nombre: "Juan Perez",
    especialidad: "general",
    diagnostico: "Dolor abdominal, Fiebre alta, tos, posible caso de COVID",
  },
  {
    nombre: "María Pelaez",
    especialidad: "general",
    diagnostico: "Tensión alta, posible episodio de ataque de ansiedad",
  },
  {
    nombre: "Javier Garcia",
    especialidad: "cardiología",
    diagnostico: "Arritmias, acompañado de tensión alta, enfermería",
  },
  {
    nombre: "Manuel Gómez",
    especialidad: "general",
    diagnostico: "Fiebre alta, tos y mucosidades, enfermería",
  },
]);

Create the index:

db.consultas.createIndex(
  { diagnostico: "text" },
);

Then run the query (you can try both enfermería and enfermeria; you get results either way):

db.consultas.find({ $text: { $search: "enfermeria" } });

I didn't need the more elaborate version that I read about in other posts. It does not seem to be needed on version 6:

db.consultas.createIndex(
  { diagnostico: "text" },
  {
    default_language: "es",
    textIndexVersion: 3,
  }
);

And, in the query, explicitly ignore diacritics:

db.consultas.find({
  $text: {
    $search: "enfermeria",
    $diacriticSensitive: false,
  },
});
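
Applied to the collection from the question, the same idea might look like the sketch below. It assumes the personsData collection and the personsFullname index name from the question, and that leaving out default_language behaves on that server the way it did in the 6.0.5 test above; this has not been verified on 5.0.15. The typeof db guard is only there so the snippet also loads outside the shell.

```javascript
// Text index on both name fields, deliberately without default_language.
const keys = { name: "text", familyName: "text" };
const options = { name: "personsFullname" };

// "db" only exists inside the mongo shell; outside of it just show the spec.
if (typeof db !== "undefined") {
  const coll = db.getCollection("personsData");

  // A collection can have only one text index, so drop the old one first.
  if (coll.getIndexes().some((ix) => ix.name === options.name)) {
    coll.dropIndex(options.name);
  }
  coll.createIndex(keys, options);

  // Search without the accent and print whatever comes back.
  printjson(coll.find({ $text: { $search: "Garcia" } }).toArray());
} else {
  console.log(keys, options);
}
```

Rebuilding a text index on a large collection takes time, so this is best tried on a test copy first.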

huangapple
  • Published on April 13, 2023, 21:30:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006045.html