MySQL fulltext search not returning all matching results

Question

I'm using MySQL 8 and I've got a table called records for which I added the following index in order to perform text search on it:

    CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`, `barcode`);

Now, I've got the following data (copied it as JSON):

    [
      {
        "barcode": "79027066021",
        "title": "White Blood Cells",
        "slug": "White-blood-cells",
        "image": null,
        "description": "rev",
        "artist_id": 3,
        "genre_id": 1,
        "num_of_records": 1,
        "original_release_year": 2001,
        "original_release_date": null,
        "reissue_release_date": null,
        "label": null,
        "sleeve_side_color": "ff1500",
        "vinyl_color": "ff1500",
        "size": 12,
        "barcode_in_record": null,
        "catalog_number": null,
        "owned_copies": 1,
        "active": 1,
        "created_at": "2023-06-11 21:32:17.000",
        "updated_at": null,
        "deleted_at": null
      },
      {
        "barcode": "889854036119",
        "title": "Nilsson Schmilsson",
        "slug": "nilsson-schmilsson",
        "image": null,
        "description": null,
        "artist_id": 6,
        "genre_id": 1,
        "num_of_records": 1,
        "original_release_year": 1971,
        "original_release_date": null,
        "reissue_release_date": null,
        "label": null,
        "sleeve_side_color": "490d57",
        "vinyl_color": "000000",
        "size": 12,
        "barcode_in_record": null,
        "catalog_number": null,
        "owned_copies": 1,
        "active": 1,
        "created_at": "2023-06-11 21:32:17.000",
        "updated_at": null,
        "deleted_at": null
      },
      {
        "barcode": "C10777774644617",
        "title": "Abbey Road",
        "slug": "abbey-road",
        "image": null,
        "description": null,
        "artist_id": 2,
        "genre_id": 1,
        "num_of_records": 1,
        "original_release_year": 1969,
        "original_release_date": null,
        "reissue_release_date": null,
        "label": null,
        "sleeve_side_color": "46518f",
        "vinyl_color": "000000",
        "size": 12,
        "barcode_in_record": null,
        "catalog_number": null,
        "owned_copies": 1,
        "active": 1,
        "created_at": "2023-06-11 21:32:17.000",
        "updated_at": null,
        "deleted_at": null
      },
      {
        "barcode": "C10777774644618",
        "title": "Revolver",
        "slug": "revolver",
        "image": null,
        "description": null,
        "artist_id": 2,
        "genre_id": 1,
        "num_of_records": 1,
        "original_release_year": 1966,
        "original_release_date": null,
        "reissue_release_date": null,
        "label": null,
        "sleeve_side_color": "46518f",
        "vinyl_color": "000000",
        "size": 12,
        "barcode_in_record": null,
        "catalog_number": null,
        "owned_copies": 1,
        "active": 1,
        "created_at": "2023-06-11 21:32:17.000",
        "updated_at": null,
        "deleted_at": null
      },
      {
        "barcode": "C10777774644619",
        "title": "arevolat",
        "slug": "aurevolat",
        "image": null,
        "description": "this is a test",
        "artist_id": 2,
        "genre_id": 1,
        "num_of_records": 1,
        "original_release_year": 1999,
        "original_release_date": null,
        "reissue_release_date": null,
        "label": null,
        "sleeve_side_color": null,
        "vinyl_color": null,
        "size": 12,
        "barcode_in_record": null,
        "catalog_number": null,
        "owned_copies": 1,
        "active": 1,
        "created_at": null,
        "updated_at": null,
        "deleted_at": null
      }
    ]

So, there are currently 5 records in the table. In order to test the search functionality, I ran the following query:

    SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`, `barcode`)
    AGAINST ('*rev*' IN BOOLEAN MODE)

I was expecting to get 3 records back:

  • barcode 79027066021, matching on "description": "rev".
  • barcode C10777774644618, matching on "title": "Revolver".
  • barcode C10777774644619, matching on "title": "arevolat".

But I'm only getting back the first two records:

  • barcode 79027066021, matching on "description": "rev".
  • barcode C10777774644618, matching on "title": "Revolver".

This one is missing:

  • barcode C10777774644619, which should match on "title": "arevolat".

Even though I'm using * on both sides of the search term.

DDL:

    CREATE TABLE `records` (
      `barcode` varchar(48) NOT NULL,
      `title` varchar(160) NOT NULL,
      `slug` varchar(250) NOT NULL,
      `image` varchar(191) DEFAULT NULL,
      `artist_id` bigint unsigned DEFAULT NULL,
      `genre_id` bigint unsigned DEFAULT NULL,
      `num_of_records` bigint NOT NULL,
      `original_release_date` date DEFAULT NULL,
      `reissue_release_date` date DEFAULT NULL,
      `label` longtext,
      `sleeve_side_color` varchar(191) DEFAULT NULL,
      `vinyl_color` longtext,
      `size` bigint DEFAULT NULL,
      `barcode_in_record` tinyint(1) DEFAULT NULL,
      `catalog_number` longtext,
      `owned_copies` bigint DEFAULT '1',
      `active` tinyint(1) DEFAULT '1',
      `created_at` datetime(3) DEFAULT NULL,
      `updated_at` datetime(3) DEFAULT NULL,
      `deleted_at` datetime(3) DEFAULT NULL,
      `original_release_year` bigint NOT NULL,
      `description` text,
      PRIMARY KEY (`barcode`),
      UNIQUE KEY `barcode` (`barcode`),
      UNIQUE KEY `slug` (`slug`),
      KEY `fk_records_artist` (`artist_id`),
      KEY `fk_records_genre` (`genre_id`),
      FULLTEXT KEY `all_records_idx` (`title`,`label`,`description`,`catalog_number`,`barcode`),
      CONSTRAINT `fk_records_artist` FOREIGN KEY (`artist_id`) REFERENCES `artists` (`id`),
      CONSTRAINT `fk_records_genre` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Any idea what's going on?

Thanks

Answer 1

Score: 2

"%", "*", and other punctuation marks have specific and different meanings for AGAINST, LIKE, and RLIKE.

    MATCH(title, ...) AGAINST('*rev*')               -- ignores the leading punctuation mark
    title LIKE '%rev%'                               -- finds the desired 3 rows, but only in `title`; and slow
    title RLIKE 'rev'                                -- finds the desired 3 rows, but only in `title`; and slow
    title LIKE '%rev%' OR label LIKE '%rev%' OR ...  -- even slower

FULLTEXT, when appropriate, is almost always faster than LIKE or RLIKE; often much faster. I suggest you change the requirements and/or expectations. As you see from above, there is no syntax that is both simple and fast.
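
For reference, the last line above could be written out in full roughly as follows. This is only a sketch: it assumes the `records` table and the five columns covered by `all_records_idx` in the question, and it reuses the `LIKE '%rev%'` pattern from the list above.

```sql
-- Substring match across the same five columns that the FULLTEXT index covers.
-- Because every pattern starts with %, MySQL cannot use an ordinary index here
-- and has to examine each row, which is why this gets slow on large tables.
SELECT *
FROM records
WHERE title          LIKE '%rev%'
   OR label          LIKE '%rev%'
   OR description    LIKE '%rev%'
   OR catalog_number LIKE '%rev%'
   OR barcode        LIKE '%rev%';
```

With the table's case-insensitive collation (utf8mb4_0900_ai_ci), this should also pick up "arevolat", since LIKE matches "rev" anywhere in the string rather than only at the start of a word. NULL columns simply fail to match and do not affect the other OR branches.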

Answer 2

Score: 1

In MySQL, the wildcard character is % rather than *.

Try AGAINST ('%rev%' IN BOOLEAN MODE)

Sometimes, * acts as a modifier to match 'zero or more instances of the thing preceding it.'

See https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

However, the above Pattern Matching applies to the extended regular expressions (such as when using RLIKE).

For a Boolean Full-text Search, the * is a truncation operator. The * at the end of your expression means that rev must be present at the start of one or more words. This explains the missing record in your results.

I suspect the first * is ignored.

More info: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
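
As a sketch of the prefix behaviour described above, the query from the question can be run with only the trailing *. It assumes the `records` table and index from the question and default InnoDB full-text settings (in particular a minimum token size of 3, so that "rev" is indexed at all).

```sql
-- Trailing * only: matches indexed words that START with "rev".
-- The column list must be the same as in the FULLTEXT index definition.
SELECT barcode, title, description
FROM records
WHERE MATCH(`title`, `label`, `description`, `catalog_number`, `barcode`)
      AGAINST ('rev*' IN BOOLEAN MODE);
```

On the sample data this is expected to find "rev" and "Revolver" but not "arevolat", because "rev" sits in the middle of that word rather than at its start.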

Answer 3

Score: 1

> 12.9.5 Full-Text Restrictions
>
> The '%' character is not a supported wildcard character for full-text searches.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html

> Asterisk *
>
> The asterisk serves as the truncation (or wildcard) operator. Unlike
> the other operators, it is appended to the word to be affected. Words
> match if they begin with the word preceding the * operator.

> [...]
>
> The wildcarded word is considered as a prefix that must be present at
> the start of one or more words.[...]
>
> The following examples demonstrate some search strings that use boolean full-text operators: [...]
>
> 'apple*'
>
> Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

huangapple
  • This article was published on June 12, 2023 at 06:22:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76452739.html