英文:
MySQL fulltext search not returning all matching results
问题
我正在使用MySQL 8,有一张名为records
的表,我在上面添加了以下索引,以便对其进行文本搜索:
CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);
现在,我有以下数据(以JSON形式复制):
[
{
"barcode": "79027066021",
"title": "White Blood Cells",
"slug": "White-blood-cells",
"description": "rev",
"label": null
},
{
"barcode": "889854036119",
"title": "Nilsson Schmilsson",
"slug": "nilsson-schmilsson",
"description": null,
"label": null
},
{
"barcode": "C10777774644617",
"title": "Abbey Road",
"slug": "abbey-road",
"description": null,
"label": null
},
{
"barcode": "C10777774644618",
"title": "Revolver",
"slug": "revolver",
"description": null,
"label": null
},
{
"barcode": "C10777774644619",
"title": "arevolat",
"slug": "aurevolat",
"description": "this is a test",
"label": null
}
]
所以,目前表中有5条记录。为了测试搜索功能,我运行了以下查询:
SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)
我期望能得到3条记录:
- 条形码为79027066021的记录,它应该匹配到描述
description
:"rev"
。 - 条形码为C10777774644618的记录,它应该匹配到标题
title
:"Revolver"
。 - 条形码为C10777774644619的记录,它应该匹配到标题
title
:"arevolat"
。
但我只收到了前两条记录:
- 条形码为79027066021的记录,它应该匹配到描述
description
:"rev"
。 - 条形码为C10777774644618的记录,它应该匹配到标题
title
:"Revolver"
。
这一条不存在:
- 条形码为C10777774644619的记录,它应该匹配到标题
title
:"arevolat"
。
尽管我在搜索中使用了*
。
DDL(数据定义语言):
CREATE TABLE `records` (
-- 表定义部分已略过
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
有任何想法吗?
谢谢
英文:
I'm using MySQL 8 and I've got a table called records
for which I added the following index in order to perform text search on it:
CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);
Now, I've got the following data (copied it as JSON):
[
{
"barcode": "79027066021",
"title": "White Blood Cells",
"slug": "White-blood-cells",
"image": null,
"description": "rev",
"artist_id": 3,
"genre_id": 1,
"num_of_records": 1,
"original_release_year": 2001,
"original_release_date": null,
"reissue_release_date": null,
"label": null,
"sleeve_side_color": "ff1500",
"vinyl_color": "ff1500",
"size": 12,
"barcode_in_record": null,
"catalog_number": null,
"owned_copies": 1,
"active": 1,
"created_at": "2023-06-11 21:32:17.000",
"updated_at": null,
"deleted_at": null
},
{
"barcode": "889854036119",
"title": "Nilsson Schmilsson",
"slug": "nilsson-schmilsson",
"image": null,
"description": null,
"artist_id": 6,
"genre_id": 1,
"num_of_records": 1,
"original_release_year": 1971,
"original_release_date": null,
"reissue_release_date": null,
"label": null,
"sleeve_side_color": "490d57",
"vinyl_color": "000000",
"size": 12,
"barcode_in_record": null,
"catalog_number": null,
"owned_copies": 1,
"active": 1,
"created_at": "2023-06-11 21:32:17.000",
"updated_at": null,
"deleted_at": null
},
{
"barcode": "C10777774644617",
"title": "Abbey Road",
"slug": "abbey-road",
"image": null,
"description": null,
"artist_id": 2,
"genre_id": 1,
"num_of_records": 1,
"original_release_year": 1969,
"original_release_date": null,
"reissue_release_date": null,
"label": null,
"sleeve_side_color": "46518f",
"vinyl_color": "000000",
"size": 12,
"barcode_in_record": null,
"catalog_number": null,
"owned_copies": 1,
"active": 1,
"created_at": "2023-06-11 21:32:17.000",
"updated_at": null,
"deleted_at": null
},
{
"barcode": "C10777774644618",
"title": "Revolver",
"slug": "revolver",
"image": null,
"description": null,
"artist_id": 2,
"genre_id": 1,
"num_of_records": 1,
"original_release_year": 1966,
"original_release_date": null,
"reissue_release_date": null,
"label": null,
"sleeve_side_color": "46518f",
"vinyl_color": "000000",
"size": 12,
"barcode_in_record": null,
"catalog_number": null,
"owned_copies": 1,
"active": 1,
"created_at": "2023-06-11 21:32:17.000",
"updated_at": null,
"deleted_at": null
},
{
"barcode": "C10777774644619",
"title": "arevolat",
"slug": "aurevolat",
"image": null,
"description": "this is a test",
"artist_id": 2,
"genre_id": 1,
"num_of_records": 1,
"original_release_year": 1999,
"original_release_date": null,
"reissue_release_date": null,
"label": null,
"sleeve_side_color": null,
"vinyl_color": null,
"size": 12,
"barcode_in_record": null,
"catalog_number": null,
"owned_copies": 1,
"active": 1,
"created_at": null,
"updated_at": null,
"deleted_at": null
}
]
So, there are currently 5 records in the table. In order to test the search functionality, I ran the following query:
SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)
I was expecting to get 3 records back:
- 1 for barcode 79027066021 where it should match to the description "description": "rev".
- 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".
- 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".
But I'm only getting back the first two records:
- 1 for barcode 79027066021 where it should match to the description "description": "rev".
- 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".
This one is not present:
- 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".
Even though I'm using *
as part of the search.
DDL:
CREATE TABLE `records` (
`barcode` varchar(48) NOT NULL,
`title` varchar(160) NOT NULL,
`slug` varchar(250) NOT NULL,
`image` varchar(191) DEFAULT NULL,
`artist_id` bigint unsigned DEFAULT NULL,
`genre_id` bigint unsigned DEFAULT NULL,
`num_of_records` bigint NOT NULL,
`original_release_date` date DEFAULT NULL,
`reissue_release_date` date DEFAULT NULL,
`label` longtext,
`sleeve_side_color` varchar(191) DEFAULT NULL,
`vinyl_color` longtext,
`size` bigint DEFAULT NULL,
`barcode_in_record` tinyint(1) DEFAULT NULL,
`catalog_number` longtext,
`owned_copies` bigint DEFAULT '1',
`active` tinyint(1) DEFAULT '1',
`created_at` datetime(3) DEFAULT NULL,
`updated_at` datetime(3) DEFAULT NULL,
`deleted_at` datetime(3) DEFAULT NULL,
`original_release_year` bigint NOT NULL,
`description` text,
PRIMARY KEY (`barcode`),
UNIQUE KEY `barcode` (`barcode`),
UNIQUE KEY `slug` (`slug`),
KEY `fk_records_artist` (`artist_id`),
KEY `fk_records_genre` (`genre_id`),
FULLTEXT KEY `all_records_idx` (`title`,`label`,`description`,`catalog_number`,`barcode`),
CONSTRAINT `fk_records_artist` FOREIGN KEY (`artist_id`) REFERENCES `artists` (`id`),
CONSTRAINT `fk_records_genre` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Any idea what's going on?
Thanks
答案1
得分: 2
"%","*"和其他标点符号对AGAINST
,LIKE
和RLIKE
具有特定且_不同_的含义。
MATCH(title, ...) AGAINST(' rev ') -- 忽略前导标点符号
title LIKE '%rev%' -- 找到所需的3行,但仅在title
中;速度较慢
title RLIKE 'rev' -- 找到所需的3行,但仅在title
中;速度较慢
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- 速度更慢
在适当的情况下,FULLTEXT几乎总是比LIKE
或RLIKE
快,通常_快得多_。我建议您更改要求和/或期望。如您从上面所见,没有既简单又快速的语法。
英文:
"%", "*", and other punctuation marks have specific and different meanings for AGAINST
, LIKE
, and RLIKE
.
MATCH(title, ...) AGAINST('*rev*') -- ignores the leading punctuation mark
title LIKE '%rev%' -- finds the desired 3 rows, but only in `title`; and slow
title RLIKE 'rev' -- finds the desired 3 rows, but only in `title`; and slow
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- even slower
FULLTEXT, when appropriate, is almost always faster than LIKE
or RLIKE
; often much faster. I suggest you change the requirements and/or expectations. As you see from above, there is no syntax that is both simple and fast.
答案2
得分: 1
在MySQL中,通配符字符是%
而不是*
。
尝试AGAINST ('%rev%' IN BOOLEAN MODE)
有时,*
充当一个修饰符,用来匹配'零个或多个前面的内容实例'。
请参阅https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
然而,上述的模式匹配适用于扩展正则表达式(例如在使用RLIKE
时)。
对于布尔全文搜索,*
是一个截断运算符。表达式末尾的*
意味着rev
必须出现在一个或多个单词的开头。这解释了你的结果中为什么缺少记录。
我怀疑第一个*
被忽略了。
更多信息:https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
英文:
In MySQL, the wildcard character is %
rather than *
.
Try AGAINST ('%rev%' IN BOOLEAN MODE)
Sometimes, *
acts as a modifier to match 'zero or more instances of the thing preceding it.'
See https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
However, the above Pattern Matching applies to the extended regular expressions (such as when using RLIKE
).
For a Boolean Full-text Search, the *
is a truncation operator. The * at the end of your expression means that rev
must be present at the start of one or more words. This explains the missing record in your results.
I suspect the first *
is ignored.
More info: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html
答案3
得分: 1
> 12.9.5 全文搜索限制
>
> 百分号('%')字符不是全文搜索的支持通配符字符。
链接:https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html
> 星号()
>
> 星号用作截断(或通配符)运算符。与其他运算符不同,它附加到要影响的单词之后。**单词与在星号()运算符之前的单词匹配。**
> [...]
>
> 通配符单词被视为必须出现在一个或多个单词的开头的前缀。[...]
>
> 以下示例演示了一些使用布尔全文搜索运算符的搜索字符串: [...]
>
> 'apple*'
>
> 查找包含诸如“apple”、“apples”、“applesauce”或“applet”等单词的行。
链接:https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
英文:
> 12.9.5 Full-Text Restrictions
>
> The '%' character is not a supported wildcard character for full-text searches.
https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html
> Asterisk *
>
> The asterisk serves as the truncation (or wildcard) operator. Unlike
> the other operators, it is appended to the word to be affected. Words
> match if they begin with the word preceding the * operator.
> [...]
>
> The wildcarded word is considered as a prefix that must be present at
> the start of one or more words.[...]
>
> The following examples demonstrate some search strings that use boolean full-text operators: [...]
>
>'apple*'
>
>Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论