MySQL全文搜索未返回所有匹配结果。

huangapple go评论54阅读模式
英文:

MySQL fulltext search not returning all matching results

问题

我正在使用MySQL 8,有一张名为records的表,我在上面添加了以下索引,以便对其进行文本搜索:

CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);

现在,我有以下数据(以JSON形式复制):

[
  {
    "barcode": "79027066021",
    "title": "White Blood Cells",
    "slug": "White-blood-cells",
    "description": "rev",
    "label": null
  },
  {
    "barcode": "889854036119",
    "title": "Nilsson Schmilsson",
    "slug": "nilsson-schmilsson",
    "description": null,
    "label": null
  },
  {
    "barcode": "C10777774644617",
    "title": "Abbey Road",
    "slug": "abbey-road",
    "description": null,
    "label": null
  },
  {
    "barcode": "C10777774644618",
    "title": "Revolver",
    "slug": "revolver",
    "description": null,
    "label": null
  },
  {
    "barcode": "C10777774644619",
    "title": "arevolat",
    "slug": "aurevolat",
    "description": "this is a test",
    "label": null
  }
]

所以,目前表中有5条记录。为了测试搜索功能,我运行了以下查询:

SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)

我期望能得到3条记录:

  • 条形码为79027066021的记录,它应该匹配到描述description: "rev"
  • 条形码为C10777774644618的记录,它应该匹配到标题title: "Revolver"
  • 条形码为C10777774644619的记录,它应该匹配到标题title: "arevolat"

但我只收到了前两条记录:

  • 条形码为79027066021的记录,它应该匹配到描述description: "rev"
  • 条形码为C10777774644618的记录,它应该匹配到标题title: "Revolver"

这一条不存在:

  • 条形码为C10777774644619的记录,它应该匹配到标题title: "arevolat"

尽管我在搜索中使用了*

DDL(数据定义语言):

CREATE TABLE `records` (
  -- 表定义部分已略过
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

有任何想法吗?

谢谢

英文:

I'm using MySQL 8 and I've got a table called records for which I added the following index in order to perform text search on it:

CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);

Now, I've got the following data (copied it as JSON):

[
  {
    "barcode": "79027066021",
    "title": "White Blood Cells",
    "slug": "White-blood-cells",
    "image": null,
    "description": "rev",
    "artist_id": 3,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 2001,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "ff1500",
    "vinyl_color": "ff1500",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "889854036119",
    "title": "Nilsson Schmilsson",
    "slug": "nilsson-schmilsson",
    "image": null,
    "description": null,
    "artist_id": 6,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1971,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "490d57",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644617",
    "title": "Abbey Road",
    "slug": "abbey-road",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1969,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644618",
    "title": "Revolver",
    "slug": "revolver",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1966,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644619",
    "title": "arevolat",
    "slug": "aurevolat",
    "image": null,
    "description": "this is a test",
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1999,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": null,
    "vinyl_color": null,
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": null,
    "updated_at": null,
    "deleted_at": null
  }
]

So, there are currently 5 records in the table. In order to test the search functionality, I ran the following query:

SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)

I was expecting to get 3 records back:

  • 1 for barcode 79027066021 where it should match to the description "description": "rev".
  • 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".
  • 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".

But I'm only getting back the first two records:

  • 1 for barcode 79027066021 where it should match to the description "description": "rev".
  • 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".

This one is not present:

  • 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".

Even though I'm using * as part of the search.

DDL:

CREATE TABLE `records` (
  `barcode` varchar(48) NOT NULL,
  `title` varchar(160) NOT NULL,
  `slug` varchar(250) NOT NULL,
  `image` varchar(191) DEFAULT NULL,
  `artist_id` bigint unsigned DEFAULT NULL,
  `genre_id` bigint unsigned DEFAULT NULL,
  `num_of_records` bigint NOT NULL,
  `original_release_date` date DEFAULT NULL,
  `reissue_release_date` date DEFAULT NULL,
  `label` longtext,
  `sleeve_side_color` varchar(191) DEFAULT NULL,
  `vinyl_color` longtext,
  `size` bigint DEFAULT NULL,
  `barcode_in_record` tinyint(1) DEFAULT NULL,
  `catalog_number` longtext,
  `owned_copies` bigint DEFAULT '1',
  `active` tinyint(1) DEFAULT '1',
  `created_at` datetime(3) DEFAULT NULL,
  `updated_at` datetime(3) DEFAULT NULL,
  `deleted_at` datetime(3) DEFAULT NULL,
  `original_release_year` bigint NOT NULL,
  `description` text,
  PRIMARY KEY (`barcode`),
  UNIQUE KEY `barcode` (`barcode`),
  UNIQUE KEY `slug` (`slug`),
  KEY `fk_records_artist` (`artist_id`),
  KEY `fk_records_genre` (`genre_id`),
  FULLTEXT KEY `all_records_idx` (`title`,`label`,`description`,`catalog_number`,`barcode`),
  CONSTRAINT `fk_records_artist` FOREIGN KEY (`artist_id`) REFERENCES `artists` (`id`),
  CONSTRAINT `fk_records_genre` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Any idea what's going on?

Thanks

答案1

得分: 2

"%","*"和其他标点符号对AGAINSTLIKERLIKE具有特定且_不同_的含义。

MATCH(title, ...) AGAINST(' rev ') -- 忽略前导标点符号
title LIKE '%rev%' -- 找到所需的3行,但仅在title中;速度较慢
title RLIKE 'rev' -- 找到所需的3行,但仅在title中;速度较慢
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- 速度更慢

在适当的情况下,FULLTEXT几乎总是比LIKERLIKE快,通常_快得多_。我建议您更改要求和/或期望。如您从上面所见,没有既简单又快速的语法。

英文:

"%", "*", and other punctuation marks have specific and different meanings for AGAINST, LIKE, and RLIKE.

MATCH(title, ...) AGAINST('*rev*')  -- ignores the leading punctuation mark
title LIKE '%rev%'  -- finds the desired 3 rows, but only in `title`; and slow
title RLIKE 'rev'  -- finds the desired 3 rows, but only in `title`; and slow
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- even slower

FULLTEXT, when appropriate, is almost always faster than LIKE or RLIKE; often much faster. I suggest you change the requirements and/or expectations. As you see from above, there is no syntax that is both simple and fast.

答案2

得分: 1

在MySQL中,通配符字符是%而不是*

尝试AGAINST ('%rev%' IN BOOLEAN MODE)

有时,*充当一个修饰符,用来匹配'零个或多个前面的内容实例'。

请参阅https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

然而,上述的模式匹配适用于扩展正则表达式(例如在使用RLIKE时)。

对于布尔全文搜索,*是一个截断运算符。表达式末尾的*意味着rev必须出现在一个或多个单词的开头。这解释了你的结果中为什么缺少记录。

我怀疑第一个*被忽略了。

更多信息:https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

英文:

In MySQL, the wildcard character is % rather than *.

Try AGAINST ('%rev%' IN BOOLEAN MODE)

Sometimes, * acts as a modifier to match 'zero or more instances of the thing preceding it.'

See https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

However, the above Pattern Matching applies to the extended regular expressions (such as when using RLIKE).

For a Boolean Full-text Search, the * is a truncation operator. The * at the end of your expression means that rev must be present at the start of one or more words. This explains the missing record in your results.

I suspect the first * is ignored.

More info: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

答案3

得分: 1

> 12.9.5 全文搜索限制
>
> 百分号('%')字符不是全文搜索的支持通配符字符。

链接:https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html

> 星号(
>
> 星号用作截断(或通配符)运算符。与其他运算符不同,它附加到要影响的单词之后。**单词与在星号(
)运算符之前的单词匹配。**
> [...]
>
> 通配符单词被视为必须出现在一个或多个单词的开头的前缀。[...]
>
> 以下示例演示了一些使用布尔全文搜索运算符的搜索字符串: [...]
>
> 'apple*'
>
> 查找包含诸如“apple”、“apples”、“applesauce”或“applet”等单词的行。

链接:https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

英文:

> 12.9.5 Full-Text Restrictions
>
> The '%' character is not a supported wildcard character for full-text searches.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html

> Asterisk *
>
> The asterisk serves as the truncation (or wildcard) operator. Unlike
> the other operators, it is appended to the word to be affected. Words
> match if they begin with the word preceding the * operator.

> [...]
>
> The wildcarded word is considered as a prefix that must be present at
> the start of one or more words.[...]
>
> The following examples demonstrate some search strings that use boolean full-text operators: [...]
>
>'apple*'
>
>Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

huangapple
  • 本文由 发表于 2023年6月12日 06:22:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/76452739.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定