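Question

I need an efficient way to run string-contains queries such as:
# In SQL
LIKE '%some-string%'
# In Mongo
{ $regex: /some-string/ }

But this is very slow when the dataset is large. For example, I tried it in a dummy database with 100 million generated rows (the real dataset is larger), both with and without an index; surprisingly, the query without an index was faster in Mongo. Using ElasticSearch seems reasonable, but I am wondering whether there is a database, or a way to structure my data, that optimises this use case. I asked, and I really do need a contains match rather than a prefix match.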


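# Answer 1
**Score**: 1
PostgreSQL offers so-called [trigram indexes][1]. These indexes can accelerate SQL `col LIKE '%search%'` predicates efficiently. Note that an ordinary index can, on every make of server, speed up `col LIKE 'string%'` (without the leading wildcard character).
MySQL / MariaDB have [FULLTEXT indexes][2], which use a distinct SQL syntax. That feature works word by word, unlike `LIKE`, which works character by character. Microsoft SQL Server has a [similar feature][3] with different syntax. It also works word by word.
So there is no SQL-standard way to do this efficiently, and different makes of server do it differently.
If you haven't yet chosen a particular make of server, you should figure out whether one of the full-text schemes will serve your purpose. If you must get good performance from `LIKE`, PostgreSQL's trigram indexing is the way to go.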
  [1]: https://www.postgresql.org/docs/current/pgtrgm.html
  [2]: https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html
  [3]: https://learn.microsoft.com/en-us/sql/relational-databases/search/full-text-search?view=sql-server-ver16
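
To make the trigram idea concrete, here is a minimal in-memory sketch in Python. It is not how PostgreSQL's `pg_trgm` is implemented (the real extension also normalises case and pads words, among other things), and the names `trigrams`, `TrigramIndex`, `add`, and `contains` are hypothetical, chosen only for illustration. It shows the general technique: index every 3-character substring, intersect the posting lists for the search string's trigrams, then recheck the few surviving candidates with a real substring test.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index from trigram to row ids (illustrative only)."""

    def __init__(self):
        self.rows = []                    # row id -> stored string
        self.postings = defaultdict(set)  # trigram -> set of row ids

    def add(self, value):
        row_id = len(self.rows)
        self.rows.append(value)
        for gram in trigrams(value):
            self.postings[gram].add(row_id)
        return row_id

    def contains(self, needle):
        """Return ids of rows that contain needle as a substring."""
        grams = trigrams(needle)
        if not grams:
            # Fewer than 3 characters: the index cannot narrow anything,
            # so fall back to scanning every row.
            candidates = range(len(self.rows))
        else:
            # A row containing the needle must contain all of its trigrams.
            lists = [self.postings.get(g, set()) for g in grams]
            candidates = set.intersection(*lists)
        # Recheck: sharing all trigrams does not guarantee a real match.
        return sorted(i for i in candidates if needle in self.rows[i])


index = TrigramIndex()
for value in ["some-string here", "another string", "nothing"]:
    index.add(value)
print(index.contains("some-string"))
```

The recheck step matters because a row can contain every trigram of the search string without containing the string itself. The fallback for short search strings also hints at why very short patterns tend to benefit little from this kind of index.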
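# Answer 2
**Score**: 0
I don't think there is a general solution that works for all database systems. As the other answer already explains, many popular database systems have full-text search extensions. They are far from being able to do everything that tools like Lucene/ElasticSearch can do, but they should be enough to massively speed up your use case.
Let me explain this from a database-internals perspective. Assume your predicate is highly selective, i.e. only a very small percentage of your tuples actually match the condition. In that case you generally want some kind of index structure. The kind of index structure you would **need** for this kind of query is something like a radix tree/trie, but that is not a standard data structure implemented in all SQL databases. The only data structure implemented in almost all SQL databases is the B-tree, and a B-tree can only serve prefix queries such as `LIKE 'test%'`. If your database has no suitable index, your only chance for `LIKE '%test%'` is a very fast runtime system (one that can scan all rows quickly), which none of the traditional (open source) database systems has.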
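
The difference between the two predicates can be sketched in Python, using a sorted list as a stand-in for a B-tree's key order. This is an illustration under that assumption, not a model of any particular database engine, and the function names `prefix_search` and `contains_search` are hypothetical. A prefix match becomes a binary search followed by a short range scan, while a contains match has no ordering to exploit and must inspect every value.

```python
import bisect


def prefix_search(sorted_values, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then scan
    forward only while values still start with the prefix."""
    matches = []
    i = bisect.bisect_left(sorted_values, prefix)
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        matches.append(sorted_values[i])
        i += 1
    return matches


def contains_search(sorted_values, needle):
    """LIKE '%needle%': sort order does not help, so every value is checked."""
    return [value for value in sorted_values if needle in value]


values = sorted(["attest", "test", "testing", "tester", "contest", "zebra"])
print(prefix_search(values, "test"))
print(contains_search(values, "test"))
```

All values sharing a prefix sit next to each other in sorted order, which is what makes the range scan possible. Values that merely contain the search string, such as `attest` and `contest`, are scattered across the key space.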
