May 30, 2023, 04:45:55

Vector Search in AWS

Question

I need to create a vector database in AWS. I was using Pinecone in my POC, but for security reasons the company needs something inside AWS. I saw some people recommending OpenSearch, but I read in a blog post that OpenSearch doesn't really do vector search:

> Documented in
> https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
> the approach to vector search has exactly the same limitation as what
> we observed with Solr: it will retrieve all documents that match the
> search criteria (keyword query along with filters on document
> attributes), and score all of them with the vector similarity of
> choice (cosine distance, dot-product or L1/L2 norms). That is, vector
> similarity will not be used during retrieval (first and expensive
> step): it will instead be used during document scoring (second step).
> Therefore, since you can’t know in advance, how many documents to
> fetch to surface most semantically relevant, the mathematical idea of
> vector search is not really applied.

Does anyone know of an alternative, or is OpenSearch the best we can do in AWS? I have read that some people use DynamoDB too, but I didn't fully understand how that works. If anyone has ideas or suggestions, I would really appreciate them.

Source: https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
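
To make the quoted limitation concrete, here is a minimal Python sketch of the two-step pattern the blog post describes, next to a search that ranks every document by vector similarity. It is not taken from the blog post or from any Elasticsearch/OpenSearch code; the documents, vectors, and function names are made up for illustration, and the brute-force `exact_nearest` only stands in for what an ANN index would approximate.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_then_score(docs, query_vector, keyword, top_k):
    """Two-step pattern from the quote: keyword retrieval first, vector scoring second."""
    # Step 1 (retrieval): only the keyword decides which documents are candidates.
    candidates = [d for d in docs if keyword in d["text"]]
    # Step 2 (scoring): vector similarity only reorders the candidates.
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


def exact_nearest(docs, query_vector, top_k):
    """Vector similarity used for retrieval itself (brute force over all documents)."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy corpus with 2-dimensional "embeddings".
DOCS = [
    {"id": 1, "text": "red running shoes", "vector": [1.0, 0.0]},
    {"id": 2, "text": "blue running shoes", "vector": [0.0, 1.0]},
    {"id": 3, "text": "crimson sneakers", "vector": [0.9, 0.1]},
]

if __name__ == "__main__":
    query = [1.0, 0.0]
    print([d["id"] for d in filter_then_score(DOCS, query, "shoes", 2)])
    print([d["id"] for d in exact_nearest(DOCS, query, 2)])
```

In this toy corpus, document 3 is close to the query vector but never contains the keyword, so the two-step version cannot return it no matter how it is scored. That is the behavior the quote objects to.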

Answer 1

Score: 3

Try some newer options such as Qdrant, Weaviate, or Milvus. They are a lot easier to use and less resource-hungry than OpenSearch.

Answer 2

Score: 1

Amazon OpenSearch has a vector-based search plugin called k-NN, and it has experimental features that allow users to perform semantic search.

References: K-NN; AWS K-NN; Semantic Search feature
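
As a rough illustration of what using the k-NN plugin looks like, the sketch below builds the two JSON bodies involved: one to create an index with a `knn_vector` field, and one for a `knn` query. It only constructs dictionaries and prints them, so it does not need a cluster. The field name `item_vector`, the dimension, and the helper function names are assumptions chosen for the example, not something from the answer; check the k-NN documentation for the options your OpenSearch version supports.

```python
import json


def build_knn_index_body(field, dimension):
    """Request body for creating an index with one k-NN vector field."""
    return {
        # Enables the k-NN plugin's index structures for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def build_knn_query(field, query_vector, k):
    """Request body for an approximate k-nearest-neighbor search on `field`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
    }


if __name__ == "__main__":
    # Hypothetical field name and 5-dimensional vector, for illustration only.
    index_body = build_knn_index_body("item_vector", 5)
    query_body = build_knn_query("item_vector", [0.15, 0.1, 0.1, 0.35, 0.55], 3)
    print(json.dumps(index_body, indent=2))
    print(json.dumps(query_body, indent=2))
```

These bodies would be sent to the index-creation and `_search` endpoints respectively, for example through an HTTP client or the OpenSearch Python client.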

Answer 3

Score: 0

Please take a look at https://docs.datastax.com/en/astra-serverless/docs/vector-search/cql.html. You can get very fast, high-recall ANN search from Astra Vector, which is powered by Apache Cassandra. Here is a sample query:

SELECT * FROM vsearch.products
ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 1;
