Vector Search in AWS

Question

I need to create a vector database in AWS. I was using Pinecone in my POC, but for safety reasons the company needs something inside AWS. I saw some people recommending OpenSearch, but I read in a blog that OpenSearch doesn't really do vector search:

> Documented in
> https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
> the approach to vector search has exactly the same limitation as what
> we observed with Solr: it will retrieve all documents that match the
> search criteria (keyword query along with filters on document
> attributes), and score all of them with the vector similarity of
> choice (cosine distance, dot-product or L1/L2 norms). That is, vector
> similarity will not be used during retrieval (first and expensive
> step): it will instead be used during document scoring (second step).
> Therefore, since you can’t know in advance, how many documents to
> fetch to surface most semantically relevant, the mathematical idea of
> vector search is not really applied.

Does anyone know of an alternative, or is OpenSearch the best we can do in AWS? I have also read about people using DynamoDB, but I didn't fully understand how that works. If anyone has any ideas or suggestions, I would really appreciate them.

Source: https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
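
To make the limitation in the quote concrete, here is a minimal pure-Python sketch of the two-step behaviour it describes: documents are first retrieved by keyword, and only those hits are scored by vector similarity. The documents, vectors, and function names are invented for illustration; this is not Elasticsearch or OpenSearch code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_then_score(docs, keyword, query_vector, top_k):
    """Step 1: retrieve by keyword. Step 2: rank only those hits by similarity."""
    candidates = [d for d in docs if keyword in d["text"]]
    candidates.sort(
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    return [d["id"] for d in candidates[:top_k]]


def exact_nearest(docs, query_vector, top_k):
    """Rank every document by similarity, with no keyword step at all."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    return [d["id"] for d in ranked[:top_k]]


# Hypothetical toy corpus: "c" is closest to the query but lacks the keyword.
docs = [
    {"id": "a", "text": "aws vector search", "vector": [1.0, 0.0]},
    {"id": "b", "text": "aws pricing", "vector": [0.0, 1.0]},
    {"id": "c", "text": "similarity lookup", "vector": [0.9, 0.1]},
]
query = [1.0, 0.1]

print(filter_then_score(docs, "aws", query, 2))
print(exact_nearest(docs, query, 2))
```

In this toy corpus the keyword-first path never sees document `c`, even though it is the closest vector to the query, which is the gap the quoted passage is pointing at.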

Answer 1

Score: 3

Try some newer ones like Qdrant, Weaviate, or Milvus. They are a lot easier to use and less resource-hungry than OpenSearch.

Answer 2

Score: 1

Amazon OpenSearch has a vector-based search plugin called k-NN, and it has experimental features that allow users to perform semantic search.

References: K-NN; AWS K-NN; Semantic Search feature
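
As a rough sketch of what using the k-NN plugin looks like, the snippet below builds the two JSON request bodies involved: an index definition with a `knn_vector` field, and a `knn` query against it. The field name `embedding`, the dimension, and the helper function names are assumptions for illustration, and the request shapes should be checked against the k-NN documentation for your OpenSearch version. Nothing here contacts a cluster.

```python
import json


def knn_index_body(field, dimension):
    """Index definition that enables k-NN and declares one vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def knn_query_body(field, vector, k):
    """Search body asking for the k nearest neighbours of `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


# "embedding" is a hypothetical field name; 5 matches the toy vector length.
index_body = knn_index_body("embedding", 5)
query_body = knn_query_body("embedding", [0.15, 0.1, 0.1, 0.35, 0.55], 3)

print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

These bodies would be sent with whatever HTTP or OpenSearch client you already use: the first when creating the index, the second to the index's `_search` endpoint.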

Answer 3

Score: 0

Please take a look at https://docs.datastax.com/en/astra-serverless/docs/vector-search/cql.html. You can get very fast, high-recall ANN from Astra Vector, which is powered by Apache Cassandra. Here is a sample query:

SELECT * FROM vsearch.products 
ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 1;
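
If the query vector comes from application code, the statement above could be assembled along these lines. This is only a sketch: `ann_query` is a hypothetical helper, it produces the same CQL text as the sample, and it does not connect to Astra or Cassandra. In real code you would more likely bind the vector as a query parameter through your driver.

```python
def ann_query(keyspace, table, column, vector, limit):
    """Build an ANN query string in the same shape as the sample above."""
    # Reject anything that is not a plain identifier, since these names
    # are interpolated directly into the statement text.
    for name in (keyspace, table, column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    literal = ", ".join(str(float(x)) for x in vector)
    return (
        f"SELECT * FROM {keyspace}.{table}\n"
        f"ORDER BY {column} ANN OF [{literal}]\n"
        f"LIMIT {int(limit)};"
    )


cql = ann_query("vsearch", "products", "item_vector",
                [0.15, 0.1, 0.1, 0.35, 0.55], 1)
print(cql)
```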

Posted on May 30, 2023, 04:45:55
Permalink: https://go.coder-hub.com/76360247.html