Vector Search in AWS
Question
I need to create a vector database in AWS. I was using Pinecone in my POC, but for safety reasons the company needs something inside AWS. I saw some people recommending OpenSearch, but I read in a blog that OpenSearch doesn't really do vector search:
> Documented in
> https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch
> the approach to vector search has exactly the same limitation as what
> we observed with Solr: it will retrieve all documents that match the
> search criteria (keyword query along with filters on document
> attributes), and score all of them with the vector similarity of
> choice (cosine distance, dot-product or L1/L2 norms). That is, vector
> similarity will not be used during retrieval (first and expensive
> step): it will instead be used during document scoring (second step).
> Therefore, since you can’t know in advance, how many documents to
> fetch to surface most semantically relevant, the mathematical idea of
> vector search is not really applied.
Does anyone know of an alternative, or is OpenSearch the best we can do in AWS? I have also read about people using DynamoDB for this, but I didn't fully understand how that works. If anyone has ideas or suggestions, I would really appreciate them.
Source: https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
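To make the quoted limitation concrete, here is a toy sketch of the "filter first, score afterwards" pattern the passage describes. The corpus, the two-dimensional embeddings, and the function names are all made up for illustration; this is not Elasticsearch or OpenSearch code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: (id, text, embedding).
DOCS = [
    ("a", "aws vector search", [1.0, 0.0]),
    ("b", "aws billing guide", [0.0, 1.0]),
    ("c", "pinecone vector search", [0.9, 0.1]),
]


def filter_then_score(keyword, query_vec, top_k):
    # Step 1 (retrieval): keyword match only; vectors play no part here.
    candidates = [d for d in DOCS if keyword in d[1]]
    # Step 2 (scoring): every candidate is scored exactly, then ranked.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, _, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


print(filter_then_score("aws", [1.0, 0.0], top_k=2))
```

Document "c" is very close to the query vector, but it never reaches the scoring step because it does not contain the keyword. That is the gap the quoted passage is pointing at: similarity only reorders what the keyword step already retrieved.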
Answer 1
Score: 3
Try some of the newer options such as Qdrant, Weaviate, or Milvus. They are a lot easier to use and less resource-hungry than OpenSearch.
Answer 2
Score: 1
Amazon OpenSearch has a vector-based search plugin called k-NN, and it has experimental features that let users perform semantic search.
References:
K-NN
AWS K-NN
Semantic Search feature
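As a rough sketch of what using the k-NN plugin looks like, the snippet below builds the two JSON bodies involved: an index definition with a `knn_vector` field and an approximate k-NN query. The field name `embedding` and the dimension of 3 are assumptions for illustration; sending the bodies to a cluster (for example with an HTTP client) is deliberately left out, so check the details against the OpenSearch documentation for your version.

```python
import json

# Assumption: must match the output size of your embedding model.
DIMENSION = 3


def knn_index_body(field="embedding", dimension=DIMENSION):
    """Body for creating an index with k-NN enabled and one vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def knn_query_body(vector, k=5, field="embedding"):
    """Body for a k-NN search returning the k nearest neighbours."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


print(json.dumps(knn_index_body(), indent=2))
print(json.dumps(knn_query_body([0.1, 0.2, 0.3], k=2), indent=2))
```

With this kind of query the vector is used in the retrieval step itself, through an approximate nearest-neighbour index, rather than only to re-score keyword matches.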
Answer 3
Score: 0
Please take a look at https://docs.datastax.com/en/astra-serverless/docs/vector-search/cql.html. You can get very fast, high-recall ANN search from Astra Vector, which is powered by Apache Cassandra. Here is a sample query:
SELECT * FROM vsearch.products
ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 1;