LangChain similarity search issue

Question

We are using Chroma to store our records as vectors. When we run a query, the documents returned are not accurate.

from langchain.vectorstores import Chroma
from langchain.chains import ChatVectorDBChain
c1 = Chroma('langchain', embedding, persist_directory)
qa = ChatVectorDBChain(vectorstore=c1, combine_docs_chain=doc_chain, question_generator=question_generator, top_k_docs_for_context=12, return_source_documents=True)

How can we get accurate results?

Answer 1

Score: 1

It depends on your chunk size and on how you prepared the knowledge base. Sentences should be split properly so that, when you build your vector DB with Chroma and run a semantic search, similar content is easy to match. In addition, try reducing k (the number of returned documents; in the question's code this is top_k_docs_for_context=12) so that you get only the most useful parts of your data rather than too many of them.
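The advice above can be sketched in plain Python, without LangChain or Chroma. The helper names (split_sentences, chunk_sentences, embed, top_k) and the bag-of-words scoring are assumptions made for illustration only; a real setup would use a LangChain text splitter and the embedding model behind the Chroma collection. The sketch groups whole sentences into small chunks, then keeps only the k best-scoring chunks for a query.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences, max_chars=200, overlap=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of a chunk are repeated at the start of the
    next one. A single sentence longer than max_chars is kept whole.
    """
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k highest-scoring (score, chunk) pairs for the query."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)
    return scored[:k]


text = (
    "Chroma stores embeddings on disk. "
    "Each record is split into chunks before embedding. "
    "Large chunks mix several topics together. "
    "Small chunks keep one topic per vector. "
    "Bananas are rich in potassium."
)
sentences = split_sentences(text)
query = "Which fruit is rich in potassium?"

# Small, sentence-aligned chunks versus one oversized chunk.
small_chunks = chunk_sentences(sentences, max_chars=90, overlap=0)
one_big_chunk = chunk_sentences(sentences, max_chars=1000, overlap=0)

best_small = top_k(query, small_chunks, k=1)[0]
best_big = top_k(query, one_big_chunk, k=1)[0]
print("small chunks:", best_small)
print("one big chunk:", best_big)
```

In this toy setup the focused chunk scores higher than the single oversized chunk, because unrelated sentences dilute the vector. With a small k, only that focused chunk is passed on as context.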

Hope you find this useful,

Good luck.

huangapple
  • Posted on March 31, 2023 at 19:26:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75898005.html