LangChain similarity search issue
Question
We are using Chroma to store our records as vectors. When we run a query, the returned documents are not accurate matches for it.
c1 = Chroma('langchain', embedding, persist_directory)
qa = ChatVectorDBChain(vectorstore=c1, combine_docs_chain=doc_chain, question_generator=question_generator, top_k_docs_for_context=12, return_source_documents=True)
How can we get more accurate results?
Answer 1
Score: 1
It depends on your chunk size and on how you've prepared the knowledge base.
Sentences should be split properly so that, when you build your vector DB with Chroma and run a semantic search, similar content is easy to match. In addition, try reducing k (the number of returned documents) so that you get only the most useful parts of your data rather than too many of them.
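To make the two suggestions concrete, here is a minimal sketch in plain Python. It does not use LangChain or Chroma: the helper names (split_into_chunks, embed, cosine, top_k) are hypothetical, and the bag-of-words "embedding" is only a stand-in for a real embedding model. It shows chunks built from whole sentences and a small k at retrieval time.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation so no sentence is cut in half.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    text = (
        "Chroma stores embeddings on disk. Each record is saved as a vector. "
        "Bananas are rich in potassium. Retrieval quality depends on chunk size."
    )
    chunks = split_into_chunks(text, max_chars=60)
    for score, chunk in top_k("How is a record saved as a vector?", chunks, k=2):
        print(f"{score:.2f}  {chunk}")
```

In the question's chain, the closest equivalent of k is probably top_k_docs_for_context, which is set to 12; trying a smaller value there, along with sentence-aligned chunks when building the collection, would be the matching experiment.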
Hope you find this useful,
Good luck.