Question

We are using Chroma to store our records in vector form. When we run a search query, the returned documents are not accurate.

from langchain.vectorstores import Chroma
from langchain.chains import ChatVectorDBChain
c1 = Chroma('langchain', embedding, persist_directory)
qa = ChatVectorDBChain(vectorstore=c1, combine_docs_chain=doc_chain, question_generator=question_generator, top_k_docs_for_context=12, return_source_documents=True)

How can we get more accurate results?

Answer 1

Score: 1

It depends on your chunk size and on how you prepared the knowledge base.
The text should be split along sentence boundaries, so that when you build your vector DB with Chroma and run a semantic search, each chunk carries one coherent idea and similarity is easier to capture. In addition, try reducing k (the number of returned docs, which is top_k_docs_for_context=12 in your code) so that only the most relevant parts of your data are used as context, not too many of them!
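
To make the two knobs concrete, here is a minimal, self-contained sketch of sentence-aware chunking followed by a top-k similarity lookup. It does not use Chroma or LangChain: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and all function names (`split_into_chunks`, `embed`, `cosine`, `top_k`) are hypothetical names chosen for illustration. In your setup the equivalent settings would presumably be the chunk size of whatever text splitter you use and the `top_k_docs_for_context` value.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks, aiming for at most max_chars each.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    text = (
        "Chroma stores embeddings on disk. "
        "Cats sleep most of the day. "
        "A smaller k returns fewer documents. "
        "Bread needs flour and water."
    )
    chunks = split_into_chunks(text, max_chars=40)
    for chunk in top_k("How many documents does k return?", chunks, k=1):
        print(chunk)
```

With a small `max_chars` each sentence ends up in its own chunk, and with `k=1` only the single closest chunk is returned. Raising `k` pulls in the unrelated sentences as well, which is the kind of noise a large value such as 12 can add to the context.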

Hope you find this useful,

Good luck.
