July 20, 2023, 20:18:22

How to specify a similarity threshold in the LangChain FAISS retriever?

Question

I would like to pass a similarity threshold to the retriever. So far I could only figure out how to pass a k value, but that is not what I want. How can I pass a threshold instead?

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS


def get_conversation_chain(vectorstore):
    llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
    # only k (the number of documents to return) is set here, no score threshold
    qa = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={'k': 2}),
        return_source_documents=True,
        verbose=True,
    )
    return qa


loader = PyPDFLoader("sample.pdf")
# load the PDF and split it into documents
pages = loader.load_and_split()
# index the loaded pages (the original passed an undefined list_of_documents)
faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())
# create conversation chain
chat_history = []
qa = get_conversation_chain(faiss_index)
query = "What is a sunflower?"
result = qa({"question": query, "chat_history": chat_history})

Answer 1

Score: 1

From the API docs, the answer was to pass search_kwargs={'score_threshold': 0.3}.

Answer 2

Score: 1

You can use the following as a VectorStoreRetriever, as you say, but with the search_type parameter set.

retriever = dbFAISS.as_retriever(search_type="similarity_score_threshold",
                                 search_kwargs={"score_threshold": 0.5,
                                                "k": top_k})
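
To make the difference between a k-only retriever and a thresholded one concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's or FAISS's actual code: the retrieve helper, the toy 2-D vectors, and the use of cosine similarity as the score are all assumptions made for illustration. As I understand it, the threshold applies to a relevance score where higher means more similar, and it is applied to the top k hits, so you can get fewer than k documents back.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, docs, k, score_threshold=None):
    """Return up to k (text, score) pairs, best first.

    docs is a list of (text, vector) pairs. If score_threshold is given,
    hits scoring below it are dropped, so fewer than k may come back.
    (Hypothetical helper, not part of any library.)
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top_k = scored[:k]
    if score_threshold is None:
        return top_k
    return [(text, score) for text, score in top_k if score >= score_threshold]


# Toy 2-D "embeddings", invented purely for illustration.
docs = [
    ("Sunflowers are tall yellow flowers.", [1.0, 0.0]),
    ("Daisies are small white flowers.", [0.8, 0.6]),
    ("FAISS is a vector search library.", [0.0, 1.0]),
]
query_vec = [1.0, 0.0]

# k only: always returns k documents, however unrelated the last ones are.
k_only = retrieve(query_vec, docs, k=3)
# k plus threshold: unrelated documents are filtered out.
with_threshold = retrieve(query_vec, docs, k=3, score_threshold=0.5)

print([text for text, _ in k_only])
print([text for text, _ in with_threshold])
```

With k alone, the unrelated FAISS sentence is still returned because it is one of the three nearest. With the threshold it is dropped, which is the behaviour the question asks for.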
