How can I force ConversationalRetrievalChain to retrieve more information from the vector db? (langchain)

Question

I loaded a company organization chart into a CSV file and then loaded that into FAISS. Each time I ask the bot to list all the duties of an individual, it retrieves at most 4 results and creates the response based on those.

How do I force it to pull more data from FAISS?

Here's my code:

# Import os to set API key
import os
from apikey import apikey
os.environ['OPENAI_API_KEY'] = apikey

import streamlit as st
from streamlit_chat import message
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
import tempfile

uploaded_file = st.sidebar.file_uploader("upload", type="csv")

if uploaded_file:
    # Use a temp file because CSVLoader only accepts a file path
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(uploaded_file.getvalue())
        tmp_file_path = tmp_file.name

    loader = CSVLoader(file_path=tmp_file_path, encoding="utf-8", csv_args={
                'delimiter': ','})
    data = loader.load()
else:
    # Nothing to index yet; stop this run until a CSV is uploaded
    st.stop()

embeddings = OpenAIEmbeddings()

#we use FAISS as a vector db
vectorstore = FAISS.from_documents(data, embeddings)

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0.0,
                   model='gpt-3.5-turbo-16k',
                   max_tokens=10000),
    retriever=vectorstore.as_retriever(),
    verbose=True)

def conversational_chat(query):
    
    result = chain({"question": query, 
    "chat_history": st.session_state['history']})
    
    st.session_state['history'].append((query, result["answer"]))
    
    return result["answer"]

if 'history' not in st.session_state:
    st.session_state['history'] = []

if 'generated' not in st.session_state:
    st.session_state['generated'] = ["Hello ! Ask me anything about " + uploaded_file.name + " 🤗"]

if 'past' not in st.session_state:
    st.session_state['past'] = ["Hey ! 👋"]
    
#container for the chat history
response_container = st.container()
#container for the user's text input
container = st.container()

with container:
    with st.form(key='my_form', clear_on_submit=True):
        
        user_input = st.text_input("Query:", placeholder="Talk about your csv data here (:", key='input')
        submit_button = st.form_submit_button(label='Send')
        
    if submit_button and user_input:
        output = conversational_chat(user_input)
        
        st.session_state['past'].append(user_input)
        st.session_state['generated'].append(output)

if st.session_state['generated']:
    with response_container:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + '_user', avatar_style="big-smile")
            message(st.session_state["generated"][i], key=str(i), avatar_style="thumbs")

Here's part of my CSV:

AREA OF RESPONSIBILITY,Directly Responsible Individual,Backup,Oversight
,,,
"Run quarterly company meetings: schedule, set agenda for, and chair",Agatha,Agnes,Agatha
"Run bi-weekly company meetings: schedule, set agenda for, and chair",Agatha,Agnes,Agatha
"Run exec meetings: schedule, set agenda for, and chair",David,Urjita,Agatha
Manage hiring and HR,Agatha,Agnes,Agatha
Finalize company's 3-month goals,Agatha,David,Agatha
Maintain competitive landscape analysis,David,Agnes,Agnes
Maintain project roadmap ,David,Urjita,Agatha
Maintain project idea backlog,David,Urjita,Agatha
Maintain AORs (this sheet),David,Agatha,Agatha
Manage Polish business,Agnes,Agatha,Agnes
,,,
,,,
OPERATIONS MANAGEMENT,,,
"Run team operations meetings: schedule, set agenda for, and chair",Urjita,David?,David
Manage and coordinate the sharing of resources (people),Urjita,Agatha,David
Maintain task management systems (Kanban),Urjita,Agatha,David
Mantain operations calendar,Urjita,Agatha,Agatha
Field and sort incoming feature requests and bugs from team,Urjita,David,David
Manage individual projects,Varies,Varies,Varies
Maintain company wiki,David,Urjita,Agatha
"Manage Payroll, AR, and AP",Anne Marie,Agatha,Agatha
Manage company events,Anne Marie,Michael,Michael
Manage company holidays and vacation requests,Anne Marie,Agatha,Agatha
Manage the tracking of company email accounts,James,Agnes,Agnes
,,,
,,,
SALES AND MARKETING,,,
"Run sales meetings: schedule, set agenda for, and chair",Agatha,James,Agatha
"Set sales, marketing goals, and projects (adding projects to Project Roadmap)",Agatha,Agnes,Agatha
"Finalize numerical sales targets (for revenue, upgrades, etc.)",Agatha,Agnes,Agatha
Maintain sales and marketing portion of company calendar,James,Agatha,Agatha
"Develop and maintain sales collateral, messaging, and presentation structures",Agatha,Agnes,Agatha
"Schedule, manage, and execute B2B marketing/communications campaigns—print, online, email, social",James,Agatha,Agatha
"Manage CRM (Zoho): maintenance, functions, systems, usage by sales people",James,Agnes,Agnes
Track marketing and sales KPIs—and communicate internally,Agatha,James,Agatha
Pull Performance Reports and create GA reports for clients,James,Agnes,Agnes
Manage Insider Reports,James,Agatha,Agatha
Develop and maintain online sales conversion funnels,James,Agnes,Agnes
Maintain corporate website,Agatha,Agnes,Agatha
Distribute inbound sales inquiries,Marina,James,James
Conduct analysis after each sales cycle,Agatha,Agnes,Agatha
Manage Marketing Academy—both sales and product,James,Agatha,Agatha
Manage printing of sales collateral,Michael,James,Agatha

A sample result currently looks something like this:

>> Entering new  chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: what does james do
Assistant: James is responsible for managing insider reports, managing social media, managing the tracking of company email accounts, and managing expo B2B communications.
Follow Up Input: list all things james does. be as comprehensive as possible
Standalone question:

> Finished chain.


> Entering new  chain...


> Entering new  chain...
Prompt after formatting:
System: Use the following pieces of context to answer the users question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
AREA OF RESPONSIBILITY: Manage client liability task tracking through Zoho Tasks: maintenance, functions, systems
Directly Responsible Individual: James
Backup: Urjita
Oversight: Agnes

AREA OF RESPONSIBILITY: Manage client communications (to-dos, reminders) regarding liabilities
Directly Responsible Individual: James
Backup: Agnes
Oversight: Agnes

AREA OF RESPONSIBILITY: Schedule, manage, and execute B2B marketing/communications campaigns—print, online, email, social
Directly Responsible Individual: James
Backup: Agatha
Oversight: Agatha

AREA OF RESPONSIBILITY: Manage the tracking of company email accounts
Directly Responsible Individual: James
Backup: Agnes
Oversight: Agnes
Human: Can you provide a comprehensive list of all the tasks that James is responsible for?

Answer 1

Score: 1

Your vector store performs a nearest-neighbor search, and the default number of retrieved documents in LangChain is 4, which matches the four results you are seeing. You can change it by passing a dict of search arguments to the retriever. The parameter is called k, and it works the same way for FAISS as it does for ChromaDB:

vectorstore.as_retriever(search_kwargs={"k": 10})
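
To see why raising k helps, here is a minimal sketch of top-k nearest-neighbor retrieval in plain Python. It is not LangChain's or FAISS's implementation: the bag-of-words `embed` function is a hypothetical stand-in for OpenAI embeddings, and `top_k` and `james_hits` are illustrative helpers. The rows are copied from the CSV in the question and formatted roughly the way the retrieved context appears in the verbose log.

```python
import math
from collections import Counter

# A few rows from the CSV in the question: (area of responsibility, owner).
ROWS = [
    ("Manage the tracking of company email accounts", "James"),
    ("Maintain sales and marketing portion of company calendar", "James"),
    ("Manage CRM (Zoho): maintenance, functions, systems, usage by sales people", "James"),
    ("Pull Performance Reports and create GA reports for clients", "James"),
    ("Manage Insider Reports", "James"),
    ("Develop and maintain online sales conversion funnels", "James"),
    ("Manage hiring and HR", "Agatha"),
    ("Maintain corporate website", "Agatha"),
]

# One text document per CSV row, similar to the context blocks in the log.
DOCS = [
    f"AREA OF RESPONSIBILITY: {task}\nDirectly Responsible Individual: {owner}"
    for task, owner in ROWS
]


def embed(text):
    """Toy embedding (assumption): word counts instead of a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, k):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def james_hits(k):
    """Count how many of the k retrieved documents belong to James."""
    query = "list everything James is responsible for"
    return sum(doc.endswith("James") for doc in top_k(query, k))


if __name__ == "__main__":
    print(james_hits(4))   # only 4 of James's 6 rows reach the prompt
    print(james_hits(10))  # all 6 rows are retrieved
```

The LLM can only list what the retriever hands it, so with k=4 some of James's rows never reach the prompt no matter how the question is phrased. Keep in mind that every retrieved document is added to the prompt, so a larger k uses more of the model's context window; with max_tokens=10000 reserved for the answer, a very large k may not fit.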

huangapple
  • Posted on 2023-07-18 09:07:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76708950.html