Question and answer over multiple csv files in langchain
Question
I have a folder with multiple CSV files, and I'm trying to figure out a way to load them all into LangChain and ask questions over all of them.
Here's what I have so far:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.csv_loader import CSVLoader
import magic
import os
import nltk
os.environ['OPENAI_API_KEY'] = '...'
loader = DirectoryLoader('../data/', glob='**/*.csv', loader_cls=CSVLoader)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Chroma.from_documents(texts, embeddings)
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch)
query = "how many females are present?"
qa.run(query)
Answer 1
Score: 1
You should load them all into a vector store such as Pinecone or Metal. Then use a RetrievalQAChain or ConversationalRetrievalChain, depending on whether you want memory.
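As a rough illustration of the idea in this answer, the sketch below uses only the Python standard library. It turns each row of every CSV file under a folder into a small text document (similar in spirit to what CSVLoader produces) and retrieves the rows most similar to a query. The function names `load_csv_rows` and `retrieve` and the bag-of-words scoring are assumptions made for illustration; they are not LangChain APIs. A real setup would swap in an embedding model and a vector store such as those named above, then hand the retrieved rows to a chain.

```python
import csv
import math
import re
from collections import Counter
from pathlib import Path


def load_csv_rows(folder):
    """Turn every row of every CSV under `folder` into a (text, metadata) pair."""
    docs = []
    for path in sorted(Path(folder).glob("**/*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f)):
                # One "column: value" line per field, one document per row.
                text = "\n".join(f"{k}: {v}" for k, v in row.items())
                docs.append((text, {"source": str(path), "row": i}))
    return docs


def _vector(text):
    # Toy stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(docs, query, k=3):
    """Return the k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vector(d[0])), reverse=True)
    return ranked[:k]
```

Note that each row becomes its own document and only the top `k` rows reach the model, so a counting question such as "how many females are present?" is only answered correctly if every matching row happens to be retrieved.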
Answer 2
Score: 1
I'm not sure whether you want to combine multiple CSV files for your query or compare them. If you want to compare or see the differences among multiple CSV files, you can use an approach similar to querying a single file; see https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)
agent.run("how many rows in the age column are different?")
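For intuition about what a question like this reduces to, here is a plain standard-library sketch that compares one column across two CSV files row by row. It is not the code the agent generates; the function name `count_differing_rows` is made up for illustration, and the sketch assumes both files list their rows in the same order.

```python
import csv
from itertools import zip_longest


def count_differing_rows(path_a, path_b, column):
    """Count row positions where `column` differs between two CSV files."""
    with open(path_a, newline="", encoding="utf-8") as fa, \
         open(path_b, newline="", encoding="utf-8") as fb:
        rows_a = csv.DictReader(fa)
        rows_b = csv.DictReader(fb)
        # zip_longest pads the shorter file with None,
        # so rows present in only one file count as differences.
        return sum(
            1
            for a, b in zip_longest(rows_a, rows_b)
            if a is None or b is None or a.get(column) != b.get(column)
        )
```

It could be called as `count_differing_rows('titanic.csv', 'titanic_age_fillna.csv', 'Age')`, assuming the column is named `Age` in both files.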
Answer 3
Score: 0
You can also consider using an open-source local LLM such as Llama 2 for this purpose. Try localGPT: https://github.com/PromtEngineer/localGPT