Question and answer over multiple csv files in langchain

Question

I have a folder with multiple CSV files, and I'm trying to figure out how to load them all into LangChain and ask questions across all of them.

Here's what I have so far.

```python
import os

from langchain import OpenAI, VectorDBQA
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

os.environ['OPENAI_API_KEY'] = '...'

# Load every CSV under ../data/ (recursively); CSVLoader emits one document per row
loader = DirectoryLoader('../data/', glob='**/*.csv', loader_cls=CSVLoader)
documents = loader.load()

# Split into ~400-character chunks (splits on "\n\n" by default, so a single CSV row usually stays whole)
text_splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Embed the chunks and store them in an in-memory Chroma collection
embeddings = OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY'])
docsearch = Chroma.from_documents(texts, embeddings)

# "stuff" chain: the retrieved chunks are pasted into a single prompt for the LLM.
# Later LangChain releases deprecate VectorDBQA in favor of RetrievalQA.
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch)

query = "how many females are present?"
print(qa.run(query))
```
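One reason the counting question is hard for this setup: CSVLoader turns every row into a separate document, and the "stuff" chain only hands the LLM the few chunks most similar to the query (VectorDBQA retrieves k=4 by default), so the model never sees most of the rows. The dependency-free sketch below roughly mimics what DirectoryLoader with CSVLoader produces and then answers the count by scanning every row instead. The "column: value" page format is modeled on CSVLoader, and the file names and Sex column are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def rows_as_documents(folder):
    """Roughly mimic DirectoryLoader(glob="**/*.csv", loader_cls=CSVLoader).

    Returns one (page_content, metadata) pair per CSV row. The "column: value"
    line format is modeled on CSVLoader and is an assumption, not its exact code.
    """
    docs = []
    for path in sorted(Path(folder).glob("**/*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f)):
                content = "\n".join(f"{k.strip()}: {(v or '').strip()}" for k, v in row.items())
                docs.append((content, {"source": str(path), "row": i}))
    return docs


def count_matching(docs, column, value):
    """Count rows whose `column` equals `value`, scanning every row document."""
    target = f"{column}: {value}"
    return sum(1 for content, _ in docs if target in content.split("\n"))


def demo():
    # Two small hypothetical files standing in for ../data/*.csv
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.csv").write_text("Name,Sex\nAnn,female\nBob,male\n", encoding="utf-8")
        Path(tmp, "b.csv").write_text("Name,Sex\nCid,female\n", encoding="utf-8")
        docs = rows_as_documents(tmp)
        return len(docs), count_matching(docs, "Sex", "female"), docs[0][0]


if __name__ == "__main__":
    print(demo())
```

Here there are three row documents, two of which match. A full scan sees every row; a top-k retriever over a large file would not, which is why the agent approach in Answer 2 tends to suit aggregate questions better.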

Answer 1

Score: 1

You should load them all into a vector store such as Pinecone or Metal. Then use a RetrievalQA chain or a ConversationalRetrievalChain, depending on whether or not you want memory.
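The two chains mentioned above differ mainly in what ends up in the prompt. As a rough, dependency-free illustration (not LangChain's implementation), the sketch below ranks documents with a hypothetical word-count similarity standing in for real embeddings, "stuffs" the top k into one prompt, and optionally prepends earlier turns of the conversation. The prompt layout is an assumption. The real ConversationalRetrievalChain also asks the LLM to rewrite the follow-up question and chat history into a standalone question before retrieving; that step is left out here.

```python
import math
import re
from collections import Counter


def embed(text):
    # Hypothetical stand-in for an embedding model: lowercase word counts
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(docs, question, k=4):
    """Return the k documents most similar to the question (the retriever step)."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_stuff_prompt(docs, question, chat_history=(), k=4):
    """'stuff' strategy: paste the retrieved documents, plus any prior turns, into one prompt."""
    # chat_history is empty for a plain RetrievalQA-style call
    parts = [f"Human: {human}\nAI: {ai}" for human, ai in chat_history]
    context = "\n\n".join(retrieve(docs, question, k))
    parts.append(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Passing a non-empty chat_history is the "memory" part: each call carries the earlier turns along, which is what ConversationalRetrievalChain manages for you.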

Answer 2

Score: 1

I'm not sure whether you want to combine multiple CSV files in a single query or compare them against each other. If you want to compare them or see the differences between multiple CSV files, using the same approach as querying a single file, see: https://python.langchain.com/en/latest/modules/agents/toolkits/examples/csv.html

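```python
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

# A list of paths gives the agent one pandas DataFrame per file to query
agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)
agent.run("how many rows in the age column are different?")
```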
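The CSV agent loads each file into a pandas DataFrame and has the LLM write and run Python against them, so it can compute exact answers rather than reading a handful of retrieved rows. For reference, the comparison asked of the agent can also be done directly. Below is a stdlib sketch, assuming both files list the same rows in the same order and share an "Age" column (the column name is an assumption based on the Titanic dataset); the in-memory sample data is hypothetical.

```python
import csv
import io


def count_differing(file_a, file_b, column):
    """Count row positions where `column` differs between two CSV files.

    Assumes both files contain the same rows in the same order.
    """
    rows_a = list(csv.DictReader(file_a))
    rows_b = list(csv.DictReader(file_b))
    if len(rows_a) != len(rows_b):
        raise ValueError("the files have different numbers of rows")
    return sum(1 for a, b in zip(rows_a, rows_b) if a[column] != b[column])


if __name__ == "__main__":
    # Tiny in-memory stand-ins for titanic.csv and titanic_age_fillna.csv
    original = io.StringIO("PassengerId,Age\n1,22\n2,\n3,26\n")
    filled = io.StringIO("PassengerId,Age\n1,22\n2,29.7\n3,26\n")
    print(count_differing(original, filled, "Age"))  # 1
```

For the real files, pass handles from open(path, newline="") instead of the StringIO objects. This function answers only the one question it was written for; the agent's advantage is that the LLM writes code like this for arbitrary questions.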


Answer 3

Score: 0

You can also consider using an open-source local LLM such as Llama 2 for this purpose. Try localGPT: https://github.com/PromtEngineer/localGPT

huangapple
  • Posted on April 13, 2023, 23:29:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76007259.html