Build a chatbot with custom data using Langchain
Question
I am trying to understand GPT/LangChain. I want to use my own data only, but I am not able to find a basic example.
For example, I envision my chat looking something like this:
USER: show me how to build a tree house
GPT: To build a tree house, you need the following materials and tools...
My own data is in a file, mydata.txt, with the following content:
To build a tree house you need the following tools: hammer and nails, and the following materials: wood...
....
.....
Can you please show a simple example of how this can be done?
Answer 1
Score: 3
Summary
You need to use the Vector DB Text Generation tool in LangChain. This tool lets you use your own documents as context for the chatbot's answers. The example I give below is slightly different from the chain in the documentation, but I found that it works better; also, the documentation mostly covers getting text from a GitHub repo, which I suppose isn't your case.
The code below is written in Python.
Load the imports, the LLM, and the document splitter
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

# The OpenAI classes read your API key from the OPENAI_API_KEY environment variable.

# Put the path and name of your file here; if it is in the same directory as this script, the file name alone is enough
loader = TextLoader("mydata.txt")
documents = loader.load()

# Change the model to the one you want to use; tweak the temperature to see which one gives better answers
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Set the size of each chunk to suit your own document
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Create the vector embeddings of your text and store them in a Chroma index
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
```
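To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch. It is not LangChain's algorithm: `CharacterTextSplitter` first splits on a separator (a blank line, `"\n\n"`, by default) and then merges the pieces back into chunks of up to `chunk_size` characters. The hypothetical `split_text_simple` below just cuts fixed windows of `chunk_size` characters, each sharing `chunk_overlap` characters with the previous one, using a made-up sample sentence.

```python
def split_text_simple(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper
    for illustration only; not LangChain's CharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "To build a tree house you need a hammer, nails and wood."
for chunk in split_text_simple(sample, chunk_size=20, chunk_overlap=5):
    print(repr(chunk))
```

With `chunk_size=20` and `chunk_overlap=5`, each chunk starts 15 characters after the previous one, so the last five characters of one chunk reappear at the start of the next. With `chunk_overlap=0`, as in the answer, chunks simply follow one another.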
Create the prompt template
```python
from langchain.chains import LLMChain

# This is the standard prompt template; you can change it and experiment with it
prompt_template = """Use the context below to write a 400 word blog post about the topic below:
Context: {context}
Topic: {topic}
Blog post:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "topic"]
)
chain = LLMChain(llm=llm, prompt=PROMPT)
```
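With its default f-string format, `PromptTemplate` fills the `{context}` and `{topic}` placeholders by name, roughly the way Python's `str.format` does, and `LLMChain` then sends the filled-in text to the model. The stdlib sketch below imitates only the filling step so you can read the exact text one call would send; the `context` and `topic` values are made-up samples based on the question's mydata.txt.

```python
# Same template text as in the answer.
prompt_template = """Use the context below to write a 400 word blog post about the topic below:
Context: {context}
Topic: {topic}
Blog post:"""

# Hypothetical sample values: one retrieved chunk and the user's topic.
context = "To build a tree house you need the following tools: hammer, nails and wood."
topic = "how to build a tree house"

# str.format replaces each {name} placeholder with the matching keyword argument.
filled = prompt_template.format(context=context, topic=topic)
print(filled)
```

Because all of the instructions live in this one string, rewording it (for example, asking for a direct answer to a question instead of a blog post) changes what the bot produces without touching the rest of the code.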
Create the function that writes the post, and run it
```python
def generate_blog_post(topic):
    # k is how many chunks are retrieved for each search; chain.apply() then runs one prompt per chunk.
    # A larger k gives more context, but it costs more tokens and can sometimes even confuse the model,
    # so test it and be aware.
    docs = docsearch.similarity_search(topic, k=4)
    inputs = [{"context": doc.page_content, "topic": topic} for doc in docs]
    print(chain.apply(inputs))


generate_blog_post("your question/subject")
```
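`docsearch.similarity_search(topic, k=4)` embeds the topic, compares it with the stored chunk embeddings, and returns the `k` closest chunks. The sketch below imitates that ranking with word-count vectors and cosine similarity instead of OpenAI embeddings and Chroma (an assumption made only so it runs without an API key), then builds `inputs` the same way `generate_blog_post` does. `embed_bow`, `cosine`, and `top_k_chunks` are hypothetical names, and the chunks are made-up sample data.

```python
import math
import re
from collections import Counter


def embed_bow(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed_bow(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed_bow(c)), reverse=True)
    return ranked[:k]


chunks = [
    "To build a tree house you need a hammer, nails and wood.",
    "Water the garden every morning in summer.",
    "Choose a strong tree with thick branches for the house.",
]
topic = "how to build a tree house"

docs = top_k_chunks(topic, chunks, k=2)
# One input dict per retrieved chunk, mirroring generate_blog_post.
inputs = [{"context": doc, "topic": topic} for doc in docs]
for item in inputs:
    print(item["context"])
```

Note that `chain.apply(inputs)` produces one completion per element of `inputs`, so with `k=4` the answer's function prints four separate posts, each written from a single chunk. That is why raising `k` costs more tokens.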