How to create a LangChain Document from a str
Question
I've searched all over the LangChain documentation on the official website, but I couldn't find how to create a LangChain Document from a str variable in Python, so I searched their GitHub code and found this:
doc = Document(
    page_content="text",
    metadata={"source": "local"}
)
PS: I added the metadata attribute.
Then I tried using that doc with my chain.
Memory and chain:
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
    llm, chain_type="stuff", memory=memory, prompt=prompt
)
The call:
chain({"input_documents": doc, "human_input": query})
Prompt template:
template = """You are a senior financial analyst analyzing the below document and having a conversation with a human.
{context}
{chat_history}
Human: {human_input}
senior financial analyst:"""
prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)
But I am getting the following error:
AttributeError: 'tuple' object has no attribute 'page_content'
When I checked the type and the page content of the Document object before passing it to the chain, I got this:
print(type(doc))
<class 'langchain.schema.Document'>
print(doc.page_content)
"text"
Answer 1
Score: 15
This worked for me:
from langchain.docstore.document import Document
doc = Document(page_content="text", metadata={"source": "local"})
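Constructing the Document this way matches what the question already does, so the AttributeError most likely comes from the call rather than from the Document itself: "input_documents" is expected to be a list of Documents, and iterating over a single Document yields (field, value) tuples. The sketch below reproduces that behaviour without LangChain. The Document class and the stuff_documents function here are hypothetical stand-ins, written under the assumption that the real Document iterates like a pydantic model and that a "stuff" chain reads page_content from each item it is given.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Document:
    """Stand-in for LangChain's Document (an assumption, not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

    def __iter__(self):
        # Mimic pydantic models: iterating yields (field_name, value) tuples.
        for f in fields(self):
            yield f.name, getattr(self, f.name)


def stuff_documents(input_documents):
    """Hypothetical simplification of a "stuff" chain: join every document's text."""
    return "\n\n".join(d.page_content for d in input_documents)


doc = Document(page_content="text", metadata={"source": "local"})

# Passing the bare Document: the loop sees tuples, not Documents.
try:
    stuff_documents(doc)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing a one-element list: the loop sees the Document itself.
context = stuff_documents([doc])

print(error_message)
print(context)
```

If that assumption holds for the real chain, the call in the question would become chain({"input_documents": [doc], "human_input": query}).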
Answer 2
Score: 0
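This is the best I could come up with:
import os
from langchain.document_loaders import TextLoader

def str_to_doc(text, name):
    # Write the string to docs/<name>.txt, then load it back with TextLoader.
    folder_name = 'docs'
    os.makedirs(folder_name, exist_ok=True)
    file_name = name + '.txt'
    path = os.path.join(folder_name, file_name)
    with open(path, "w") as file:
        file.write(text)
    loader = TextLoader(path)
    # load() returns a list of Document objects, not a single Document.
    return loader.load()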
Answer 3
Score: 0
Try the code snippet below:
from langchain.schema.document import Document
doc = Document(page_content="text", metadata={"source": "local"})
Answer 4
Score: 0
First, some context. From what I've learned so far, what a loader gives you is a list of Document objects. If you run type(doc[0]) on that list, you get langchain.schema.document.Document. Each Document object has two fields: page_content, which accepts a string, and metadata, which only accepts a dictionary, i.e. {page_content: str, metadata: dict}. By default (don't quote me on this: it's been lots of trial and error and, as you mentioned, there is no documentation), a Document produced by a loader contains these two fields, and its metadata holds a single key, {source:}, whose value is a string. You can create a multi-"page" document by building a list of Document objects like so:
First, you need a list of strings (text_list below) and a list of metadata dictionaries (metadata_list below). You must ensure both lists are the same length.
from langchain.docstore.document import Document
document = []
for item in range(len(text_list)):
    page = Document(page_content=text_list[item],
                    metadata=metadata_list[item])
    document.append(page)
Additionally, you can create Document objects using any splitter from LangChain:
from langchain.text_splitter import CharacterTextSplitter
doc_creator = CharacterTextSplitter(parameters)  # parameters: placeholder for your splitter settings
document = doc_creator.create_documents(texts=text_list, metadatas=metadata_list)
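The loop-based approach above could be wrapped in a small helper that also enforces the same-length requirement. This is only a sketch: build_documents is a hypothetical name, the import path is the one used in this answer and may differ between LangChain versions, and the fallback dataclass is a stand-in so the snippet still runs where LangChain is not installed.

```python
from dataclasses import dataclass, field

try:
    # Import path taken from the answer above; it may vary by LangChain version.
    from langchain.docstore.document import Document
except ImportError:
    @dataclass
    class Document:
        """Stand-in used only when LangChain is not installed (an assumption)."""
        page_content: str
        metadata: dict = field(default_factory=dict)


def build_documents(text_list, metadata_list):
    """Pair each text with its metadata dict and return a list of Documents."""
    if len(text_list) != len(metadata_list):
        raise ValueError("text_list and metadata_list must have the same length")
    return [
        Document(page_content=text, metadata=meta)
        for text, meta in zip(text_list, metadata_list)
    ]


# Example inputs, made up for illustration.
text_list = ["first page", "second page"]
metadata_list = [{"source": "local"}, {"source": "local"}]
documents = build_documents(text_list, metadata_list)
print(len(documents), documents[0].page_content)
```

Using zip avoids indexing by position, and the explicit length check fails early instead of silently dropping the extra texts or metadata entries.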