June 25, 2023, 23:09:42

How to create a LangChain Document from a str

Question

I've searched the LangChain documentation on the official website, but I couldn't find how to create a LangChain Document from a str variable in Python, so I searched their GitHub code and found this:

doc = Document(
    page_content="text",
    metadata={"source": "local"}
)

PS: I added the metadata attribute.
Then I tried using that doc with my chain.
Memory and chain:

memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
    llm, chain_type="stuff", memory=memory, prompt=prompt
)

The call:

chain({"input_documents": doc, "human_input": query})

Prompt template:

template = """You are a senior financial analyst analyzing the below document and having a conversation with a human.
{context}
{chat_history}
Human: {human_input}
senior financial analyst:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)

But I am getting the following error:

AttributeError: 'tuple' object has no attribute 'page_content'

When I checked the type and the page content of the Document object before passing it to the chain, I got this:

print(type(doc))
<class 'langchain.schema.Document'>
print(doc.page_content)
"text"

Answer 1

Score: 15

This worked for me:

from langchain.docstore.document import Document

doc = Document(page_content="text", metadata={"source": "local"})
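
The constructor call here is the same one the question already uses, so the import by itself may not be what clears the AttributeError. A likely cause, assuming Document is a pydantic model whose iteration yields (field_name, value) tuples and that the "stuff" chain loops over input_documents reading page_content from each item, is that the question passes a bare doc rather than a list. The sketch below reproduces that behaviour. join_page_contents is a hypothetical stand-in for the chain, and the fallback Document class is only used when LangChain is not installed.

```python
from dataclasses import dataclass, field, fields

try:
    # Import path taken from the type() output shown in the question.
    from langchain.schema import Document
except ImportError:
    @dataclass
    class Document:
        """Hypothetical stand-in: iterating it yields (field_name, value)
        tuples, which is how a pydantic model behaves (assumption)."""
        page_content: str
        metadata: dict = field(default_factory=dict)

        def __iter__(self):
            for f in fields(self):
                yield f.name, getattr(self, f.name)


def join_page_contents(docs):
    """Stand-in for the chain: loop over docs and read page_content."""
    return "\n\n".join(d.page_content for d in docs)


doc = Document(page_content="text", metadata={"source": "local"})

# Passing the bare Document: the loop walks its fields, not documents.
error_message = None
try:
    join_page_contents(doc)
except AttributeError as err:
    error_message = str(err)
print(error_message)

# Wrapping it in a list gives the loop real Document objects.
fixed = join_page_contents([doc])
print(fixed)
```

Under these assumptions, the call in the question would become chain({"input_documents": [doc], "human_input": query}).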

Answer 2

Score: 0

This is the best I could come up with:

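import os

from langchain.document_loaders import TextLoader

def str_to_doc(text, name):
    # Write the text to docs/<name>.txt, creating the folder if needed
    folder_name = 'docs'
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)
    file_name = name + '.txt'
    path = os.path.join(folder_name, file_name)
    with open(path, "w") as file:
        file.write(text)
    # Load the file back; load() returns a list of Document objects
    loader = TextLoader(path)
    return loader.load()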

Answer 3

Score: 0

Try the code snippet below:

from langchain.schema.document import Document
doc = Document(page_content="text", metadata={"source": "local"})

Answer 4

Score: 0

First, some context. From what I've learned so far, what a loader gives you as a "document" is a list of Document objects. If you run type(doc[0]) you get langchain.schema.document.Document. Each Document object has two fields: page_content, which accepts a string, and metadata, which accepts only a dictionary: {page_content: str, metadata: dict}. In my experience (don't quote me on this: it's been lots of trial and error and, as you mentioned, there is no documentation), a Document that comes out of a loader has a single key in its metadata, {source:}, whose value is a string. You can create a multi-"page" document by building a list of Document objects like so:

First, you need a list of strings (text_list below) and a list of metadata dictionaries (metadata_list below). Both lists must be the same length.

from langchain.docstore.document import Document

document = []

# Build one Document per text, paired with the metadata at the same index
for item in range(len(text_list)):
    page = Document(page_content=text_list[item],
                    metadata=metadata_list[item])
    document.append(page)
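
The loop above can also be written with zip, with the same-length requirement checked up front. This is a sketch rather than a LangChain API: build_documents is a hypothetical helper, the sample texts and metadata are made up, and the fallback Document class stands in for the real one only when LangChain is not installed.

```python
from dataclasses import dataclass, field

try:
    from langchain.docstore.document import Document
except ImportError:
    @dataclass
    class Document:
        """Hypothetical stand-in with the same two fields."""
        page_content: str
        metadata: dict = field(default_factory=dict)


def build_documents(text_list, metadata_list):
    """Pair each text with its metadata dict and wrap the pair in a Document."""
    if len(text_list) != len(metadata_list):
        raise ValueError("text_list and metadata_list must be the same length")
    return [
        Document(page_content=text, metadata=metadata)
        for text, metadata in zip(text_list, metadata_list)
    ]


# Illustrative inputs (not from the original answer)
text_list = ["first page", "second page"]
metadata_list = [{"source": "local"}, {"source": "local"}]

document = build_documents(text_list, metadata_list)
print(len(document), document[1].page_content)
```

Raising on a length mismatch is a design choice here: zip alone would silently drop the unmatched items.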

You can also create Document objects using any splitter from LangChain:

from langchain.text_splitter import CharacterTextSplitter

doc_creator = CharacterTextSplitter(parameters)

document = doc_creator.create_documents(texts=text_list, metadatas=metadata_list)
