How to load a folder of JSON files in LangChain?

Question

I am trying to load a folder of JSON files in LangChain like this:

loader = DirectoryLoader(r'C:...')
documents = loader.load()

But I get this error message:

> ValueError: Json schema does not match the Unstructured schema

Can anyone tell me how to solve this problem?

I tried using glob='**/*.json', but that does not work either. The documentation on the LangChain website is also limited.

Answer 1

Score: 7

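If you want to read each file in full as plain text, pass TextLoader through the loader_cls parameter:

from langchain.document_loaders import DirectoryLoader, TextLoader

# Folder that holds the JSON files (same example path as in the JSONLoader snippet)
DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"

# TextLoader reads every matched file as raw text instead of using the default Unstructured-based loader
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=TextLoader)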
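To make it concrete what "read the whole file" means here, the following is a minimal standard-library sketch of the same idea: walk a folder with the '**/*.json' pattern and keep each file's raw text. It is not LangChain's implementation. The function name read_json_files_as_text, the record layout, and the demo file names are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def read_json_files_as_text(folder, pattern="**/*.json"):
    """Return one record per matching file, holding the raw file text.

    Hypothetical helper: a rough stand-in for DirectoryLoader + TextLoader.
    """
    records = []
    # sorted() keeps the order stable across runs and platforms
    for path in sorted(Path(folder).glob(pattern)):
        records.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": str(path)},
        })
    return records


# Demo on a throwaway folder (file names and contents are made up)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.json").write_text(json.dumps({"content": "first"}), encoding="utf-8")
    (root / "sub").mkdir()
    (root / "sub" / "b.json").write_text(json.dumps({"content": "second"}), encoding="utf-8")
    (root / "notes.txt").write_text("ignored", encoding="utf-8")

    records = read_json_files_as_text(root)
    print(f"document count: {len(records)}")
```

With pathlib, '**/*.json' matches JSON files both directly in the folder and in its subfolders, and the .txt file is left out. The text is not parsed, so each record still contains the full JSON string.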

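Alternatively, you can use JSONLoader and pass a jq schema through loader_kwargs to pick the field you want from each file:

from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.json_loader import JSONLoader

DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"

# loader_kwargs is forwarded to JSONLoader for every matched file
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=JSONLoader, loader_kwargs={'jq_schema': '.content'})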

documents = loader.load()

print(f'document count: {len(documents)}')
print(documents[0] if len(documents) > 0 else None)
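The '.content' schema above assumes every file is a JSON object with a top-level "content" key. As a rough illustration of what that selector picks out, here is a plain-Python sketch that emulates only this one simple case. It does not use jq or LangChain. The helper name load_json_field and the sample files are hypothetical, and skipping files without the key is a choice made for this sketch, not a statement about how JSONLoader behaves.

```python
import json
import tempfile
from pathlib import Path


def load_json_field(folder, key="content", pattern="**/*.json"):
    """Parse each matching file and keep only the value stored under `key`.

    Hypothetical helper: emulates a single top-level selector such as '.content'.
    """
    records = []
    for path in sorted(Path(folder).glob(pattern)):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict) or key not in data:
            # Sketch-only choice: skip files that do not have the expected shape
            continue
        records.append({
            "page_content": str(data[key]),
            "metadata": {"source": str(path)},
        })
    return records


# Demo on a throwaway folder (file names and contents are made up)
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.json").write_text(
        json.dumps({"title": "A", "content": "first body"}), encoding="utf-8"
    )
    (folder / "b.json").write_text(json.dumps({"title": "B"}), encoding="utf-8")  # no "content" key

    documents = load_json_field(folder)
    print(f"document count: {len(documents)}")
    print(documents[0] if len(documents) > 0 else None)
```

If your files store the text under a different key or in a nested structure, the jq_schema passed to JSONLoader has to be adjusted to match that structure.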

For details on jq_schema, see:
https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/json_loader.py#L10

For more DirectoryLoader usage, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/directory.py

Answer 2

Score: -1

You can use the DirectoryLoader class to load a folder of JSON files in Langchain. This class takes a path to the folder as input and returns a list of Document objects.

import langchain

from langchain.docstore.document import Document
# DirectoryLoader is imported from langchain.document_loaders, as in the first answer
from langchain.document_loaders import DirectoryLoader

folder_path = "/path/to/json/files"
directory_loader = DirectoryLoader(folder_path)
documents = directory_loader.load()

for document in documents:
    print(document.page_content)

huangapple
  • Published on May 17, 2023, 23:31:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76273784.html