How to load a folder of JSON files in LangChain?
Question
I am trying to load a folder of JSON files in LangChain:
loader = DirectoryLoader(r'C:...')
documents = loader.load()
But I get this error message:
> ValueError: Json schema does not match the Unstructured schema
Can anyone tell me how to solve this problem?
I tried passing glob='**/*.json', but it did not work. The documentation on the LangChain website is also limited.
Answer 1
Score: 7
If you want to read each file in full as raw text, pass TextLoader through the loader_cls parameter:
from langchain.document_loaders import DirectoryLoader, TextLoader
# DRIVE_FOLDER is the path to the folder that holds your JSON files
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=TextLoader)
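To see which files a pattern like '**/*.json' picks up, and what "reading the whole file" gives you, here is a minimal standard-library sketch. It does not use LangChain at all. It assumes the loader's glob behaves like pathlib's recursive globbing, and the helper names `find_json_files` and `read_raw` are made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def find_json_files(folder: str, pattern: str = "**/*.json") -> List[Path]:
    """Return the files under `folder` that match `pattern`, sorted for stable output."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())


def read_raw(paths: List[Path]) -> Dict[str, str]:
    """Map each file name to its unparsed text, i.e. the whole file as one string."""
    return {p.name: p.read_text(encoding="utf-8") for p in paths}


# Build a throwaway folder so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.json").write_text(json.dumps({"content": "top level"}), encoding="utf-8")
    (root / "nested" / "b.json").write_text(json.dumps({"content": "nested"}), encoding="utf-8")
    (root / "notes.txt").write_text("not JSON, should be skipped", encoding="utf-8")

    files = find_json_files(tmp)
    matched = [p.relative_to(root).as_posix() for p in files]
    raw = read_raw(files)

print(matched)
print(raw["a.json"])
```

The raw text is still a JSON string, so anything downstream that needs individual fields has to call json.loads on it first.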
Alternatively, use JSONLoader and pass a jq_schema through loader_kwargs to select the part of each file you need:
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.json_loader import JSONLoader
# JSONLoader depends on the jq package (pip install jq)
DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"
loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=JSONLoader, loader_kwargs={'jq_schema': '.content'})
documents = loader.load()
print(f'document count: {len(documents)}')
print(documents[0] if documents else None)
For details on jq_schema, see:
https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/json_loader.py#L10
For more DirectoryLoader usage, see:
https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/directory.py
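As a rough picture of what a schema like '.content' selects, the sketch below pulls the top-level "content" key out of every JSON file in a folder using only the standard library. It is not how JSONLoader is implemented. The helper `load_field`, the dict layout of the results, and the choice to skip files without the key are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_field(folder: str, field: str = "content") -> List[Dict[str, str]]:
    """Collect `field` from each JSON object found under `folder`.

    Files whose top level is not an object, or that lack `field`, are skipped.
    That is a choice made for this sketch, not documented LangChain behavior.
    """
    docs = []
    for path in sorted(Path(folder).glob("**/*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict) and field in data:
            docs.append({"page_content": str(data[field]), "source": path.name})
    return docs


# Build a throwaway folder so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "with_content.json").write_text(
        json.dumps({"title": "demo", "content": "hello"}), encoding="utf-8"
    )
    (root / "without_content.json").write_text(
        json.dumps({"title": "no content key"}), encoding="utf-8"
    )
    docs = load_field(tmp)

print(f"document count: {len(docs)}")
print(docs[0] if docs else None)
```

If your files nest the text deeper or hold a list of records, the jq expression has to match that shape, so check one file's structure before choosing the schema.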
Answer 2
Score: -1
You can use the DirectoryLoader class to load a folder of JSON files in LangChain. It takes the path to the folder as input, and its load() method returns a list of Document objects.
from langchain.document_loaders import DirectoryLoader
folder_path = "/path/to/json/files"
# With no loader_cls given, DirectoryLoader uses UnstructuredFileLoader by default
directory_loader = DirectoryLoader(folder_path)
documents = directory_loader.load()
for document in documents:
    print(document.page_content)