How to load and split from a list of File objects
Question
I'm creating a JavaScript app with a drop area where you can drop files from your drive.
When files are dropped, I get an array of File objects.
Now I want to use a langchain document loader to load these files and then split them into chunks. This is the function I have so far:
import { TextLoader } from 'langchain/document_loaders/fs/text'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import { Document } from 'langchain/document'
export async function IngestFiles (files) {
  if (files.length < 1) return
  console.log('files', files)
  const splitter = new RecursiveCharacterTextSplitter(
    { chunkSize: 100, chunkOverlap: 10 }
  )
  let documents = []
  files.forEach(async file => {
    const loader = new TextLoader(file)
    const doc = await loader.load()
    const docOutput = await splitter.splitDocuments([
      new Document({ pageContent: doc[0].pageContent })
    ])
    documents = documents.concat(docOutput)
    console.log('documents', documents)
  })
  console.log('result', documents)
  return documents
}
I have added some console.log lines to be able to see the intermediate steps:
As you can see, I added two small txt files. They are properly loaded and split into smaller Document objects, but the final result (the last console.log) is empty. I've tried everything, and all I can think of now is that this is related to async/await, but I can't see the issue.
Any help is appreciated.
Answer 1
Score: 2
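I think this post answers your question: https://stackoverflow.com/a/70946414/9787476
As suggested in that post, don't use forEach; use a for-of loop instead. forEach does not wait for the promises returned by an async callback, so IngestFiles logs and returns documents before any file has been loaded and split.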
Also, is there a specific reason to use:
const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: doc[0].pageContent })
])
instead of simply
const docOutput = await splitter.splitDocuments(doc)
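To make the difference concrete, here is a minimal sketch that runs without langchain. It assumes a hypothetical loadAndSplit helper standing in for the loader.load() and splitter.splitDocuments() calls, and it uses plain objects with hypothetical name and text fields instead of real File objects. It is only meant to illustrate the timing issue, not the actual library behavior.

```javascript
// Hypothetical stand-in for `await loader.load()` followed by
// `await splitter.splitDocuments(...)`: resolves asynchronously with chunks.
async function loadAndSplit (file) {
  await new Promise(resolve => setTimeout(resolve, 10))
  return file.text.split(' ').map(word => ({ pageContent: word, source: file.name }))
}

// Broken: forEach ignores the promises returned by the async callback,
// so this function returns before any chunk has been added.
async function ingestWithForEach (files) {
  let documents = []
  files.forEach(async file => {
    const chunks = await loadAndSplit(file)
    documents = documents.concat(chunks)
  })
  return documents
}

// Fixed: for...of awaits each file in turn inside the enclosing async function.
async function ingestWithForOf (files) {
  let documents = []
  for (const file of files) {
    const chunks = await loadAndSplit(file)
    documents = documents.concat(chunks)
  }
  return documents
}

// Alternative: start all files at once and wait for every result.
async function ingestWithPromiseAll (files) {
  const perFile = await Promise.all(files.map(file => loadAndSplit(file)))
  return perFile.flat()
}
```

In the original function, replacing files.forEach(async file => { ... }) with for (const file of files) { ... } should be enough. The Promise.all variant is an option if the files can be processed concurrently; it keeps the results in the same order as the input array.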