TypeScript LangChain add field to document metadata
Question
How should I add a field to the metadata of LangChain's Documents?
For example, using the CharacterTextSplitter gives a list of Documents:
const splitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 7,
  chunkOverlap: 3,
});
splitter.createDocuments([text]);
A document will have the following structure:
{
  "pageContent": "blablabla",
  "metadata": {
    "name": "my-file.pdf",
    "type": "application/pdf",
    "size": 12012,
    "lastModified": 1688375715518,
    "loc": { "lines": { "from": 1, "to": 3 } }
  }
}
I want to add a field to the metadata.
Answer 1
Score: 0
OK, just loop over the docs, I suppose:
// docs is the array returned by createDocuments; doc_id is the value to attach.
for (const doc of docs) {
  doc.metadata["doc_id"] = doc_id;
}
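If mutating the documents in place is not desirable, the same idea can be written as a small helper that returns copies. This is only a minimal sketch: the Doc interface and the withMetadata helper are hypothetical names made up for illustration, mirroring the document shape shown in the question. They are not LangChain exports, and the sample data stands in for real splitter output.

```typescript
// Hypothetical local type mirroring the document shape from the question.
// It is NOT imported from LangChain.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Return copies of the documents with extra fields merged into metadata.
// Fields in `extra` override existing metadata keys of the same name.
function withMetadata(docs: Doc[], extra: Record<string, unknown>): Doc[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, ...extra },
  }));
}

// Sample data standing in for the splitter output.
const sampleDocs: Doc[] = [
  { pageContent: "bla", metadata: { name: "my-file.pdf", loc: { lines: { from: 1, to: 1 } } } },
  { pageContent: "blabla", metadata: { name: "my-file.pdf", loc: { lines: { from: 2, to: 3 } } } },
];

const tagged = withMetadata(sampleDocs, { doc_id: "abc-123" });
console.log(tagged.map((d) => d.metadata["doc_id"]));
```

The originals in sampleDocs are left untouched, which can help when the same chunks are reused with different IDs.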
Answer 2
Score: 0
The recommended text splitter documentation does not currently show how to do this, but the second argument of createDocuments takes an array of metadata objects, one per input text. The properties of each object are copied into the metadata of every document produced from the corresponding text.
// chunkHeader is a string defined elsewhere; it is prepended to each chunk.
const myMetaData = { url: "https://www.google.com" };
const documents = await splitter.createDocuments([text], [myMetaData],
  { chunkHeader, appendChunkOverlapHeader: true });
After this, documents contains an array in which each element is an object with pageContent and metadata properties. The properties of myMetaData above appear under metadata, alongside loc, and pageContent has the text of chunkHeader prepended.
{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
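To make the pairing between texts and metadata objects concrete, here is a toy model of that behavior. It is a sketch under stated assumptions, not LangChain's implementation: toyCreateDocuments and ChunkDoc are hypothetical names, the "splitter" simply emits one chunk per non-empty line, and loc is omitted. It only illustrates that the i-th metadata object ends up on every chunk produced from the i-th text, and that the header is prepended to each chunk.

```typescript
// Hypothetical local type mirroring the document shape; not a LangChain export.
interface ChunkDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Toy stand-in for a splitter: one chunk per non-empty line of each text.
// The i-th metadata object is copied into every chunk that comes from the i-th text.
function toyCreateDocuments(
  texts: string[],
  metadatas: Record<string, unknown>[] = [],
  chunkHeader = "",
): ChunkDoc[] {
  const out: ChunkDoc[] = [];
  texts.forEach((text, i) => {
    const shared = metadatas[i] ?? {};
    const lines = text.split("\n").filter((line) => line.trim().length > 0);
    for (const line of lines) {
      // Copy the shared metadata so chunks do not alias the same object.
      out.push({ pageContent: chunkHeader + line, metadata: { ...shared } });
    }
  });
  return out;
}

const toyDocs = toyCreateDocuments(
  ["first line\nsecond line", "other text"],
  [{ url: "https://www.google.com" }, { url: "https://example.com" }],
  "HEADER: ",
);
console.log(toyDocs.map((d) => `${d.pageContent} -> ${String(d.metadata["url"])}`));
```

Passing one metadata object per text, as in this sketch, is what lets different source files carry different fields through the same call.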
Answer 3
Score: 0
One way is to wrap the text in a Document that already carries the metadata and pass it to the splitDocuments method, which copies that metadata onto every chunk.
Example:
// Document is imported from LangChain; the import path depends on the package version.
const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text, metadata: { someField: "someValue" } }),
]);