TypeScript LangChain: add a field to document metadata

Question

How should I add a field to the metadata of LangChain's Documents?

For example, using the CharacterTextSplitter gives a list of Documents:

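// Import path as of the mid-2023 langchain package.
import { CharacterTextSplitter } from "langchain/text_splitter";

const splitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 7,
  chunkOverlap: 3,
});
// createDocuments returns a Promise, so await the result.
const docs = await splitter.createDocuments([text]);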

A document will have the following structure:

{
  "pageContent": "blablabla",
  "metadata": {
    "name": "my-file.pdf",
    "type": "application/pdf",
    "size": 12012,
    "lastModified": 1688375715518,
    "loc": { "lines": { "from": 1, "to": 3 } }
  }
}

I want to add a field to the metadata.

Answer 1

Score: 0

OK, just loop over the docs, I suppose:

// doc_id is assumed to be defined earlier with the value to attach.
for (const doc of docs) {
  doc.metadata.doc_id = doc_id;
}
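If you would rather not mutate the documents in place, the same idea can be sketched as a small helper that returns copies. This is only an illustration: the Doc interface below is a local stand-in for the document shape shown in the question, and withMetadata is a hypothetical name, not part of LangChain.

```typescript
// Local stand-in for the document shape shown in the question (an assumption;
// LangChain's own Document class is not imported here).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: returns new documents with `extra` merged into each
// metadata object, leaving the input documents untouched.
function withMetadata(docs: Doc[], extra: Record<string, unknown>): Doc[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, ...extra },
  }));
}

const docs: Doc[] = [
  { pageContent: "blabla", metadata: { name: "my-file.pdf" } },
  { pageContent: "blablabla", metadata: { name: "my-file.pdf" } },
];

const tagged = withMetadata(docs, { doc_id: "abc-123" });
console.log(tagged[0].metadata); // { name: 'my-file.pdf', doc_id: 'abc-123' }
```

Keys in extra win over existing keys with the same name, because extra is spread last.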

Answer 2

Score: 0

The recommended text splitter documentation doesn't currently show how to do this, but the second argument of createDocuments accepts an array of objects, one per input text. The properties of each object are assigned to the metadata of every document produced from the text at the same index.

const myMetaData = { url: "https://www.google.com" };
// chunkHeader is a string you define; it is prepended to each chunk's pageContent.
const documents = await splitter.createDocuments([text], [myMetaData],
  { chunkHeader, appendChunkOverlapHeader: true });

After this, documents is an array in which each element is an object with pageContent and metadata properties. The properties of myMetaData appear under metadata, and pageContent has the text of chunkHeader prepended.

{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
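To make the index pairing between texts and metadata objects concrete, here is a toy sketch. It is not LangChain's implementation: the Doc interface, the createDocs function, and the fixed-size chunking are all assumptions made for illustration, and the example.com URLs are placeholders.

```typescript
// Local stand-in for the document shape (an assumption, not LangChain's class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Toy splitter: cuts each text into fixed-size chunks and copies the metadata
// object at the same index into every chunk produced from that text.
// It only illustrates the pairing by index.
function createDocs(
  texts: string[],
  metadatas: Record<string, unknown>[] = [],
  chunkSize = 4,
): Doc[] {
  const out: Doc[] = [];
  texts.forEach((text, i) => {
    for (let start = 0; start < text.length; start += chunkSize) {
      out.push({
        pageContent: text.slice(start, start + chunkSize),
        // Fall back to empty metadata when no object was given for this text.
        metadata: { ...(metadatas[i] ?? {}) },
      });
    }
  });
  return out;
}

const result = createDocs(
  ["abcdefgh", "xyz"],
  [{ url: "https://example.com/a" }, { url: "https://example.com/b" }],
);
console.log(result.map((d) => `${d.pageContent} -> ${String(d.metadata.url)}`));
```

Here the two chunks of the first text carry the first URL and the single chunk of the second text carries the second one.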

Answer 3

Score: 0

You can use the Document class with the splitDocuments method. The metadata set on the input document is carried over to every chunk.

Example:

import { Document } from "langchain/document";

const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text, metadata: { someField: "someValue" } }),
]);

huangapple
  • Posted on July 3, 2023, 17:19:15
  • Source: https://go.coder-hub.com/76603417.html