July 3, 2023 17:19:15

TypeScript LangChain add field to document metadata

Question

How should I add a field to the metadata of LangChain's Documents?

For example, using the CharacterTextSplitter gives a list of Documents:

import { CharacterTextSplitter } from "langchain/text_splitter";

const splitter = new CharacterTextSplitter({
  separator: " ",
  chunkSize: 7,
  chunkOverlap: 3,
});
const docs = await splitter.createDocuments([text]);

A document will have the following structure:

{
  "pageContent": "blablabla",
  "metadata": {
    "name": "my-file.pdf",
    "type": "application/pdf",
    "size": 12012,
    "lastModified": 1688375715518,
    "loc": { "lines": { "from": 1, "to": 3 } }
  }
}

I want to add a field to the metadata.

Answer 1

Score: 0

OK, just loop over the docs, I suppose:

// doc_id is whatever value you want to attach; this mutates each Document in place.
for (const doc of docs) {
  doc.metadata["doc_id"] = doc_id;
}
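
If mutating the documents in place is undesirable, the same loop can be written as a map that returns new objects. This is only a minimal sketch: Doc is a local stand-in that mirrors the pageContent/metadata shape shown in the question (it is not LangChain's Document class), and addMetadata is a hypothetical helper name.

```typescript
// Local stand-in for the document shape from the question (not LangChain's class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: returns copies of the docs with `extra` merged into metadata.
// Keys in `extra` win over existing keys of the same name.
function addMetadata(docs: Doc[], extra: Record<string, unknown>): Doc[] {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, ...extra },
  }));
}

const docs: Doc[] = [
  { pageContent: "blablabla", metadata: { name: "my-file.pdf" } },
];
const tagged = addMetadata(docs, { doc_id: "abc-123" });
console.log(tagged[0].metadata); // contains both name and doc_id
```

The copy is shallow, so nested values such as loc are still shared between the original and the copy; that is usually fine when only top-level fields are added.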

Answer 2

Score: 0

The recommended text splitter documentation doesn't currently show how to do this, but the second argument of createDocuments accepts an array of objects, one per input text. The properties of each object are assigned to the metadata of every document produced from the corresponding text.

// chunkHeader is a string defined elsewhere; it is prepended to each chunk.
const myMetaData = { url: "https://www.google.com" };
const documents = await splitter.createDocuments([text], [myMetaData], {
  chunkHeader,
  appendChunkOverlapHeader: true,
});

After this, documents is an array in which each element is an object with pageContent and metadata properties. metadata contains the properties of myMetaData above, and pageContent has the text of chunkHeader prepended.

{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
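
To make the pairing between texts and metadata objects concrete, here is a toy model of that behaviour. It is not LangChain's implementation: createDocs and Doc are hypothetical names, chunking is a naive fixed-width slice, and loc and chunkHeader are left out. It only illustrates that metadatas[i] ends up on every chunk produced from texts[i].

```typescript
// Local stand-in for the document shape (not LangChain's Document class).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Toy stand-in for createDocuments: metadatas[i] is copied onto every chunk of texts[i].
function createDocs(
  texts: string[],
  metadatas: Record<string, unknown>[] = [],
  chunkSize = 5,
): Doc[] {
  const out: Doc[] = [];
  texts.forEach((text, i) => {
    const meta = metadatas[i] ?? {}; // no metadata supplied for this text
    for (let start = 0; start < text.length; start += chunkSize) {
      out.push({
        pageContent: text.slice(start, start + chunkSize),
        metadata: { ...meta }, // fresh copy per chunk
      });
    }
  });
  return out;
}

const result = createDocs(
  ["aaaaabbbbb", "ccccc"],
  [{ url: "https://www.google.com" }, { url: "https://example.com" }],
);
for (const d of result) {
  console.log(d.pageContent, d.metadata["url"]);
}
```

With several input texts, this is why the metadata array should have the same length and order as the texts array.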

Answer 3

Score: 0

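You can use the Document class with the splitDocuments method; each chunk keeps the metadata of the Document it was split from. Note that pageContent and metadata go in the same constructor argument.

Example:

import { Document } from "langchain/document";

const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text, metadata: { someField: "someValue" } }),
]);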

