Issue parsing a JSON file with LangChain

Question

Need some help.

I have the following JSON content in a file and would like to use langchain.js and GPT to parse it, store it, and answer questions such as:

"find me jobs with 2 year experience" ==> should return a list

"I have knowledge in javascript find me jobs" ==> should return the jobs object

I use the LangChain JSON loader and I can see that the file is parsed, but it reports that it found 13 docs. There should be only 3 docs in the file. Is the JSON structure incorrect?

Here is a snippet of my parsing code:

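import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { JSONLoader } from "langchain/document_loaders/fs/json";

// docPath is the directory that contains jobs.json
const loader = new DirectoryLoader(docPath, {
  ".json": (path) => new JSONLoader(path),
});

const docs = await loader.load();
console.log(docs);
console.log(docs.length);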

Here is my input data:

[
  {
    "jobid":"job1",
    "title":"software engineer",
    "skills":"java,javascript",
    "description":"this job requires a associate degrees in CS and 2 years experience"
  },
   {
    "jobid":"job2",
    "skills":"math, accounting, spreadsheet",
    "description":"this job requires a degrees in accounting and 2 years experience"
  },
   {
    "jobid":"job3",
    "title":"programmer",
    "skills":"java,javascript,cloud computing",
    "description":"this job requires a ,master degrees in CS and 3 years experience"
  }
  
]

Output:

[
  Document {
    pageContent: 'job1',
    metadata: {
      source: 'langchain-document-loaders-in-node-js/documents/jobs.json',
      line: 1
    }
  },
  Document {
    pageContent: 'software engineer',
    metadata: {
      source: 'langchain-document-loaders-in-node-js/documents/jobs.json',
      line: 2
    }
  },
  Document {
    pageContent: 'java,javascript',
    metadata: {
      source: 'langchain-document-loaders-in-node-js/documents/jobs.json',
      line: 3
    }
  },
  Document {
    pageContent: 'this job requires a associate degrees in CS and 2 years experience',
    metadata: {
      source: 'langchain-document-loaders-in-node-js/documents/jobs.json',
      line: 4
    }
  },
  Document {
    pageContent: 'job2',
    metadata: {
      source: 'langchain-document-loaders-in-node-js/documents/jobs.json',
      line: 5
    }
  },

  ...
]

Answer 1

Score: 1

Your JSON contains an array of three objects. Two of them have four properties, and one has three. All the property values are text strings. It looks like your parser pulls each property value into its own Document.

You need to find a way to tell your parser that each object should be one of its Documents.
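
One way to do that, as a rough sketch rather than the loader's own mechanism, is to parse the JSON yourself and build one document-shaped object per job. The helper name `jobsToDocuments`, the `key: value` text layout, and the `index` and `jobid` metadata fields are assumptions made up for this illustration; the returned plain objects only mirror the `pageContent` / `metadata` shape seen in the output above.

```javascript
// Hypothetical helper (not part of LangChain): returns one document-shaped
// object per job instead of one per string value.
function jobsToDocuments(jsonText, source) {
  const jobs = JSON.parse(jsonText);
  if (!Array.isArray(jobs)) {
    throw new TypeError("expected a top-level JSON array of job objects");
  }
  return jobs.map((job, index) => ({
    // Keep every field in the text so that questions about skills or
    // experience can be matched against the whole job record.
    pageContent: Object.entries(job)
      .map(([key, value]) => `${key}: ${value}`)
      .join("\n"),
    // Assumed metadata layout: where the job came from and which one it is.
    metadata: { source, index, jobid: job.jobid },
  }));
}

// Small inline sample in the same shape as the jobs file from the question.
const sample = JSON.stringify([
  { jobid: "job1", title: "software engineer", skills: "java,javascript" },
  { jobid: "job2", skills: "math, accounting, spreadsheet" },
]);

const docs = jobsToDocuments(sample, "jobs.json");
console.log(docs.length);
console.log(docs[0].pageContent);
```

With the real file you would read its contents (for example with `fs.readFile`) and pass the text in; each job then stays together as a single unit, so a query about "javascript" or "2 years experience" can return whole jobs rather than isolated strings.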

huangapple
  • Posted on June 19, 2023 at 01:14:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76501723.html