Train LLM on internal docs

Question

I have internal company documentation covering leave policies and similar topics. Is there a way, or a service, to upload these docs and get a ChatGPT-like AI that answers questions about them? A paid service is fine. Any ideas?

Answer 1

Score: 1

Sounds like you're looking for something like OSSChat.

There are two ways to build a ChatGPT-like assistant for your own internal docs: 1) fine-tune an LLM, or 2) pair a vector database with an LLM. I recently built a multi-document Q&A app using LlamaIndex, LangChain, and Milvus. Here's the Colab Notebook.

Basically what you can do is:

  1. Vectorize your documents and store them in a vector database like Milvus

  2. Generate a summary or title for each of your docs

  3. Store keywords from those summaries or titles in a dict, with values that point to the matching vector store entries

  4. Use LlamaIndex to hook up the keyword and vector store indices

  5. Use LlamaIndex to make decomposable queries

At a high level, that should pretty much be it.
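
To make steps 1 to 4 a bit more concrete, here is a rough, standard-library-only sketch. It is not the code from the notebook and it does not use the real LlamaIndex or Milvus APIs: the sample documents, the `KEYWORDS` dict, and the `embed`, `retrieve`, and `build_prompt` helpers are all made up for illustration. A plain dict stands in for the vector database and a word-count vector stands in for a real embedding model. Step 5 (decomposable queries) is left out.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents standing in for internal policy files.
DOCS = {
    "leave_policy": "Employees receive 20 days of paid annual leave per year. "
                    "Leave requests go to your manager.",
    "expense_policy": "Submit travel expense receipts within 30 days. "
                      "Meals are reimbursed up to a daily limit.",
    "remote_work": "Staff may work remotely up to three days per week "
                   "with manager approval.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy word-count vector; a real setup would call an embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: vectorize the docs and store them (a dict stands in for Milvus).
vector_store = {doc_id: embed(text) for doc_id, text in DOCS.items()}

# Steps 2-3: hand-written keywords whose values point at vector store entries.
KEYWORDS = {
    "leave": ["leave_policy"],
    "vacation": ["leave_policy"],
    "expense": ["expense_policy"],
    "remote": ["remote_work"],
}


def retrieve(question, top_k=1):
    """Step 4: use keywords to narrow candidates, then rank by similarity."""
    tokens = set(tokenize(question))
    candidates = {d for kw, ids in KEYWORDS.items() if kw in tokens for d in ids}
    if not candidates:
        # No keyword matched, so fall back to searching every document.
        candidates = set(vector_store)
    query_vec = embed(question)
    ranked = sorted(candidates,
                    key=lambda d: cosine(query_vec, vector_store[d]),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question):
    """Assemble the text you would send to whichever LLM you choose."""
    context = "\n".join(DOCS[d] for d in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How many days of annual leave do I get?"))
```

The point of the keyword dict is that a question mentioning "leave" only gets compared against the leave policy, while a question with no known keyword falls back to a similarity search over everything. In a real setup you would swap `embed` for an actual embedding model and `vector_store` for a Milvus collection.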

huangapple
  • Posted on June 8, 2023 at 23:52:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76433659.html