Train LLM on internal docs
Question
I have internal company documentation covering leave policies and similar topics. Is there a way, or a service, to upload these documents and get a ChatGPT-like AI that answers questions about them? I don't mind if it is a paid service. Any ideas?
Answer 1
Score: 1
Sounds like you're looking for something like OSSChat.
There are two ways to build a ChatGPT-like assistant for your own internal docs: 1) fine-tuning an LLM, or 2) using a vector database plus an LLM. I recently built a multi-document Q/A app using LlamaIndex, LangChain, and Milvus. Here's the Colab Notebook.
Basically what you can do is:
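- Vectorize your documents and store them in a vector database such as Milvus.
- Generate a summary or title for each of your docs.
- Store the keywords in a dict and make the values correspond to your vector store entries.
- Use LlamaIndex to hook up the keyword index and the vector store index.
- Use LlamaIndex to make decomposable queries.
From a high-level point of view, that should pretty much be it.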
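To make the first four steps concrete, here is a minimal pure-Python sketch of the retrieval side. It is not the LlamaIndex or Milvus API: the sample documents, the bag-of-words `embed` function, the in-memory list standing in for a vector database, and the names `DOCS`, `retrieve`, and `build_prompt` are all assumptions made up for illustration. A real setup would swap in an embedding model and a real vector store, and the query decomposition step is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents; in practice these are your internal files.
DOCS = {
    "leave_policy": "Employees accrue annual leave monthly. Leave requests need manager approval.",
    "expense_policy": "Submit expense reports with receipts within thirty days of purchase.",
    "remote_work": "Remote work requires a signed agreement and a secure home network.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Toy embedding: a term-count vector (stand-in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: vectorize the documents and store them
# (a plain list stands in for a vector database such as Milvus).
vector_store = [
    {"doc_id": doc_id, "text": text, "vector": embed(text)}
    for doc_id, text in DOCS.items()
]

# Steps 2-3: one keyword per document (taken from its title here),
# mapped to the position of its entry in the vector store.
keyword_index = {
    entry["doc_id"].split("_")[0]: position
    for position, entry in enumerate(vector_store)
}


def retrieve(question, top_k=1):
    """Step 4: route by keyword first, then fall back to vector similarity."""
    tokens = tokenize(question)
    # dict.fromkeys removes duplicate positions while keeping their order.
    positions = list(dict.fromkeys(
        keyword_index[t] for t in tokens if t in keyword_index
    ))
    if positions:
        return [vector_store[p] for p in positions[:top_k]]
    query_vector = embed(question)
    ranked = sorted(
        vector_store,
        key=lambda entry: cosine(query_vector, entry["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question):
    """Assemble the text you would send to an LLM, with retrieved context."""
    context = "\n".join(entry["text"] for entry in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("How do I request leave?")` picks the leave document through the keyword dict, while a question with no known keyword falls back to similarity search over the stored vectors. The resulting prompt is what you would hand to whichever LLM you use.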