Is there a way to reduce the number of tokens sent to chatgpt (as context)?
Question
I'm using ChatGPT's API to discuss book topics. For ChatGPT to understand the whole story, I have to add context.
This means that all of the user's questions and ChatGPT's replies are sent with every request. The maximum supported token limit is therefore reached very quickly, and usage fees also increase rapidly.
Please show me a concise way to reduce the number of tokens sent, thereby reducing costs.
Below is an example of my ChatGPT request.
Answer 1
Score: 1
I have two suggestions:
- Try learning LangChain. It will shorten the content you send, although I'm not sure whether it actually reduces the number of tokens that ChatGPT bills for: https://js.langchain.com/docs/modules/chains/other_chains/summarization
- If a conversation cannot fit within the model's token limit, it needs to be shortened in some way. This can be achieved with a rolling log of the conversation history, where only the last n dialog turns are re-submitted (see the sketch after this list).
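A minimal sketch of that rolling-log idea, assuming a messages array in the OpenAI chat format whose first element is the system message; the helper name and the choice of n are illustrative, not part of the answer:

// Keep the system message plus only the last n user/assistant dialog turns
function lastNTurns(messages, n) {
  const system = messages[0];        // always keep the system message
  const dialog = messages.slice(1);  // the user/assistant turns
  const kept = dialog.slice(-n * 2); // last n user+assistant pairs
  return [system, ...kept];
}

// Usage: send only the trimmed history with each request
const trimmed = lastNTurns(messages, 5);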
Answer 2
Score: 1
A simple and fast method is to implement your own solution that iteratively removes messages from the message array, so that the number of tokens you send (input/prompt tokens) plus the number of tokens you specify as max_tokens (max completion tokens) stays within the model's token limit (4096 for gpt-3.5-turbo):
const max_tokens = 1000;      // max response tokens requested from OpenAI
const modelTokenLimit = 4096; // gpt-3.5-turbo token limit

// Ensure prompt tokens + max completion tokens stay within the model's token limit
while (calcMessagesTokens(messages) > (modelTokenLimit - max_tokens)) {
  messages.splice(1, 1); // remove the oldest message that comes after the system message
}

// send request to OpenAI
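The answer calls a calcMessagesTokens helper without defining it. Here is a minimal sketch of such a helper, assuming the gpt-3-encoder npm package for tokenization; the 4-token per-message overhead is a rough approximation that varies by model, so treat the counts as estimates:

const { encode } = require('gpt-3-encoder');

// Approximate the number of prompt tokens in an OpenAI chat messages array
function calcMessagesTokens(messages) {
  let total = 0;
  for (const message of messages) {
    total += encode(message.content).length; // tokens in the message text
    total += 4;                              // rough per-message overhead for role/formatting
  }
  return total;
}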
Answer 3
Score: 0
Use LangChain! It has a lot of features, such as data loaders, vector databases, and caching. In my view, store the data in a PDF/text file, load it, and chunk it into smaller pieces. Then, using an embedding model, you can build a retrieval-QA style setup, and caching helps reduce tokens when repeated questions are asked.
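A minimal sketch of that retrieval-QA approach, assuming the LangChain JS API as documented around the time of this answer (import paths and class names have changed across versions) and an illustrative book.txt file:

import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { RetrievalQAChain } from "langchain/chains";

// Load the book and split it into small chunks
const docs = await new TextLoader("book.txt").load();
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(docs);

// Embed the chunks into an in-memory vector store
const store = await MemoryVectorStore.fromDocuments(chunks, new OpenAIEmbeddings());

// Each question sends only the top-matching chunks as context instead of the whole book
const chain = RetrievalQAChain.fromLLM(new ChatOpenAI({ temperature: 0 }), store.asRetriever());
const res = await chain.call({ query: "What happens in chapter 3?" });
console.log(res.text);

Because only the retrieved chunks are included in the prompt, the number of input tokens per question stays roughly constant regardless of the book's length.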