Is there a way to reduce the number of tokens sent to chatgpt (as context)?
Question
I'm using ChatGPT's API to discuss book topics. For ChatGPT to understand the whole story, I have to add context.
This means that all of the user's questions and ChatGPT's replies are sent with every request. The maximum supported token limit is therefore reached very quickly, and usage fees also increase rapidly.
Please show me a concise way to reduce the number of tokens sent, thereby reducing costs.
Below is an example of my ChatGPT request.
Answer 1
Score: 1
I have two suggestions:
- Try learning LangChain. It will shorten the content you send, although I'm not sure whether it actually reduces the number of tokens that ChatGPT bills for: https://js.langchain.com/docs/modules/chains/other_chains/summarization
- If a conversation cannot fit within the model's token limit, it needs to be shortened in some way. This can be achieved with a rolling log of the conversation history, where only the last n dialog turns are re-submitted (see the sketch after this list).
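A minimal sketch of that rolling-log idea, assuming a messages array in the OpenAI chat format whose first element is the system message; the helper name and the choice of n are illustrative, not part of the answer:

// Keep the system message plus only the last n user/assistant dialog turns
function lastNTurns(messages, n) {
  const system = messages[0];        // always keep the system message
  const dialog = messages.slice(1);  // the user/assistant turns
  const kept = dialog.slice(-n * 2); // last n user+assistant pairs
  return [system, ...kept];
}

// Usage: send only the trimmed history with each request
const trimmed = lastNTurns(messages, 5);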
Answer 2
Score: 1
A simple and fast method is to implement your own solution that iteratively removes messages from the message array, so that the number of tokens you send (input/prompt tokens) plus the number of tokens you specify as max_tokens (max completion tokens) stays within the model's token limit (4096 for gpt-3.5-turbo):
const max_tokens = 1000;      // max response tokens requested from OpenAI
const modelTokenLimit = 4096; // gpt-3.5-turbo token limit

// Ensure prompt tokens + max completion tokens stay within the model's token limit
while (calcMessagesTokens(messages) > (modelTokenLimit - max_tokens)) {
  messages.splice(1, 1); // remove the oldest message that comes after the system message
}

// send request to OpenAI
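The answer calls a calcMessagesTokens helper without defining it. Here is a minimal sketch of such a helper, assuming the gpt-3-encoder npm package for tokenization; the 4-token per-message overhead is a rough approximation that varies by model, so treat the counts as estimates:

const { encode } = require('gpt-3-encoder');

// Approximate the number of prompt tokens in an OpenAI chat messages array
function calcMessagesTokens(messages) {
  let total = 0;
  for (const message of messages) {
    total += encode(message.content).length; // tokens in the message text
    total += 4;                              // rough per-message overhead for role/formatting
  }
  return total;
}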
Answer 3
Score: 0
Use LangChain! It has a lot of features, such as data loaders, vector databases, and caching. In my view, store the data in a PDF/text file, load it, and chunk it into smaller pieces. Then, using an embedding model, you can build a retrieval-QA style setup, and caching helps reduce tokens when repeated questions are asked.
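A minimal sketch of that retrieval-QA approach, assuming the LangChain JS API as documented around the time of this answer (import paths and class names have changed across versions) and an illustrative book.txt file:

import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { RetrievalQAChain } from "langchain/chains";

// Load the book and split it into small chunks
const docs = await new TextLoader("book.txt").load();
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
const chunks = await splitter.splitDocuments(docs);

// Embed the chunks into an in-memory vector store
const store = await MemoryVectorStore.fromDocuments(chunks, new OpenAIEmbeddings());

// Each question sends only the top-matching chunks as context instead of the whole book
const chain = RetrievalQAChain.fromLLM(new ChatOpenAI({ temperature: 0 }), store.asRetriever());
const res = await chain.call({ query: "What happens in chapter 3?" });
console.log(res.text);

Because only the retrieved chunks are included in the prompt, the number of input tokens per question stays roughly constant regardless of the book's length.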