LangChain: Reduce size of tokens being passed to OpenAI
Question
I am using LangChain to create embeddings and then ask a question to those embeddings like so:
# Imports assumed from the LangChain version in use at the time (module paths may differ in newer releases)
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.vectorstores.base import VectorStoreRetriever

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(disallowed_special=())
db = DeepLake(
    dataset_path=deeplake_url,
    read_only=True,
    embedding_function=embeddings,
)
retriever: VectorStoreRetriever = db.as_retriever()
model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
result = qa({"question": question, "chat_history": chat_history})
But I am getting the following error:
File "/xxxxx/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 13918 tokens. Please reduce the length of the messages.
The chat_history is empty and the question is quite small.
How can I reduce the size of tokens being passed to OpenAI?
I'm assuming the content retrieved via the embeddings is too large when it is passed to openai. It might be easy enough to just truncate the data being sent to openai.
Answer 1
Score: 3
Summary
When you instantiate the ConversationalRetrievalChain object, pass in a max_tokens_limit value:
qa = ConversationalRetrievalChain.from_llm(
    model, retriever=retriever, max_tokens_limit=4000
)
This will automatically truncate the tokens when the question is sent to OpenAI / your LLM.
Longer explanation
In the base.py of ConversationalRetrievalChain there is a function that is called when your question is sent to DeepLake/OpenAI:
def _get_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
    docs = self.retriever.get_relevant_documents(question)
    return self._reduce_tokens_below_limit(docs)
This reads from the DeepLake vector database and adds the retrieved documents as context to the text that is sent to OpenAI.
The _reduce_tokens_below_limit function reads the class instance variable max_tokens_limit to truncate the size of the input documents.
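For intuition, the truncation amounts to counting tokens per retrieved document and dropping documents from the end of the relevance-ordered list until the total fits under max_tokens_limit. Below is a standalone sketch of that logic, paraphrased for illustration rather than the library's exact source; get_num_tokens is LangChain's token-counting helper on the LLM object.

from typing import List

from langchain.schema import Document


def reduce_tokens_below_limit(
    docs: List[Document], llm, max_tokens_limit: int
) -> List[Document]:
    """Drop the lowest-ranked documents until the total token count fits the limit."""
    num_docs = len(docs)
    # Token count of each retrieved document's text, using the LLM's own tokenizer.
    tokens = [llm.get_num_tokens(doc.page_content) for doc in docs]
    token_count = sum(tokens)
    # Retrievers return documents in relevance order, so trim from the end.
    while token_count > max_tokens_limit and num_docs > 0:
        num_docs -= 1
        token_count -= tokens[num_docs]
    return docs[:num_docs]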
Answer 2
Score: 0
My two cents: the explanation above is not strictly true; it might just be working by accident.
max_tokens_limit applies specifically to the new tokens created by the model.
However, a model's token limit is the sum of all tokens in the input and all tokens generated by the model. So if you reduce the new tokens by enough, you can clear the overall bar set by the model. But it is still feasible to exceed the model's limit, even if you set max_tokens_limit = 0, if your input tokens are too numerous.
Your input tokens + max_tokens_limit <= model token limit.
Everyone will have a different approach, depending on what they prefer to prioritize. For example, if I'm using a 512-token model, I might aim for a maximum output of around 200 tokens, so I would clip the input token length to 312.
This will depend entirely on your model, task type, and use case.
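To make the budgeting arithmetic concrete, here is a minimal sketch that counts input tokens with the tiktoken package and clips the context so that input plus a reserved output allowance stays under the model limit. The 4097 and 500 figures and the clip_to_budget helper are assumptions for illustration, not part of LangChain.

import tiktoken

MODEL_TOKEN_LIMIT = 4097   # e.g. gpt-3.5-turbo's context window
RESERVED_FOR_OUTPUT = 500  # tokens left over for the model's answer


def clip_to_budget(context: str, model: str = "gpt-3.5-turbo") -> str:
    """Truncate `context` so that input tokens + reserved output tokens <= model limit."""
    enc = tiktoken.encoding_for_model(model)
    budget = MODEL_TOKEN_LIMIT - RESERVED_FOR_OUTPUT
    token_ids = enc.encode(context)
    if len(token_ids) <= budget:
        return context
    return enc.decode(token_ids[:budget])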
EDIT: if you use a tokenizer directly, which doesn't seem to be the case here, you can add a max_length limit to the tokenised input_ids. But I don't think this happens with LangChain - it's handled by the pipeline/chain.
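If you do tokenise inputs yourself (e.g. for a local Hugging Face model rather than the OpenAI API), the clipping can happen at encode time. A minimal sketch, assuming the transformers package and reusing the hypothetical 512/200/312 split from above; the gpt2 tokenizer merely stands in for your model's tokenizer.

from transformers import AutoTokenizer

# Hypothetical example: reserve ~200 tokens for generation, so clip the input to 312.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt_text = "..."  # the retrieved context plus the question

encoded = tokenizer(prompt_text, truncation=True, max_length=312)
input_ids = encoded["input_ids"]  # at most 312 token ids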
Comments