OpenAI API, ChatCompletion and Completion give totally different answers with same parameters. Why?

I'm exploring the usage of different prompts on gpt-3.5-turbo.

Investigating the differences between "ChatCompletion" and "Completion", I found some references saying they should behave more or less the same, for example: https://platform.openai.com/docs/guides/gpt/chat-completions-vs-completions

Other sources say, as expected, that ChatCompletion is more useful for chatbots, since you have "roles" (system, user and assistant) with which you can orchestrate things like few-shot examples and/or memory of previous chat messages, while Completion is better suited to summarization or free-form text generation.
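For example, a few-shot exchange can be written directly into the messages list. A minimal sketch (the system message and example contents here are just illustrative, not from my actual experiment):

# Minimal sketch: few-shot prompting via roles (contents are illustrative)
few_shot_response = openai.ChatCompletion.create(
  engine="my_model",  # gpt-35-turbo
  messages=[
    {"role": "system", "content": "You reply with a single interesting fact."},
    # One worked example, given as a past user/assistant exchange
    {"role": "user", "content": "Give me something interesting about space:"},
    {"role": "assistant", "content": "A day on Venus lasts longer than its year."},
    # The actual question
    {"role": "user", "content": "Give me something interesting about history:"},
  ],
  temperature=0,
  max_tokens=100)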

But the difference seems to be much bigger than that, and I can't find any reference explaining what is happening under the hood.

The following experiment gives me totally different results, even when using the same model with the same parameters.

With ChatCompletion

import os
import openai
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = ...
openai.api_key = ...

chat_response = openai.ChatCompletion.create(
  engine="my_model", # gpt-35-turbo
  messages = [{"role":"user","content":"Give me something intresting:\n"}],
  temperature=0,
  max_tokens=800,
  top_p=0.95,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(chat_response.choices[0]['message']['content'])

The result is a fact about a war:

Did you know that the shortest war in history was between Britain and Zanzibar in 1896? It lasted only 38 minutes!

With Completion

regular_response = openai.Completion.create(
  engine="my_model", # gpt-35-turbo
  prompt="Give me something intresting:\n",
  temperature=0,
  max_tokens=800,
  top_p=0.95,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(regular_response['choices'][0]['text'])

The result is Python code plus an explanation of what it does:

    ```
    import random
    import string
    
    def random_string(length):
        return ''.join(random.choice(string.ascii_letters) for i in range(length))
    
    print(random_string(10))
    ```
    Output:
    ```
    'JvJvJvJvJv'
    ```
    This code generates a random string of length `length` using `string.ascii_letters` and `random.choice()`. `string.ascii_letters` is a string containing all ASCII letters (uppercase and lowercase). `random.choice()` returns a random element from a sequence. The `for` loop generates `length` number of random letters and `join()` concatenates them into a single string. The result is a random string of length `length`. This can be useful for generating random passwords or other unique identifiers.<|im_end|>

Notes

  1. I'm using the same parameters (temperature, top_p, etc.). The only difference is the ChatCompletion/Completion API.
  2. The model is the same in both cases, gpt-35-turbo.
  3. I'm keeping the temperature low so that I get more consistent results.
  4. Other prompts also give totally different answers, for example "What is the definition of song?"

The Question

  • Why is this happening?
  • Shouldn't the same prompt give similar results, given that both calls use the same model?
  • Is there any reference material where OpenAI explains what it is doing under the hood?

Answer 1

Score: 1

I actually found the answer by chance while reviewing some old notebooks.

It's all in the hidden tags, or as I have now found out, the Chat Markup Language (ChatML): https://github.com/openai/openai-python/blob/main/chatml.md

The following prompt with the Completion API now returns almost the same answer as ChatCompletion:

prompt = """<|im_start|>system
<|im_end|>
<|im_start|>user
Give me something intresting:
<|im_end|>
<|im_start|>assistant
"""

regular_response = openai.Completion.create(
  engine="my_model", # gpt-35-turbo
  prompt=prompt,
  temperature=0,
  max_tokens=800,
  top_p=0.95,
  frequency_penalty=0,
  presence_penalty=0,
  stop=None)

print(regular_response['choices'][0]['text'])

The result is now the same fact about the war (with the ending tag):

Did you know that the shortest war in history was between Britain and Zanzibar in 1896? The war lasted only 38 minutes, with the British emerging victorious.<|im_end|>
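As a side note, that trailing tag can presumably be removed by passing it as a stop sequence instead of stop=None (a small sketch; I'm assuming the deployment honours custom stop strings):

regular_response = openai.Completion.create(
  engine="my_model",  # gpt-35-turbo
  prompt=prompt,
  temperature=0,
  max_tokens=800,
  top_p=0.95,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["<|im_end|>"])  # truncate generation at the ChatML end-of-message tag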

It seems that all the ChatCompletion API is doing is wrapping your messages in those tags.
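In other words, you could approximate the chat endpoint yourself by serializing a messages list into ChatML before calling Completion. A minimal sketch of such a serializer (my own illustration, not OpenAI's actual implementation):

def to_chatml(messages):
    # Wrap each message in ChatML start/end tags
    parts = [f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>" for m in messages]
    # Leave the assistant turn open so the model generates the reply
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

# Reproduces (modulo the empty system block) the prompt used above
prompt = to_chatml([{"role": "user", "content": "Give me something intresting:"}])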
