OpenAI API, ChatCompletion and Completion give totally different answers with same parameters. Why?
Question
I'm exploring the use of different prompts with gpt-3.5-turbo.
Investigating the differences between "ChatCompletion" and "Completion", some references say they should be more or less the same, for example: https://platform.openai.com/docs/guides/gpt/chat-completions-vs-completions
Other sources say, as expected, that ChatCompletion is more useful for chatbots, since you have "roles" (system, user and assistant), so you can orchestrate things like few-shot examples and/or memory of previous chat messages, while Completion is more useful for summarization or text generation.
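For illustration, a minimal sketch of what such a role-based, few-shot prompt could look like (the messages here are my own made-up example, not from any reference):

```
# Hypothetical few-shot setup: the system message sets the behaviour and a
# user/assistant pair acts as an in-context example for the real question.
messages = [
    {"role": "system", "content": "You reply with one short fun fact."},
    {"role": "user", "content": "Tell me a fun fact about space."},
    {"role": "assistant", "content": "A day on Venus is longer than its year."},
    {"role": "user", "content": "Tell me a fun fact about history."},
]
```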
But the difference seems to be much bigger, and I can't find any references explaining what is happening under the hood.
The following experiment gives me totally different results, even when using the same model with the same parameters.
With ChatCompletion
```
import os
import openai

openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
openai.api_base = ...
openai.api_key = ...

chat_response = openai.ChatCompletion.create(
    engine="my_model",  # gpt-35-turbo
    messages=[{"role": "user", "content": "Give me something intresting:\n"}],
    temperature=0,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None)
print(chat_response.choices[0]['message']['content'])
```
The result is a fact about a war:
```
Did you know that the shortest war in history was between Britain and Zanzibar in 1896? It lasted only 38 minutes!
```
With Completion
```
regular_response = openai.Completion.create(
    engine="my_model",  # gpt-35-turbo
    prompt="Give me something intresting:\n",
    temperature=0,
    max_tokens=800,
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None)
print(regular_response['choices'][0]['text'])
```
The result is Python code and an explanation of what it does:
```
import random
import string

def random_string(length):
    return ''.join(random.choice(string.ascii_letters) for i in range(length))

print(random_string(10))
```
Output:
```
'JvJvJvJvJv'
```
This code generates a random string of length `length` using `string.ascii_letters` and `random.choice()`. `string.ascii_letters` is a string containing all ASCII letters (uppercase and lowercase). `random.choice()` returns a random element from a sequence. The `for` loop generates `length` number of random letters and `join()` concatenates them into a single string. The result is a random string of length `length`. This can be useful for generating random passwords or other unique identifiers.`<|im_end|>`
Notes
- I'm using the same parameters (temperature, top_p, etc.). The only difference is the ChatCompletion/Completion API.
- The model is the same in both cases, gpt-35-turbo.
- I'm keeping the temperature low so I can get more consistent results.
- Other prompts also give totally different answers, for example when I try something like "What is the definition of song?"
The Question
- Why is this happening?
- Shouldn't the same prompt give similar results, given that both calls use the same model?
- Is there any reference material where OpenAI explains what it is doing under the hood?
Answer 1
Score: 1
I actually found the answer by chance while reviewing some old notebooks.
It all comes down to hidden tags, or, as I've now learned, the Chat Markup Language (ChatML): https://github.com/openai/openai-python/blob/main/chatml.md
The following prompt with the Completion API now returns almost the same answer as ChatCompletion:
prompt = """<|im_start|>system
<|im_end|>
<|im_start|>user
Give me something intresting:
<|im_end|>
<|im_start|>assistant
"""
regular_response = openai.Completion.create(
engine="my_model", # gpt-35-turbo
prompt=prompt,
temperature=0,
max_tokens=800,
top_p=0.95,
frequency_penalty=0,
presence_penalty=0,
stop=None)
print(regular_response['choices'][0]['text'])
The result is now the same fact about the war (with the ending tag):
```
Did you know that the shortest war in history was between Britain and Zanzibar in 1896? The war lasted only 38 minutes, with the British emerging victorious.<|im_end|>
```
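As an aside, the ending tag can presumably be stripped by passing it as a stop sequence instead of `stop=None`; a hypothetical variant of the call above:

```
# Same call as above, but stopping generation at the ChatML end tag.
regular_response = openai.Completion.create(
    engine="my_model",  # gpt-35-turbo
    prompt=prompt,
    temperature=0,
    max_tokens=800,
    stop=["<|im_end|>"])
print(regular_response['choices'][0]['text'])
```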
It seems that all the ChatCompletion API is doing is wrapping your messages in those tags.
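In other words, the chat endpoint appears to be a thin formatting layer over the same underlying model. A rough sketch of that mapping, as my own illustration (not OpenAI's actual implementation, and modulo whitespace details):

```
def to_chatml(messages):
    # Render a ChatCompletion-style message list as a ChatML prompt string,
    # ending with an open assistant turn for the model to complete.
    turns = [f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>" for m in messages]
    return "\n".join(turns) + "\n<|im_start|>assistant\n"

# Roughly reproduces the prompt used above:
print(to_chatml([{"role": "system", "content": ""},
                 {"role": "user", "content": "Give me something intresting:"}]))
```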