How to chain multiple PromptNodes together in a Haystack GenerativeQAPipeline

Question

I'm trying to chain a simple question-answering prompt to an elaboration prompt using Haystack. I had the following code working just fine:

import os
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever
from haystack.nodes import PromptNode, PromptTemplate, AnswerParser
from haystack.pipelines import Pipeline, TextIndexingPipeline

class Bert:
    pipe = None

    def __init__(self, data_path):
        print("Initializing model...")
        doc_dir = data_path
        document_store = InMemoryDocumentStore(use_bm25=True)

        files_to_index = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]
        indexing_pipeline = TextIndexingPipeline(document_store)
        indexing_pipeline.run_batch(file_paths=files_to_index)

        print("Done indexing")

        retriever = BM25Retriever(document_store=document_store, top_k=2)

        lfqa_prompt = PromptTemplate(
            prompt="""Synthesize a comprehensive answer from the following text for the given 
                     question.
                     Provide a clear and concise response that summarizes the key 
                     points and information presented in the text.
                     Your answer should be in your own words and be no longer than 
                     50 words.
                     \n\n Related text: {join(documents)} \n\n Question: {query} 
                     \n\n Answer:""",
            output_parser=AnswerParser(),
        )

        prompt_node = PromptNode(model_name_or_path="google/flan-t5-large",
                                 default_prompt_template=lfqa_prompt)

        elaboration_prompt = PromptTemplate(
            prompt="""Elaborate on the answer to the following question given the related texts.
                     Provide additional details to the answer in your own words.
                     The final response should be between 100-200 words.
                     \n\n Related text: {join(documents)} \n\n Question: {query} 
                     \n\n Answer: {prompt_node}""",
            output_parser=AnswerParser(),
        )
        elaboration_node = PromptNode(model_name_or_path="google/flan-t5-large",
                                      default_prompt_template=elaboration_prompt)

        self.pipe = Pipeline()
        self.pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
        self.pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])
        # self.pipe.add_node(component=elaboration_node, name="elaboration_node",
        #                    inputs=["Query", "retriever", "prompt_node"])

    def generate(self, query):
        prediction = self.pipe.run(query=query)
        return prediction

But when I tried to chain another PromptNode to the end of the lfqa_prompt, I ran into errors. I did some research online and saw that I may need to use a Shaper, and I edited my code as follows:

import os
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import (AnswerParser, BM25Retriever, BaseComponent,
                            PromptNode, PromptTemplate, Shaper)
from haystack.schema import Answer, Document, List
from haystack.pipelines import Pipeline, TextIndexingPipeline

class QAPromptOutputAdapter(BaseComponent):
    outgoing_edges = 1

    def run(self, **kwargs):
        print(kwargs)
        return {"answers": [Answer(answer=result, type="generative") for result in results]}, "output_1"

    def run_batch(self):
        pass

class Bert:
    pipe = None

    def __init__(self, data_path):
        print("Initializing model...")
        doc_dir = data_path
        document_store = InMemoryDocumentStore(use_bm25=True)

        files_to_index = [os.path.join(doc_dir, f) for f in os.listdir(doc_dir)]
        indexing_pipeline = TextIndexingPipeline(document_store)
        indexing_pipeline.run_batch(file_paths=files_to_index)

        print("Done indexing")

        retriever = BM25Retriever(document_store=document_store, top_k=2)

        lfqa_prompt = PromptTemplate(
            prompt="""Synthesize a comprehensive answer from the following text for the given 
                     question.
                     Provide a clear and concise response that summarizes the key 
                     points and information presented in the text.
                     Your answer should be in your own words and be no longer than 
                     50 words.
                     \n\n Related text: {join(documents)} \n\n Question: {query} 
                     \n\n Answer:""",
            #output_parser=AnswerParser(),
        )

        prompt_node = PromptNode(model_name_or_path="google/flan-t5-large", 
                                default_prompt_template=lfqa_prompt)

        question_shaper = Shaper(func="value_to_list", inputs={"value": "query", "target_list": "documents"},
                                 outputs=["questions"])
        answer_shaper = Shaper(func="value_to_list",
                               inputs={"value": "prompt_node.results", "target_list": "documents"}, outputs=["answers"])

        elaboration_prompt = PromptTemplate(
            prompt="""Elaborate on the answer to the following question given the related texts.
                     Provide additional details to the answer in your own words.
                     The final response should be between 100-200 words.
                     \n\n Related text: {join(documents)} \n\n Question: 
                     {questions} \n\n Answer: {outputs}""",
            output_parser=AnswerParser(),
        )
        elaboration_node = PromptNode(model_name_or_path="google/flan-t5-large",
                                      default_prompt_template=elaboration_prompt)

        self.pipe = Pipeline()
        self.pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
        self.pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])
        self.pipe.add_node(component=question_shaper, name="question_shaper", inputs=["prompt_node"])
        self.pipe.add_node(component=answer_shaper, name="answer_shaper", inputs=["prompt_node"])
        self.pipe.add_node(component=elaboration_node, name="elaboration_node",
                           inputs=["question_shaper", "retriever", "answer_shaper"])

    def generate(self, query):
        prediction = self.pipe.run(query=query)
        return prediction

Now I just get:

Exception: Exception while running node 'answer_shaper': name 'results' is not defined

Is this the correct way to chain two PromptNodes together? Should I be using Shapers, or am I going about this completely wrong? I'm fairly new to Haystack and generative AI models in general, so help is greatly appreciated.


Answer 1

Score: 4

The output_variable approach works for me. Here is the complete example you can copy/paste and run yourself to verify:

import os

from haystack import Document
from haystack.nodes import PromptNode, PromptTemplate
from haystack.pipelines import Pipeline

openai_key = os.environ.get("OPENAI_API_KEY")
if not openai_key:
    raise ValueError("Please set the OPENAI_API_KEY environment variable")

documents = [Document("Berlin is the capital of Germany.")]
pt = PromptTemplate("Given the context please answer the question, don't elaborate. \n\n"
                    "Context: {join(documents)}; \n\n Question: {query} \n\nAnswer:")

lfqa_node = PromptNode(model_name_or_path="gpt-3.5-turbo",
                       api_key=openai_key,
                       max_length=512,
                       default_prompt_template=pt,
                       output_variable="my_answer")

elaboration_prompt = PromptTemplate("Provide additional details about this topic: {my_answer}")
elaboration_node = PromptNode(model_name_or_path="gpt-3.5-turbo",
                              api_key=openai_key,
                              max_length=512,
                              default_prompt_template=elaboration_prompt)

pipe = Pipeline()
pipe.add_node(component=lfqa_node, name="lfqa_node", inputs=["Query"])
pipe.add_node(component=elaboration_node, name="elaboration_node", inputs=["lfqa_node"])

result = pipe.run(query="What is the capital of Germany?", documents=documents)
print(result)

The result is a dictionary containing all the relevant details of the pipeline run, including the results list, any output variables (in our example, my_answer), the query, the documents, and the pipeline invocation context passed between the pipeline nodes.
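
For instance, you could pull the individual pieces out of that dictionary like this (a minimal sketch assuming the Haystack 1.x return shape described above; invocation_context is how PromptNode passes output variables between nodes, but the exact keys may differ across versions):

result = pipe.run(query="What is the capital of Germany?", documents=documents)

# The final elaborated answer(s) produced by the last node in the pipeline
print(result["results"])

# The first node stored its output under the key set via output_variable;
# it travels in the invocation context between the two nodes
print(result["invocation_context"]["my_answer"])

# The query and documents are passed through unchanged
print(result["query"])
print(result["documents"])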


Answer 2

Score: 1

The answer is supposedly to set the "output_variable" parameter of the PromptNode like this:

lfqa_node = PromptNode(
    model_name_or_path="google/flan-t5-large", 
    default_prompt_template=lfqa_prompt, 
    output_variable="my_answer"
)

And then you can use the output like this:

elaboration_prompt = PromptTemplate(
    prompt="""
         ...
         Previous answer: {my_answer} \n\n New answer: 
    """
)

However, this solution did not seem to work for me, so I simply wrote two separate pipelines, manually parsed the response from the first pipeline, and fed the answer into the second pipeline like this:

lfqa = self.pipe.run(query=query)
lfqa_answer = lfqa['results'][0]
elaboration = self.elaboration_pipeline.run(query=lfqa_answer)
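
For context, here is a minimal sketch of how those two pipelines could be wired up, reusing the retriever and prompt_node from the question (the elaboration_pipeline name and its {query}-only template are illustrative, not from the original post):

from haystack.nodes import PromptNode, PromptTemplate
from haystack.pipelines import Pipeline

# First pipeline: retrieval plus the short lfqa answer, as in the question
self.pipe = Pipeline()
self.pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
self.pipe.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

# Second pipeline: elaboration only; the first answer is passed in as the
# query, so the template references {query} instead of retrieved documents
elaboration_prompt = PromptTemplate(
    prompt="""Elaborate on the following answer in your own words.
             The final response should be between 100-200 words.
             \n\n Answer: {query} \n\n Elaboration:"""
)
elaboration_node = PromptNode(model_name_or_path="google/flan-t5-large",
                              default_prompt_template=elaboration_prompt)
self.elaboration_pipeline = Pipeline()
self.elaboration_pipeline.add_node(component=elaboration_node,
                                   name="elaboration_node", inputs=["Query"])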
