Posted: 2023-07-13 12:25:07

Vertex AI endpoint 500 Internal Server Error

Question

I tried to deploy a custom container to a Vertex AI endpoint using an LLM (PaLM). The container deploys to the endpoint successfully with the code and Dockerfile below, but when I query it with the Vertex AI API or the gcloud CLI, I get a 500 Internal Server Error response.

What is causing this error?

Am I deploying the model the right way?

Python Code

import uvicorn

#import tensorflow as tf
import os
import numpy as np
#from enum import Enum
#from typing import List, Optional
#from pydantic import BaseModel

from fastapi import Request, FastAPI, Response
from fastapi.responses import JSONResponse

from langchain.vectorstores.matching_engine import MatchingEngine
from langchain.agents import Tool
from langchain.embeddings import VertexAIEmbeddings
from vertexai.preview.language_models import TextGenerationModel

embeddings = VertexAIEmbeddings()

INDEX_ID = "<index id>"
ENDPOINT_ID = "<index endpoint id>"
PROJECT_ID = '<project name>'
REGION = 'us-central1'
DOCS_BUCKET='<bucket name>'
TEXT_GENERATION_MODEL='text-bison@001'

def matching_engine_search(question):

    vector_store = MatchingEngine.from_components(
                        index_id=INDEX_ID,
                        region=REGION,
                        embedding=embeddings,
                        project_id=PROJECT_ID,
                        endpoint_id=ENDPOINT_ID,
                        gcs_bucket_name=DOCS_BUCKET)

    relevant_documentation=vector_store.similarity_search(question, k=8)
    context = "\n".join([doc.page_content for doc in relevant_documentation])[:10000] #[:10000]
    return str(context)

app = FastAPI(title="Chatbot")

AIP_HEALTH_ROUTE = os.environ.get('AIP_HEALTH_ROUTE', '/health')
AIP_PREDICT_ROUTE = os.environ.get('AIP_PREDICT_ROUTE', '/predict')

#class Prediction(BaseModel):
#  response: str 


@app.get(AIP_HEALTH_ROUTE, status_code=200)
async def health():
    return {'health': 'ok'}

@app.post(AIP_PREDICT_ROUTE)#, 
          #response_model=Predictions,
          #response_model_exclude_unset=True
async def predict(request: Request):
    body = await request.json()
    print(body)

    question = body["question"]

    matching_engine_response=matching_engine_search(question)

    prompt=f"""
    Follow exactly those 3 steps:
    1. Read the context below and aggregrate this data
    Context : {matching_engine_response}
    2. Answer the question using only this context
    3. Show the source for your answers
    User Question: {question}


    If you don't have any context and are unsure of the answer, reply that you don't know about this topic.
    """

    model = TextGenerationModel.from_pretrained(TEXT_GENERATION_MODEL)
    response = model.predict(
            prompt,
            temperature=0.2,
            top_k=40,
            top_p=.8,
            max_output_tokens=1024,
    )

    print(f"Question: \n{question}")
    print(f"Response: \n{response.text}")


    outputs = response.text

    return {"predictions": [{"response": response.text}] }#Prediction(outputs)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0",port=8080)

Dockerfile

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
RUN pip install --no-cache-dir google-cloud-aiplatform==1.25.0 langchain==0.0.187 xmltodict==0.13.0 unstructured==0.7.0 pdf2image==1.16.3 numpy==1.23.1 pydantic==1.10.8 typing-inspect==0.8.0 typing_extensions==4.5.0
COPY main.py ./main.py

Cloudbuild.yaml

steps:
# Build the container image
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/<project name>/chatbot', '.']
# Push the container image to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/<project name>/chatbot']

images:
- gcr.io/<project name>/chatbot

Code to query the model endpoint

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID,
                location=REGION)

instances = [{"question": "<Some question>"}]

endpoint = aiplatform.Endpoint("projects/<project id>/locations/us-central1/endpoints/<model endpoint id>")

prediction = endpoint.predict(instances=instances)
print(prediction)
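
A note on the request shape, offered as something to check rather than a confirmed diagnosis: Vertex AI delivers online prediction requests to a custom container as a JSON object with a top-level "instances" list. The body produced by the call above would therefore look like {"instances": [{"question": "..."}]} rather than {"question": "..."}. A handler that indexes the body directly with "question" would then raise a KeyError, which FastAPI reports as a 500. The sketch below shows one way to read the questions from that shape. The name extract_questions is a hypothetical helper, not part of FastAPI or the Vertex AI SDK.

```python
from typing import Any, Dict, List


def extract_questions(body: Dict[str, Any]) -> List[str]:
    """Return the "question" field of every instance in a predict request body.

    Assumes the body has the shape {"instances": [{"question": "..."}, ...]}.
    """
    instances = body.get("instances")
    if not isinstance(instances, list) or not instances:
        raise ValueError('request body must contain a non-empty "instances" list')

    questions = []
    for i, instance in enumerate(instances):
        if not isinstance(instance, dict) or "question" not in instance:
            raise ValueError(f'instance {i} has no "question" field')
        questions.append(str(instance["question"]))
    return questions


# Body shape assumed for endpoint.predict(instances=[{"question": "..."}])
body = {"instances": [{"question": "<Some question>"}]}
print(extract_questions(body))
```

In the predict handler, a helper like this would take the place of the direct body["question"] lookup, with the handler returning one entry in "predictions" per instance. Logging the raw body from inside the container (the handler already prints it) is a quick way to confirm which shape actually arrives.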

Error message

Answer 1

Score: 1

According to the documentation, internal errors are usually transient, so resending the request may resolve the issue. If the error persists, you can contact support or open a new thread on the issue tracker describing your problem.
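
The "resend the request" advice can be sketched as a small retry wrapper with exponential backoff. This is a minimal illustration under stated assumptions, not an official client feature: call_with_retries and its parameters are hypothetical names introduced here, and the sleep function is injectable only so the helper can be exercised without real delays.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise ValueError("max_attempts must be at least 1")
```

With the query code from the question, this could be used as call_with_retries(lambda: endpoint.predict(instances=instances), retry_on=(InternalServerError,)), where InternalServerError is imported from google.api_core.exceptions. Retrying only helps with transient failures. If every attempt returns a 500, the container logs are the next place to look.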
