FAISS vectorstore created with LangChain yields AttributeError: 'OpenAIEmbeddings' object has no attribute 'deployment' / 'headers'

Question

For the past few weeks I have been working on a QA retrieval chatbot project with LangChain and OpenAI in Python. I have an ingest pipeline set up in a notebook on Google Colab, with which I have been extracting text from PDFs, creating embeddings, and storing them in FAISS vectorstores that I then use to test my LangChain chatbot (a Streamlit Python app). I have a bunch of vectorstores (one per PDF) that I created in the past few days.

The Google Colab pipeline simply takes the extracted PDF pages, creates LangChain documents, and finally embeds them and saves the vectorstore with the following code:

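import pickle
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
with open("file.pkl", "wb") as f:
    pickle.dump(vectorstore, f)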

I then manually download the FAISS vectorstore file.pkl and store it on my local machine in a db folder, where my Streamlit app loads it as follows:

import os
import pickle

if os.path.exists(f"db/{filename}.pkl"):
    with open(f"db/{filename}.pkl", "rb") as f:
        vectorstore = pickle.load(f)

Since today (Monday 3 July), any new FAISS vectorstore that I create with my Google Colab notebook fails to load in my app. I get an exception saying that the variable "vectorstore" is not defined.

I thought I'd try downloading the notebook and creating the vectorstore locally, but the result was the same.

Alas, I had not been paying attention to which versions of langchain and openai were being installed each time I ran my Colab notebook. Fearing that an update might be the cause, I made sure my Google Colab notebook and my local environment use the same versions:

langchain==0.0.205
openai==0.27.8
streamlit==1.22.0
faiss-cpu==1.7.4
tiktoken==0.4.0
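
A minimal sketch of how the installed versions could be recorded alongside each pickle, so that a mismatch between the notebook and the app is reported at load time. The helper names (`installed_versions`, `dump_with_versions`, `load_checked`) are hypothetical and not part of LangChain; only the standard library is used.

```python
import pickle
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def dump_with_versions(obj, path, packages):
    """Write two pickle records to one file: the versions, then the object."""
    with open(path, "wb") as f:
        pickle.dump(installed_versions(packages), f)
        pickle.dump(obj, f)


def load_checked(path):
    """Read the versions record first and only unpickle the object if it matches."""
    with open(path, "rb") as f:
        saved = pickle.load(f)                # first record: versions at save time
        current = installed_versions(saved)   # same package names, this environment
        if saved != current:
            raise RuntimeError(
                f"Package versions differ: saved={saved}, current={current}"
            )
        return pickle.load(f)                 # second record: the stored object
```

In the pipeline described above, the notebook would call something like `dump_with_versions(vectorstore, "file.pkl", ["langchain", "openai"])` and the app would call `load_checked(...)`. The versions are written as a separate first record so that they can be compared before the object itself is unpickled.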

Now the vectorstore gets loaded in the app, but I get the following error:

AttributeError: 'OpenAIEmbeddings' object has no attribute 'deployment'

If I create the vectorstore from the same notebook on my local machine, I get the following error:

AttributeError: 'OpenAIEmbeddings' object has no attribute 'headers'

Updating to the latest versions of langchain and openai does not help. I tried downgrading langchain, but eventually I reach a version that no longer supports gpt-3.5-turbo-16k (the model used in my app), and I get a different kind of error when running my app.

Nothing else has changed: my app launches fine, and the vectorstores I created in the past few days work fine. Only newly created vectorstores no longer work.

What could have happened?

Answer 1

Score: 1

I found this resource: https://dagster.io/blog/training-llms

In order to generate the VectorStore and save it as a pkl file, they run the following:

import pickle

from dagster import asset
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS

@asset
def vectorstore(documents):
    # Embed the documents, build the FAISS index, and pickle it to disk.
    vectorstore_contents = FAISS.from_documents(documents, OpenAIEmbeddings())
    with open("vectorstore.pkl", "wb") as f:
        pickle.dump(vectorstore_contents, f)

Afterwards, with the pkl file saved locally, they load it as a LangChain VectorStore object. I tried this, and the pkl file loaded as a VectorStore object with all of its attributes.

import pickle

from langchain.vectorstores import VectorStore

vectorstore_file = "vectorstore.pkl"

# Load the pickled store; the annotation is only a type hint.
with open(vectorstore_file, "rb") as f:
    local_vectorstore: VectorStore = pickle.load(f)
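
One possible explanation for the AttributeError in the question, offered here as an assumption rather than a confirmed diagnosis: pickle stores an object's attributes but not its class definition, so a file written under one library version and read under another can produce an object that lacks fields the newer class expects. The sketch below reproduces that effect with the standard library only. `fake_lib`, `Embeddings`, and `install` are illustrative stand-ins, not LangChain code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party library module.
fake_lib = types.ModuleType("fake_lib")
sys.modules["fake_lib"] = fake_lib


def install(cls):
    """Expose cls as fake_lib.Embeddings so pickle can look it up by name."""
    cls.__module__ = "fake_lib"
    cls.__name__ = "Embeddings"
    cls.__qualname__ = "Embeddings"
    fake_lib.Embeddings = cls
    return cls


@install
class EmbeddingsV1:
    """'Old' version of the class: only has a model attribute."""

    def __init__(self):
        self.model = "text-embedding-ada-002"


# Pickle an instance while the old class definition is installed.
payload = pickle.dumps(fake_lib.Embeddings())


@install
class EmbeddingsV2:
    """'New' version of the class: adds and uses a deployment attribute."""

    def __init__(self):
        self.model = "text-embedding-ada-002"
        self.deployment = self.model  # field that did not exist in V1

    def describe(self):
        return f"{self.model} via {self.deployment}"


# Unpickling restores the saved attributes without calling __init__,
# so the restored object never receives a 'deployment' attribute.
restored = pickle.loads(payload)

try:
    restored.describe()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happened, creating and loading the vectorstore with the same langchain version should avoid the error. LangChain's FAISS class also provides `save_local` and `load_local`, where the embeddings object is passed in again at load time instead of being restored from the pickle.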

Hope this helps!

Posted by huangapple on 3 July 2023 at 21:44:25
Source: https://go.coder-hub.com/76605344.html