Type "vector" does not exist on postgresql - langchain

Question

I was trying to embed some documents in PostgreSQL with the help of the pgvector extension and LangChain. Unfortunately, I'm running into the following error:

(psycopg2.errors.UndefinedObject) type "vector" does not exist
LINE 4:  embedding VECTOR(1536), 
                   ^

[SQL: 
CREATE TABLE langchain_pg_embedding (
	collection_id UUID, 
	embedding VECTOR(1536), 
	document VARCHAR, 
	cmetadata JSON, 
	custom_id VARCHAR, 
	uuid UUID NOT NULL, 
	PRIMARY KEY (uuid), 
	FOREIGN KEY(collection_id) REFERENCES langchain_pg_collection (uuid) ON DELETE CASCADE
)
]

My environment info:

  • pgvector docker image ankane/pgvector:v0.4.1
  • python 3.10.6, psycopg2 2.9.6, pgvector 0.1.6

List of installed extensions on postgres

  Name   | Version |   Schema   |                Description                 
---------+---------+------------+--------------------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.4.1   | public     | vector data type and ivfflat access method

I've tried the following to resolve it:

  1. A fresh install of the Postgres Docker image with the pgvector extension enabled.
  2. Manually installing the extension by following the official instructions.
  3. Manually creating the extension in Postgres as follows:
CREATE EXTENSION IF NOT EXISTS vector
    SCHEMA public
    VERSION "0.4.1";

But no luck.

Answer 1

Score: 2

Update 17th July 2023

As I mentioned previously, my issue was somewhere else in my configuration. Here are the other common causes of this error:

  1. The pgvector extension isn't enabled in the database you are using. Extensions are enabled per database, so make sure you run CREATE EXTENSION vector; in each database you use for storing vectors.
  2. The schema in which the vector extension is installed is not in the search_path. Run SHOW search_path; to see the schemas in the search path, and \dx to list the installed extensions with their schemas.

In my case, the issue was somewhere else. The extension installation and the search_path were fine for the database I was supposed to use. However, the environment variable that selects the database was wrong, so the application was connecting to the default database postgres instead of my intended database, and postgres didn't have the extension enabled.
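
The checks above can also be run from the application side, so that you see what the application's own connection sees. The following is a minimal sketch, assuming a DB-API cursor such as the one psycopg2 provides. The helper names target_database and run_checks are hypothetical and are not part of LangChain or pgvector.

```python
from urllib.parse import urlsplit

# Diagnostic queries; each one returns a single value.
CHECKS = {
    "database": "SELECT current_database()",
    "search_path": "SHOW search_path",
    "vector_schema": (
        "SELECT n.nspname FROM pg_extension e "
        "JOIN pg_namespace n ON n.oid = e.extnamespace "
        "WHERE e.extname = 'vector'"
    ),
}


def target_database(url):
    """Return the database name in a PostgreSQL URL, or None if it is omitted."""
    name = urlsplit(url).path.lstrip("/")
    return name or None


def run_checks(cur):
    """Run each diagnostic query on a DB-API cursor and keep the first column."""
    results = {}
    for label, sql in CHECKS.items():
        cur.execute(sql)
        row = cur.fetchone()
        # No row for vector_schema means the extension is not enabled
        # in the database this cursor is connected to.
        results[label] = row[0] if row else None
    return results
```

With psycopg2, run_checks(conn.cursor()) could be called on a connection opened with the same settings the application uses. If database is not the one you expected, or vector_schema is None, the application is most likely connected to a database where CREATE EXTENSION vector; was never run.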

Answer 2

Score: 0

I ran into this issue as well. I connected to the database directly with psycopg2 and executed the following SQL statement:

cur.execute('''
CREATE TABLE langchain_pg_embedding (
    uuid UUID NOT NULL,
    collection_id UUID,
    embedding VECTOR,
    document VARCHAR,
    cmetadata JSON,
    custom_id VARCHAR,
    PRIMARY KEY (uuid))
''')

This statement executes without any problem. However, when I use LangChain, I get an error stating that the data type does not exist.

I resolved this by setting a permanent search path for the database:

ALTER DATABASE postgres SET SEARCH_PATH TO postgres_schema;
  • Here, postgres is the name of the current database.
  • postgres_schema is the schema to set as the search path.
  • The command changes the schema search path permanently at the database level; it takes effect for new sessions.
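
Before changing the search path, it can help to confirm that the schema holding the vector type is really missing from it. The following is a minimal sketch that works on the text returned by SHOW search_path. The helper names are hypothetical, and the parsing assumes simple schema names without embedded commas. Note that the "$user" entry stands for a schema named after the current user and is not resolved here.

```python
def parse_search_path(raw):
    """Split the text returned by SHOW search_path into schema names."""
    return [part.strip().strip('"') for part in raw.split(",") if part.strip()]


def schema_visible(raw_search_path, schema):
    """Return True if `schema` is listed literally in the search path."""
    return schema in parse_search_path(raw_search_path)


def search_path_option(schemas):
    """Build a libpq `options` value that sets search_path for one connection."""
    # No spaces after the commas, since spaces separate options in libpq.
    return "-c search_path=" + ",".join(schemas)
```

As an alternative to ALTER DATABASE, the string from search_path_option could be passed per connection, for example as the options keyword of psycopg2.connect. Whichever approach is used, the schema that contains the vector extension (public in the question) needs to be included in the list.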

Answer 3

Score: 0

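I resolved the issue by building pgvector from source and then enabling the extension:

cd /tmp
git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Then, in psql, connected to the database that needs the extension:

CREATE EXTENSION vector;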
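
The two halves of these steps fail in different ways: make install puts the extension files on the server, while CREATE EXTENSION enables it in one database. The following is a minimal sketch that tells the cases apart using the pg_available_extensions view. It assumes a psycopg2-style cursor (the %s placeholder is psycopg2's parameter style), and the function name extension_status is hypothetical.

```python
def extension_status(cur, name="vector"):
    """Classify an extension as 'missing', 'available', or 'enabled'."""
    cur.execute(
        "SELECT installed_version FROM pg_available_extensions WHERE name = %s",
        (name,),
    )
    row = cur.fetchone()
    if row is None:
        # The server has no files for this extension: install it first.
        return "missing"
    if row[0] is None:
        # Files are present, but CREATE EXTENSION was not run in this database.
        return "available"
    return "enabled"
```

A result of "missing" points at the build and install step, while "available" means only CREATE EXTENSION vector; is still needed in the current database.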

Answer 4

Score: 0

LangChain uses two tables, and only one of them uses VECTOR. If, while you are configuring the application, one table gets created in one schema and the other in a different schema, that will cause this error as well.

Just delete (or move) the LangChain tables from your schema and from public, then restart the application once the settings have stabilized. The tables should then be created correctly.

  • langchain_pg_collection - plain table
  • langchain_pg_embedding - has a vector column, is created second, and has a foreign key (langchain_pg_embedding_collection_id_fkey) referencing langchain_pg_collection
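
To see whether the two tables ended up in different schemas, the standard information_schema.tables view can be queried and the rows grouped. The following is a minimal sketch; the names FIND_TABLES_SQL, schemas_by_table, and is_split are hypothetical, and the view only lists tables the current user can access.

```python
# Query to run with any DB-API cursor; rows come back as (schema, table).
FIND_TABLES_SQL = (
    "SELECT table_schema, table_name FROM information_schema.tables "
    "WHERE table_name IN ('langchain_pg_collection', 'langchain_pg_embedding')"
)


def schemas_by_table(rows):
    """Group (table_schema, table_name) rows into {table_name: sorted schemas}."""
    found = {}
    for schema, table in rows:
        found.setdefault(table, set()).add(schema)
    return {table: sorted(schemas) for table, schemas in found.items()}


def is_split(rows):
    """Return True if the tables found live in more than one schema."""
    return len({schema for schema, _ in rows}) > 1
```

After cur.execute(FIND_TABLES_SQL), passing cur.fetchall() to is_split shows whether a cleanup is needed, and schemas_by_table shows which copies to delete or move.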

Answer 5

Score: 0

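I solved a similar problem by mounting an init.sql file that runs CREATE EXTENSION when the Docker container initializes its database. Scripts in /docker-entrypoint-initdb.d only run when the data directory is empty, so an existing volume has to be recreated for this to take effect.

The relevant snippet of the docker-compose file: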

volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

The contents of init.sql:

CREATE EXTENSION vector;

huangapple
  • Posted on May 11, 2023, 00:28:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76220715.html