How to save a torch.tensor or np.array to Redis and search by vector similarity?

Question

I'm having trouble saving my data to Redis with Python code, using only redis and r.ft().

The data to upload looks like this. I also want to be able to refresh the embeddings, writing new values under the same ids.

id is the data index, and the embeddings are flattened to the same shape for all data (e.g. 1024).

id embeddings
0 [3.1515, 4.5562, ..., ]
1 [3, 8.62, ..., ]

After uploading to Redis, I want to search for a batch of embeddings with Redis.

If the input batch has shape [3, 1024], the search should iterate over the batch and return a [3, top-k] result: the ids of the stored embeddings most similar to each query.

This is really hard for me to get right at the moment. Waiting for help.


Answer 1

Score: 3

A few helpful links first: This notebook has some helpful examples, here are the RediSearch docs for using vector similarity, and lastly, here's an example app where it all comes together.

To store a numpy array as a vector field in Redis, you need to first create a search index with a VectorField in the schema:

import numpy as np
import redis

from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    VectorField
)

# connect
r = redis.Redis(...)

# define vector field
fields = [VectorField("vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 1024,  # 1024 dimensions
        "DISTANCE_METRIC": "COSINE",
        "INITIAL_CAP": 10000, # approx initial count of docs in the index
    }
)]

# create search index (INDEX_NAME is a placeholder value; any index name works)
INDEX_NAME = "embeddings_idx"
r.ft(INDEX_NAME).create_index(
    fields = fields,
    definition = IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)
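
To make the index parameters above more concrete: a FLAT index is a brute-force scan, and with the COSINE metric the score returned for each hit is the cosine distance (1 minus cosine similarity), so smaller means closer. The sketch below reproduces that computation locally in NumPy for a batch of queries. It does not talk to Redis, and `cosine_top_k` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def cosine_top_k(stored: np.ndarray, queries: np.ndarray, k: int):
    """Brute-force KNN by cosine distance (1 - cosine similarity).

    stored:  [n, dim] array of indexed vectors
    queries: [b, dim] array of query vectors
    Returns (ids, distances), each of shape [b, k], nearest first.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    stored_n = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    queries_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    distances = 1.0 - queries_n @ stored_n.T        # [b, n]
    ids = np.argsort(distances, axis=1)[:, :k]      # [b, k]
    return ids, np.take_along_axis(distances, ids, axis=1)

rng = np.random.default_rng(0)
stored = rng.random((100, 1024), dtype=np.float32)
queries = stored[[3, 42, 7]]   # query with vectors that are already stored
ids, dists = cosine_top_k(stored, queries, k=5)
```

Here `ids` has shape [3, 5], which is the [batch, top-k] layout the question asks for. A local check like this can be handy for sanity-testing what the Redis index returns on a small sample.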

After you have an index, you can write data to Redis using hset and a pipeline. Vectors in Redis are stored as byte strings (see tobytes() below):

# random vectors
vectors = np.random.rand(10000, 1024).astype(np.float32)

pipe = r.pipeline(transaction=False)
for id_, vector in enumerate(vectors):
    # the first argument of hset is the hash name (the Redis key), passed positionally
    pipe.hset(f"doc:{id_}", mapping={"id": id_, "vector": vector.tobytes()})
    if id_ % 100 == 0:
        pipe.execute() # write batch
pipe.execute() # cleanup
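
Since the question also mentions torch.tensor, here is a minimal sketch of preparing the bytes before writing them. The index above declares FLOAT32 and DIM 1024, so each blob should be exactly 4 * 1024 bytes. A float64 array (the NumPy default) would produce twice that and would not match what the index expects. `to_vector_bytes` is a hypothetical helper, not part of redis-py. The torch branch is duck-typed and assumes the usual `detach().cpu().numpy()` chain.

```python
import numpy as np

def to_vector_bytes(embedding, dim: int = 1024) -> bytes:
    """Flatten an embedding to float32 and return its raw bytes."""
    # torch.Tensor: leave the autograd graph and the GPU first (duck-typed,
    # so torch itself does not need to be imported here).
    if hasattr(embedding, "detach"):
        embedding = embedding.detach().cpu().numpy()
    flat = np.asarray(embedding, dtype=np.float32).reshape(-1)
    if flat.shape[0] != dim:
        raise ValueError(f"expected {dim} values, got {flat.shape[0]}")
    return flat.tobytes()

# float64 input of shape [32, 32] is flattened and converted to float32.
payload = to_vector_bytes(np.random.rand(32, 32))
restored = np.frombuffer(payload, dtype=np.float32)
```

To refresh an embedding, writing the new bytes with hset to the same key (for example doc:0) overwrites the old field value, and the index picks up the change for keys under the indexed prefix.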

Out of the box, you can use a pipeline call to query Redis multiple times with one API call:

base_query = "*=>[KNN 5 @vector $vector AS vector_score]"
query = (
    Query(base_query)
    .sort_by("vector_score")
    .paging(0, 5)
    .dialect(2)
)
query_vectors = np.random.rand(3, 1024).astype(np.float32)

# pipeline calls to redis
pipe = r.pipeline(transaction=False)
for vector in query_vectors:
    pipe.ft(INDEX_NAME).search(query, {"vector": vector.tobytes()})
res = pipe.execute()

Then you will need to unpack the res object that contains the raw response for all three queries from Redis. Hope this helps.
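
One way the unpacking could look, as a rough sketch: it assumes each raw reply uses the RESP2 layout of FT.SEARCH, that is a total count followed by alternating document keys and flat field/value lists, with byte strings for keys and values. If your client is configured differently (RESP3, decode_responses=True), the shapes will differ. `parse_knn_reply` is a hypothetical helper, and the sample reply is hand-written, not real Redis output.

```python
def parse_knn_reply(reply, score_field=b"vector_score"):
    """Turn one raw FT.SEARCH reply into a list of (key, score) pairs.

    Assumed layout: [total, key1, [f1, v1, ...], key2, [f1, v1, ...], ...]
    """
    hits = []
    # Skip the leading total count, then walk (key, fields) pairs.
    for key, fields in zip(reply[1::2], reply[2::2]):
        # fields is a flat [name, value, name, value, ...] list.
        field_map = dict(zip(fields[0::2], fields[1::2]))
        raw_score = field_map[score_field]
        if isinstance(raw_score, bytes):
            raw_score = raw_score.decode()
        hits.append((key, float(raw_score)))
    return hits

# Hand-written sample shaped like a two-hit reply (not real Redis output).
sample_reply = [
    2,
    b"doc:7", [b"vector_score", b"0.0123", b"id", b"7"],
    b"doc:3", [b"vector_score", b"0.0456", b"id", b"3"],
]
top_ids = [key for key, _ in parse_knn_reply(sample_reply)]
```

Applied to every element of res, for example `[[key for key, _ in parse_knn_reply(reply)] for reply in res]`, this would give one list of top-k keys per query vector.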


huangapple
  • Published on March 15, 2023 at 19:06:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743879.html