How to save a torch.tensor or np.array to Redis and search vector similarity?
Question
I'm having trouble saving my data to Redis with Python code, just using redis and r.ft().
The data to upload will look like the table below. I also want to be able to refresh the embeddings with different values for the same ids. id is the data index, and the embeddings are flattened to the same shape for all records (e.g. 1024).
id  embeddings
0   [3.1515, 4.5562, ..., ]
1   [3, 8.62, ..., ]
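Concretely, something like this (random numbers just for illustration; the real embeddings come from my model):

import numpy as np
ids = [0, 1]
embeddings = np.random.rand(2, 1024).astype(np.float32)  # flattened, same shape for every id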
After uploading to Redis, I want to search for a certain batch of embeddings. If the input batch shape is [3, 1024], the search should iterate over the batch and return [3, top-k] ids whose embeddings in Redis are most similar.
It is really hard for me to get this right at the moment. Waiting for help.
Answer 1
Score: 3
A few helpful links first: This notebook has some helpful examples, here are the RediSearch docs for using vector similarity, and lastly, here's an example app where it all comes together.
To store a numpy array as a vector field in Redis, you need to first create a search index with a VectorField in the schema:
import numpy as np
import redis
from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    VectorField
)

# connect
r = redis.Redis(...)

# name of the search index (pick any name you like)
INDEX_NAME = "embeddings-index"

# define vector field
fields = [VectorField("vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 1024,                  # 1024 dimensions
        "DISTANCE_METRIC": "COSINE",
        "INITIAL_CAP": 10000,         # approx. initial count of docs in the index
    }
)]

# create search index
r.ft(INDEX_NAME).create_index(
    fields=fields,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)
After you have an index, you can write data to Redis using hset and a pipeline. Vectors in Redis are stored as byte strings (see tobytes() below):
# random vectors
vectors = np.random.rand(10000, 1024).astype(np.float32)

pipe = r.pipeline(transaction=False)
for id_, vector in enumerate(vectors):
    # hset takes the key name first, then a mapping of hash fields
    pipe.hset(f"doc:{id_}", mapping={"id": id_, "vector": vector.tobytes()})
    if id_ % 100 == 0:
        pipe.execute()  # write batch
pipe.execute()  # cleanup
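Because the documents are plain Redis hashes, refreshing the embedding for an existing id (as asked above) is just another hset on the same key: HSET overwrites existing fields, and RediSearch re-indexes the document automatically. A minimal sketch, with a random vector standing in for the re-computed embedding:

# refresh the embedding stored for an existing id with a new value
new_vector = np.random.rand(1024).astype(np.float32)  # e.g. a re-computed embedding
r.hset("doc:0", mapping={"id": 0, "vector": new_vector.tobytes()})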
Out of the box, you can use a pipeline call to query Redis multiple times with one API call:
base_query = '*=>[KNN 5 @vector $vector AS vector_score]'
query = (
    Query(base_query)
    .sort_by("vector_score")
    .paging(0, 5)
    .dialect(2)
)

query_vectors = np.random.rand(3, 1024).astype(np.float32)

# pipeline calls to redis
pipe = r.pipeline(transaction=False)
for vector in query_vectors:
    pipe.ft(INDEX_NAME).search(query, {"vector": vector.tobytes()})
res = pipe.execute()
Then you will need to unpack the res object, which contains the raw responses for all three queries from Redis. Hope this helps.
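For example, a minimal sketch of turning res into a [3, top-k] list of ids, assuming the doc:{id} key pattern used above (each doc.id is the Redis key, e.g. "doc:42"):

# res holds one search Result per query vector, in the same order as query_vectors
top_k_ids = []
for result in res:
    # each matching key looks like "doc:<id>"; strip the prefix to recover the integer index
    top_k_ids.append([int(doc.id.split(":")[1]) for doc in result.docs])
# top_k_ids is now a [3, top-k] nested list of ids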