2023-03-15 19:06:40

How to save a torch.tensor or np.array to Redis and search by vector similarity?

Question

I'm having trouble saving my data to Redis from Python, using just redis and r.ft().

The data to upload looks like the table below. I also want to be able to refresh the embeddings, i.e. write new values under the same ids.

id is the data index, and every embedding is flattened to the same shape (e.g. 1024).

id embeddings
0 [3.1515, 4.5562, ..., ]
1 [3, 8.62, ..., ]

After uploading to Redis, I want to search Redis with a batch of embeddings.

If the input batch has shape [3, 1024], the search should iterate over the batch and return a [3, top-k] array of the ids whose stored embeddings are most similar.

I'm finding this really hard to get right. Any help is appreciated.


Answer 1

Score: 3

A few helpful links first: This notebook has some helpful examples, here are the RediSearch docs for using vector similarity, and lastly, here's an example app where it all comes together.

To store a numpy array as a vector field in Redis, you need to first create a search index with a VectorField in the schema:

import numpy as np
import redis

from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import VectorField

# index name used in all snippets below (placeholder; pick your own)
INDEX_NAME = "embeddings"

# connect
r = redis.Redis(...)

# define vector field
fields = [VectorField("vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 1024,  # 1024 dimensions
        "DISTANCE_METRIC": "COSINE",
        "INITIAL_CAP": 10000, # approx initial count of docs in the index
    }
)]

# create search index
r.ft(INDEX_NAME).create_index(
    fields = fields,
    definition = IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)

After you have an index, you can write data to Redis using hset and a pipeline. Vectors in Redis are stored as byte strings (see tobytes() below):

# random vectors
vectors = np.random.rand(10000, 1024).astype(np.float32)

pipe = r.pipeline(transaction=False)
for id_, vector in enumerate(vectors):
    # the hash name is the first positional argument; the key prefix must match the index definition
    pipe.hset(f"doc:{id_}", mapping={"id": id_, "vector": vector.tobytes()})
    if (id_ + 1) % 100 == 0:
        pipe.execute() # write a batch of 100
pipe.execute() # flush any remaining commands
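
The question also mentions torch tensors and refreshing embeddings under the same ids. A minimal sketch of the serialization step, assuming the index uses FLOAT32 with DIM 1024 as above; the helper names to_redis_bytes and from_redis_bytes are hypothetical, not part of redis-py:

```python
import numpy as np

DIM = 1024  # assumed to match the "DIM" given to the VectorField


def to_redis_bytes(embedding) -> bytes:
    """Flatten an embedding, cast it to float32, and return its raw bytes."""
    # For a torch tensor, convert first with tensor.detach().cpu().numpy()
    arr = np.asarray(embedding, dtype=np.float32).reshape(-1)
    if arr.size != DIM:
        raise ValueError(f"expected {DIM} values, got {arr.size}")
    return arr.tobytes()


def from_redis_bytes(blob: bytes) -> np.ndarray:
    """Decode a stored vector field back into a float32 array."""
    return np.frombuffer(blob, dtype=np.float32)
```

Each float32 takes 4 bytes, so a 1024-dimensional vector is stored as 4096 bytes. Because HSET overwrites an existing field, writing a new vector to the same doc:{id} key replaces the old embedding for that id.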

Out of the box, you can use a pipeline call to query Redis multiple times with one API call:

base_query = '*=>[KNN 5 @vector $vector AS vector_score]'
query = (
    Query(base_query)
    .sort_by("vector_score")
    .paging(0, 5)
    .dialect(2)
)
query_vectors = np.random.rand(3, 1024).astype(np.float32)

# pipeline calls to redis
pipe = r.pipeline(transaction=False)
for vector in query_vectors:
    pipe.ft(INDEX_NAME).search(query, {"vector": vector.tobytes()})
res = pipe.execute()

Then you will need to unpack the res object that contains the raw response for all three queries from Redis. Hope this helps.
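
Unpacking could look roughly like the sketch below. It assumes each raw reply has the flat FT.SEARCH layout [total, key, [field, value, ...], key, [field, value, ...], ...]; that layout is an assumption here and can differ by protocol version or client settings, so inspect res[0] before relying on it. The helper name parse_knn_reply is hypothetical:

```python
def parse_knn_reply(raw, score_field="vector_score"):
    """Turn one raw FT.SEARCH reply into a list of (key, score) pairs."""

    def as_str(value):
        # Replies arrive as bytes unless the client uses decode_responses=True
        return value.decode() if isinstance(value, bytes) else str(value)

    hits = []
    # raw[0] is the total match count; after it come key / field-list pairs
    for i in range(1, len(raw), 2):
        key = as_str(raw[i])
        flat = raw[i + 1]
        # Decode field names only: the vector value is binary, not text
        fields = {as_str(flat[j]): flat[j + 1] for j in range(0, len(flat), 2)}
        hits.append((key, float(as_str(fields[score_field]))))
    return hits
```

Applied to the pipeline result, [parse_knn_reply(reply) for reply in res] gives one list of (key, score) pairs per query vector, which is the [3, top-k] result asked for. With the COSINE metric the score is a distance, so smaller values mean more similar vectors.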

