KNN vector similarity search in Redis is not returning any results
Question
I am trying to use Redis to store the embedding vectors returned by the OpenAI API and then, in Node.js, run a similarity search to retrieve similar results. For testing I currently have 10 keys in Redis, but the query never returns a record. It always returns an empty document list:
{ total: 0, documents: [] }
Schema Declaration:
const schema: RediSearchSchema = {
  '$.text': {
    type: SchemaFieldTypes.TEXT,
    AS: 'text',
  },
  '$.embedding': {
    type: SchemaFieldTypes.VECTOR,
    ALGORITHM: VectorAlgorithms.HNSW,
    TYPE: 'FLOAT32',
    DIM: 1536,
    DISTANCE_METRIC: 'COSINE',
    AS: 'embedding',
  },
};
RedisClient.registerIndex({
  schema: schema,
  name: 'contexts',
  prefix: KNOWLEGE_KEYS_PREFIX,
});
Index creation:
private static async createIndices() {
  RedisClient.indices.forEach(async (i) => {
    try {
      await RedisClient.client.ft.CREATE(i.name, i.schema, {
        ON: 'HASH',
        PREFIX: i.prefix,
      });
    } catch (err) {
      const message = `index ${i.name} already exists`;
      Logger.logError(message);
    }
  });
}

static registerIndex(ri: RedisIndex) {
  RedisClient.indices.push(ri);
}
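There is a separate issue from the schema problem discussed in the answer. Array.prototype.forEach does not wait for async callbacks, so createIndices resolves before any FT.CREATE call has finished. In addition, the catch block reports every failure as "already exists". A minimal sketch of a sequential alternative follows. IndexSpec, createIndicesSequentially and the injected create function (standing in for client.ft.CREATE) are hypothetical names. The error check assumes the server's message for an existing index contains "already exists".

```typescript
// Hypothetical shape of a registered index, mirroring the fields used above.
interface IndexSpec {
  name: string;
  schema: unknown;
  prefix: string;
}

// Create each index in turn, awaiting every call before moving on.
// `create` stands in for something like client.ft.CREATE(name, schema, options).
async function createIndicesSequentially(
  indices: IndexSpec[],
  create: (spec: IndexSpec) => Promise<unknown>,
  log: (message: string) => void = (m) => console.error(m),
): Promise<string[]> {
  const created: string[] = [];
  for (const spec of indices) {
    try {
      await create(spec);
      created.push(spec.name);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Assumption: an existing index yields an error mentioning "already exists".
      if (/already exists/i.test(message)) {
        log(`index ${spec.name} already exists`);
      } else {
        throw err; // Surface real failures instead of hiding them.
      }
    }
  }
  return created;
}
```

Because the loop awaits each call, the caller can `await createIndicesSequentially(...)` before writing or querying data. Unexpected errors, such as a malformed schema, now reject the promise instead of being logged as "already exists".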
Vector addition:
RedisClient.client.HSET(key, {
  text: e.text,
  embedding: Buffer.from(new Float32Array(e.vector).buffer),
});
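You can rule out the encoding as a second cause by checking two things. Every embedding should have exactly DIM values, so each stored blob is 1536 × 4 = 6144 bytes. The blob should also decode back to the same numbers. The sketch below assumes the FLOAT32 / DIM 1536 settings from the schema above. encodeFloat32 and decodeFloat32 are hypothetical helper names. The helpers work on Uint8Array, which Node's Buffer extends, so a Buffer read back from Redis can be passed to decodeFloat32 directly.

```typescript
// Assumed from the schema above: TYPE 'FLOAT32', DIM 1536.
const DIM = 1536;
const BYTES_PER_VALUE = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per float32

// Encode an embedding as the raw float32 bytes stored in the hash field.
// Throws if the dimension does not match the index definition.
function encodeFloat32(vector: number[], dim: number = DIM): Uint8Array {
  if (vector.length !== dim) {
    throw new Error(`expected ${dim} values, got ${vector.length}`);
  }
  return new Uint8Array(new Float32Array(vector).buffer);
}

// Decode raw bytes (for example a Buffer read back from Redis) into float32 values.
function decodeFloat32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % BYTES_PER_VALUE !== 0) {
    throw new Error(`byte length ${bytes.byteLength} is not a multiple of ${BYTES_PER_VALUE}`);
  }
  // Copy into a fresh, zero-offset buffer: `bytes` may be a misaligned view into a
  // larger pooled buffer, and Buffer.prototype.slice does not copy.
  const copy = new Uint8Array(bytes.byteLength);
  copy.set(bytes);
  return new Float32Array(copy.buffer);
}
```

If encodeFloat32 throws for any of the 10 stored embeddings, the dimension mismatch would also keep those keys out of the index.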
Code for performing vector search:
static async search(indexName: string, queryVector: Buffer, vectorFieldName = 'embedding', top = 5): Promise<any> {
  try {
    const query = `*=>[KNN ${top} @${vectorFieldName} $queryVector AS vec_score]`;
    console.log(query);
    const result = await RedisClient.client.ft.search(indexName, query, {
      PARAMS: {
        queryVector: queryVector,
      },
      DIALECT: 2,
      RETURN: ['text', 'vec_score'],
      SORTBY: 'vec_score',
      LIMIT: {
        from: 0,
        size: top,
      },
    });
    console.log(result);
    return result;
  } catch (err) {
    console.log(err);
    Logger.logError(err);
  }
}
These snippets live in different files, but all of them are called with the correct values.
I have also tried searching with the vector of the exact text stored in one of the keys in Redis, and it still returns no results. Any help is much appreciated.
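Once the index is populated, searching with the exact vector stored under one of the keys should return that key first with a vec_score close to 0. This is because the COSINE metric reports a distance of 1 − cosine similarity, not the similarity itself, and results are sorted ascending by vec_score. A brute-force reference gives the ranking to expect, which you can compare against what FT.SEARCH returns. The sketch below assumes the stored vectors are available in memory. cosineDistance and bruteForceKnn are hypothetical helpers, not part of any Redis client.

```typescript
// Cosine distance as reported by a COSINE vector index: 1 - (a·b) / (|a| |b|).
function cosineDistance(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Scored {
  key: string;
  score: number;
}

// Brute-force KNN: score every stored vector, sort ascending, keep the `top` closest.
function bruteForceKnn(
  query: ArrayLike<number>,
  stored: Map<string, ArrayLike<number>>,
  top = 5,
): Scored[] {
  const scored: Scored[] = [];
  stored.forEach((vec, key) => scored.push({ key, score: cosineDistance(query, vec) }));
  scored.sort((x, y) => x.score - y.score);
  return scored.slice(0, top);
}
```

With only 10 keys, scanning everything is cheap, so this check is practical before debugging the index further.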
Answer 1
Score: 1
It looks like you are mixing JSON and HASH conventions. Could you run HGETALL on one of the documents to verify its structure, and include the FT.INFO output so the index parameters can be checked?
The declarations "$.text" AS "text" and "$.embedding" AS "embedding" say that the data lives at the JSON paths $.text and $.embedding, and that you want to refer to those fields in queries by the aliases text and embedding. The index, however, still looks for the data under the identifiers you originally provided. Because the index is created ON HASH, it looks for hash fields literally named $.text and $.embedding. Your documents only have the fields text and embedding, so the index finds nothing and stays empty.
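The mismatch can be illustrated without a server. With ON HASH, each schema key is looked up as a literal hash field name, while AS only sets the name used in queries. The sketch below uses a hypothetical missingHashFields helper and a simplified FieldSpec type, not the client's RediSearchSchema type. It compares the schema keys with the fields that the HSET call in the question writes.

```typescript
// Simplified stand-in for one schema entry; not the client library's own type.
interface FieldSpec {
  type: string;
  AS?: string;
  [option: string]: unknown;
}

// For an index created ON HASH, every schema key must match a hash field name
// exactly; AS only renames the attribute for queries. Return the schema keys
// that the stored documents do not contain.
function missingHashFields(schema: Record<string, FieldSpec>, hashFields: string[]): string[] {
  const present = new Set(hashFields);
  return Object.keys(schema).filter((id) => !present.has(id));
}

// Field names written by the HSET call in the question.
const storedFields = ["text", "embedding"];

const originalSchema: Record<string, FieldSpec> = {
  "$.text": { type: "TEXT", AS: "text" },
  "$.embedding": { type: "VECTOR", AS: "embedding" },
};
const fixedSchema: Record<string, FieldSpec> = {
  text: { type: "TEXT" },
  embedding: { type: "VECTOR" },
};

console.log(missingHashFields(originalSchema, storedFields)); // [ '$.text', '$.embedding' ]
console.log(missingHashFields(fixedSchema, storedFields)); // []
```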
Try replacing:
'$.text': {
  type: SchemaFieldTypes.TEXT,
  AS: 'text',
},
'$.embedding': {
  type: SchemaFieldTypes.VECTOR,
  ALGORITHM: VectorAlgorithms.HNSW,
  TYPE: 'FLOAT32',
  DIM: 1536,
  DISTANCE_METRIC: 'COSINE',
  AS: 'embedding',
},
with:
'text': {
  type: SchemaFieldTypes.TEXT,
},
'embedding': {
  type: SchemaFieldTypes.VECTOR,
  ALGORITHM: VectorAlgorithms.HNSW,
  TYPE: 'FLOAT32',
  DIM: 1536,
  DISTANCE_METRIC: 'COSINE',
},
If that is not the problem, I can help further if you provide the additional output mentioned above.