KNN vector similarity search in Redis is not returning any results

Question

I am trying to use Redis to store the embedding vectors returned from the OpenAI API and then run a similarity search in Node.js to retrieve similar results. For testing, I currently have 10 keys in Redis, but the query never returns a record. It always returns an empty document list:

{ total: 0, documents: [] }

Schema Declaration:

const schema: RediSearchSchema = {
  '$.text': {
    type: SchemaFieldTypes.TEXT,
    AS: 'text',
  },
  '$.embedding': {
    type: SchemaFieldTypes.VECTOR,
    ALGORITHM: VectorAlgorithms.HNSW,
    TYPE: 'FLOAT32',
    DIM: 1536,
    DISTANCE_METRIC: 'COSINE',
    AS: 'embedding',
  },
};

RedisClient.registerIndex({
  schema: schema,
  name: 'contexts',
  prefix: KNOWLEGE_KEYS_PREFIX,
});

Index creation:

private static async createIndices() {
  RedisClient.indices.forEach(async (i) => {
    try {
      await RedisClient.client.ft.CREATE(i.name, i.schema, {
        ON: 'HASH',
        PREFIX: i.prefix,
      });
    } catch (err) {
      const message = `index ${i.name} already exists`;
      Logger.logError(message);
    }
  });
}

static registerIndex(ri: RedisIndex) {
  RedisClient.indices.push(ri);
}

Vector addition:

RedisClient.client.HSET(key, {
  text: e.text,
  embedding: Buffer.from(new Float32Array(e.vector).buffer),
});

Code for performing vector search:

static async search(indexName: string, queryVector: Buffer, vectorFieldName = 'embedding', top = 5): Promise<any> {
  try {
    const query = `*=>[KNN ${top} @${vectorFieldName} $queryVector AS vec_score]`;
    console.log(query);
    const result = await RedisClient.client.ft.search(indexName, query, {
      PARAMS: {
        queryVector: queryVector,
      },
      DIALECT: 2,
      RETURN: ['text', 'vec_score'],
      SORTBY: 'vec_score',
      LIMIT: {
        from: 0,
        size: top,
      },
    });
    console.log(result);
    return result;
  } catch (err) {
    console.log(err);
    Logger.logError(err);
  }
}

These snippets live in different files, but all of them are called with the proper values.
I have also tried searching with the vector for the exact text stored in one of the keys in Redis, and it still returns no results. Any help is much appreciated.

Answer 1

Score: 1

It looks like you are mixing JSON and HASH notation. Can you try running an HGET command on one of the documents to verify its structure, and include the FT.INFO output so the index parameters can be checked?

The "$.text" AS "text" and "$.embedding" AS "embedding" declarations suggest that you have JSON paths leading to the two fields, with aliases for referring to them in queries. However, the index expects to find the data under the paths you provided, and the index is created ON HASH. Since your hashes have no fields named $.text or $.embedding, the index cannot find anything to index and remains empty.

Try replacing

  '$.text': {
    type: SchemaFieldTypes.TEXT,
    AS: 'text',
  },
  '$.embedding': {
    type: SchemaFieldTypes.VECTOR,
    ALGORITHM: VectorAlgorithms.HNSW,
    TYPE: 'FLOAT32',
    DIM: 1536,
    DISTANCE_METRIC: 'COSINE',
    AS: 'embedding',
  },

with

  'text': {
    type: SchemaFieldTypes.TEXT,
  },
  'embedding': {
    type: SchemaFieldTypes.VECTOR,
    ALGORITHM: VectorAlgorithms.HNSW,
    TYPE: 'FLOAT32',
    DIM: 1536,
    DISTANCE_METRIC: 'COSINE',
  },
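
The mismatch described in this answer can also be caught before the index is created. The following is only a rough sketch: `findJsonPathKeys` is a hypothetical helper, not part of node-redis, and it assumes that any schema key starting with `$` was meant as a JSONPath. The schema objects use plain string literals in place of the `SchemaFieldTypes` enum so the example runs without a Redis client.

```typescript
// Hypothetical helper (not part of node-redis): it only inspects schema keys.
// Assumption: a key starting with "$" was written as a JSONPath, which an
// index created ON HASH would treat as a literal hash field name.
function findJsonPathKeys(schema: Record<string, unknown>): string[] {
  return Object.keys(schema).filter((key) => key.startsWith('$'));
}

// Shape of the schema from the question, reduced to the relevant parts.
const originalSchema = {
  '$.text': { type: 'TEXT', AS: 'text' },
  '$.embedding': { type: 'VECTOR', AS: 'embedding' },
};

// Shape of the schema suggested in this answer.
const fixedSchema = {
  text: { type: 'TEXT' },
  embedding: { type: 'VECTOR' },
};

const originalSuspects = findJsonPathKeys(originalSchema);
const fixedSuspects = findJsonPathKeys(fixedSchema);

console.log('suspect keys in original schema:', originalSuspects);
console.log('suspect keys in fixed schema:', fixedSuspects);
```

A check like this could run just before the index is created ON HASH and log a warning when the list is not empty.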

If that is not the problem, I can help further if you provide the additional data I mentioned.
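
When checking the structure of a stored document as suggested at the start of this answer, one more thing worth confirming is the size of the embedding blob: a FLOAT32 vector field with DIM 1536 expects 1536 × 4 = 6144 bytes. The sketch below assumes the blob has been read back as raw bytes rather than as a string. The helper names are illustrative only, and the code uses standard typed arrays, so a Node.js `Buffer` (which is a `Uint8Array`) can be passed in directly.

```typescript
// Hypothetical helpers for inspecting a FLOAT32 vector blob; names are illustrative.
const BYTES_PER_FLOAT32 = 4;

// Encode a plain number array as the raw bytes of 32-bit floats.
function encodeFloat32(vector: number[]): Uint8Array {
  const floats = new Float32Array(vector);
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// True when the blob has exactly dim * 4 bytes, as a FLOAT32 field of that dimension expects.
function hasExpectedLength(blob: Uint8Array, dim: number): boolean {
  return blob.byteLength === dim * BYTES_PER_FLOAT32;
}

// Decode a blob back into floats. The bytes are copied into a fresh buffer
// first, so the input's offset and alignment do not matter.
function decodeFloat32(blob: Uint8Array): Float32Array {
  if (blob.byteLength % BYTES_PER_FLOAT32 !== 0) {
    throw new Error(`blob length ${blob.byteLength} is not a multiple of 4`);
  }
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}

const sample = encodeFloat32([0.5, -1.25, 2]);
const roundTrip = Array.from(decodeFloat32(sample));

console.log('matches DIM 3:', hasExpectedLength(sample, 3));
console.log('matches DIM 1536:', hasExpectedLength(sample, 1536));
console.log('decoded values:', roundTrip);
```

If the byte length does not match the DIM declared in the schema, that alone is a likely reason for the document not being indexed.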

huangapple
  • Posted on 2023-08-04 01:30:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830371.html