Indexing and searching over a nested JSON field in Redis Python.

Question

I am trying to set an index on a nested field inside Redis so I can search over it easily, specifically a numeric field representing a timestamp, but I can't figure it out. The documentation is quite complicated, and ever since RediSearch was merged with main Redis I've been struggling to find any good examples.

Here's my attempt:

import time
from redis import Redis
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.field import NumericField
from redis.commands.search.query import Query, NumericFilter


def main():
    r = None
    test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": str(time.time())}]}}
    test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": str(time.time() + 10)}]}}

    try:
        r = Redis()
        r.json().set("uuid:4587-7d5f9-4545", "$", test_dict1)
        r.json().set("uuid:4587-7d5f9-4546", "$", test_dict2)
        r.ft('timestamp').create_index(fields=(NumericField("$.messages.timestamp")), definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.HASH))
        print(r.json().get("uuid:4587-7d5f9-4545", "$.context.test.other"))
        q = Query("*").add_filter(NumericFilter(field="$.messages.timestamp", minval=0, maxval=time.time()))

        print(r.ft('timestamp').search(q))
    except Exception as e:
        raise e
    finally:
        if r is not None:
            r.flushall()


if __name__ == "__main__":
    main()

That currently returns 0 results, but doesn't throw any errors.

Answer 1

Score: 1

It looks like you are creating an index over values stored in Hashes in Redis (your code has index_type=IndexType.HASH) but you are storing your data in JSON documents. Try swapping to using index_type=IndexType.JSON.

Answer 2

Score: 1

There are a few problems here. First, your dictionary stores the timestamps as strings, but they are indexed as numeric. That fails silently because of the type mismatch. So, replace that with:

    test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": time.time()}]}}
    test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": time.time() + 10}]}}
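
To make the type mismatch visible, here is a minimal sketch using only the standard library. It assumes nothing about Redis itself; it only shows how the two variants serialize to JSON, and the helper name is_json_number is made up for this illustration.

```python
import json
import time

# The question's version: str() turns the float into text.
as_string = {"timestamp": str(time.time())}
# The corrected version: keep the float so it serializes as a JSON number.
as_number = {"timestamp": time.time()}

print(json.dumps(as_string))  # the value is quoted, i.e. a JSON string
print(json.dumps(as_number))  # the value is bare, i.e. a JSON number


def is_json_number(doc, key):
    """Return True if doc[key] survives a JSON round trip as a number."""
    value = json.loads(json.dumps(doc))[key]
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(is_json_number(as_string, "timestamp"))
print(is_json_number(as_number, "timestamp"))
```

A NUMERIC field in the index only picks up values of the second kind, which is why the string version produces no hits and no error.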

Second, the path in your field definition is wrong: there is no JSON key at $.messages.timestamp. The value lives at $.context.messages.[*].timestamp, so you need to change your index definition. For readability you might want to give that field an alias. Finally, as @simon-prickett says, you are indexing the documents as hashes, so you need to declare it as a JSON index:

        r.ft('timestamp').create_index(fields=[NumericField("$.context.messages.[*].timestamp", as_name="ts")], definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.JSON))

Once that's done you can query as

        q = Query("*").add_filter(NumericFilter(field="ts", minval=0, maxval=time.time()))

and get your results.
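
Putting the three fixes together, the whole script might look roughly like the sketch below. It rests on several assumptions: a local Redis server with the search and JSON modules loaded, a redis-py version that still exposes redis.commands.search.indexDefinition (as the question's imports do), and a search module recent enough to index a numeric value reached through [*]. The helper make_doc is introduced here for illustration, the path is written in the more common $.context.messages[*].timestamp form, and the index is created before the documents are written so they are indexed as they arrive.

```python
import time


def make_doc(text, offset=0.0):
    """Build a sample document with a numeric (float) timestamp."""
    return {
        "context": {
            "test": {"other": "test"},
            "messages": [{"text": text, "timestamp": time.time() + offset}],
        }
    }


def main():
    # redis-py is imported here so make_doc() stays usable without it.
    from redis import Redis
    from redis.commands.search.field import NumericField
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType
    from redis.commands.search.query import NumericFilter, Query

    r = Redis()  # assumption: local server with search and JSON support
    index = r.ft("timestamp")

    # JSON index (not HASH), correct nested path, alias "ts" for querying.
    index.create_index(
        [NumericField("$.context.messages[*].timestamp", as_name="ts")],
        definition=IndexDefinition(prefix=["uuid:"], index_type=IndexType.JSON),
    )
    try:
        r.json().set("uuid:4587-7d5f9-4545", "$", make_doc("mytext"))
        r.json().set("uuid:4587-7d5f9-4546", "$", make_doc("mytext2", offset=10))

        # Filter on the alias: everything from 0 up to "now".
        q = Query("*").add_filter(NumericFilter("ts", 0, time.time()))
        result = index.search(q)
        print(result.total, [doc.id for doc in result.docs])
    finally:
        # Drop only this index and its documents instead of flushall().
        index.dropindex(delete_documents=True)
```

Call main() with the server running. Since the second document's timestamp is 10 seconds in the future, the filter should match only the first one. Dropping the index with delete_documents=True is a gentler cleanup than the flushall() in the question, which wipes every database on the server.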

huangapple
  • Published on July 5, 2023, 01:02:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76614653.html