Indexing and searching over a nested JSON field in Redis Python
Question
I am trying to create an index on a nested field inside Redis so that I can search over it easily, specifically a numeric field representing a timestamp, but I can't figure it out. The documentation is quite complicated, and ever since RediSearch was merged with main Redis I've been struggling to find any good examples.
Here's my attempt:
import time

from redis import Redis
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.field import NumericField
from redis.commands.search.query import Query, NumericFilter


def main():
    r = None
    test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": str(time.time())}]}}
    test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": str(time.time() + 10)}]}}
    try:
        r = Redis()
        r.json().set("uuid:4587-7d5f9-4545", "$", test_dict1)
        r.json().set("uuid:4587-7d5f9-4546", "$", test_dict2)
        r.ft('timestamp').create_index(fields=(NumericField("$.messages.timestamp")), definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.HASH))
        print(r.json().get("uuid:4587-7d5f9-4545", "$.context.test.other"))
        q = Query("*").add_filter(NumericFilter(field="$.messages.timestamp", minval=0, maxval=time.time()))
        print(r.ft('timestamp').search(q))
    except Exception as e:
        raise e
    finally:
        if r is not None:
            r.flushall()


if __name__ == "__main__":
    main()
That currently returns 0 results, but doesn't throw any errors.
Answer 1
Score: 1
It looks like you are creating an index over values stored in Hashes in Redis (your code has index_type=IndexType.HASH) but you are storing your data in JSON documents. Try swapping to using index_type=IndexType.JSON.
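To make the difference concrete, here is a minimal sketch of the raw FT.CREATE arguments that an index definition like this corresponds to. The helper name ft_create_args and the alias "ts" are made up for illustration, and the path is where the timestamps actually sit in the question's documents. Whether a numeric field reached through an array wildcard gets indexed depends on the RediSearch version, so treat that part as an assumption.

```python
def ft_create_args(index, on, prefix, path, alias, field_type):
    """Build the argument list for a raw FT.CREATE command (hypothetical helper)."""
    return [
        "FT.CREATE", index,
        "ON", on,                # HASH or JSON: must match how the keys are stored
        "PREFIX", "1", prefix,   # index keys starting with this single prefix
        "SCHEMA", path, "AS", alias, field_type,
    ]


# The question stores documents with r.json().set(...), so the index is built ON JSON.
json_args = ft_create_args(
    "timestamp", "JSON", "uuid:",
    "$.context.messages[*].timestamp", "ts", "NUMERIC",
)
print(" ".join(json_args))
```

With a connected client, a list like this could be sent as r.execute_command(*json_args). An index created ON HASH with the same prefix would simply never see the JSON keys, which is consistent with getting 0 results and no error.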
Answer 2
Score: 1
There are a few problems here. First, your dictionary stores the timestamps as strings, but the index declares the field as numeric. Indexing will silently fail because of the type mismatch. So, replace that with:
test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": time.time()}]}}
test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": time.time() + 10}]}}
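A quick way to see the mismatch, as a small sketch that assumes the client serializes the dictionary the same way the standard json module does: wrapping the value in str() produces a JSON string, while the bare float produces a JSON number.

```python
import json
import time

now = time.time()

# str(...) turns the timestamp into a JSON string, which a NUMERIC field cannot index.
as_string = json.dumps({"timestamp": str(now)})
# A bare float is written as a JSON number.
as_number = json.dumps({"timestamp": now})

print(as_string)
print(as_number)

# Round-trip to confirm the type each variant carries.
string_value = json.loads(as_string)["timestamp"]
number_value = json.loads(as_number)["timestamp"]
print(type(string_value).__name__, type(number_value).__name__)
```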
Secondly, the path in your field definition is wrong: you don't actually have a JSON key at $.messages.timestamp; the timestamps are at $.context.messages.[*].timestamp, so you need to change your index definition. For the sake of readability you might want to include an alias for that field. Finally, as @simon-prickett says, you are indexing the documents as hashes, so you need to declare it as a JSON index:
r.ft('timestamp').create_index(fields=(NumericField("$.context.messages.[*].timestamp", as_name = "ts")), definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.JSON))
Once that's done you can query as
q = Query("*").add_filter(NumericFilter(field="ts", minval=0, maxval=time.time()))
and get your results.
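If the search still comes back empty, it can help to check the path against one of the documents before touching the index. The following is a rough sketch in plain Python, not a JSONPath implementation: values_at is a hypothetical helper that follows a list of keys, with "*" standing for every element of a list, and the timestamp is a fixed illustrative value.

```python
# Same shape as the question's documents; the timestamp is an illustrative constant.
doc = {
    "context": {
        "test": {"other": "test"},
        "messages": [{"text": "mytext", "timestamp": 1700000000.0}],
    }
}


def values_at(document, keys):
    """Follow keys from the root; '*' fans out over every element of a list."""
    current = [document]
    for key in keys:
        matched = []
        for node in current:
            if key == "*" and isinstance(node, list):
                matched.extend(node)
            elif isinstance(node, dict) and key in node:
                matched.append(node[key])
        current = matched
    return current


# Path from the question: there is no "messages" key at the root.
print(values_at(doc, ["messages", "timestamp"]))                  # []
# Corrected path: through "context", then every message.
print(values_at(doc, ["context", "messages", "*", "timestamp"]))  # [1700000000.0]
```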