How to filter buckets based on the comparison of two sub-aggregation metrics in Elasticsearch (Python)?
Question
My index has documents with the following fields: user_id, user_name, post_text, post_sentiment
Here post_sentiment is of type double and represents the sentiment of the post. A post_sentiment greater than 0 indicates a happy post, while a post_sentiment less than 0 indicates a sad post.
I am trying to retrieve the users who have more happy posts than sad posts. I am using the high-level Elasticsearch Python library.
I have written the following function, which looks logically correct to me. However, running it fails with the error message: TransportError(500, 'search_phase_execution_exception'). I have verified that the problem is not with the connection or the index but with the query structure. What am I doing wrong here?
def users_more_sentiment_posts(search_object: Search):
    a = search_object.aggs.bucket(
        "users",
        "terms",
        field="user_id"
    ).metric(
        "positive_post_count_per_bucket",
        "range",
        field="post_sentiment",
        ranges=[{'from': 0.0}]
    ).metric(
        "negative_post_count_per_bucket",
        "range",
        field="post_sentiment",
        ranges=[{'to': 0.0}]
    ).pipeline(
        "happy_posts",
        "bucket_selector",
        buckets_path={
            "positiveCount": "positive_post_count_per_bucket._count",
            "negativeCount": "negative_post_count_per_bucket._count"
        },
        script="params.positiveCount > params.negativeCount"
    ).bucket(
        "posts",
        "top_hits",
        size=10
    )
    response = search_object.execute()
Answer 1
Score: 0
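A range aggregation is a multi-bucket aggregation, and the bucket_selector needs the document count of one specific bucket for each variable. A path such as positive_post_count_per_bucket._count does not say which range bucket to read. I have merged both ranges into a single range aggregation and given each range a key, so that buckets_path can address the positive and the negative bucket by key.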
from elasticsearch_dsl import Search


def users_more_sentiment_posts(search_object: Search):
    # One bucket per user.
    search_object.aggs.bucket(
        "users",
        "terms",
        field="user_id"
    ).metric(
        # A single range aggregation with two keyed buckets per user.
        "post_count_per_bucket",
        "range",
        field="post_sentiment",
        ranges=[{"from": 0, "key": "positive"}, {"to": 0, "key": "negative"}]
    ).pipeline(
        # Keep only the users whose positive count exceeds the negative count.
        "happy_posts",
        "bucket_selector",
        buckets_path={
            "positiveCount": "post_count_per_bucket['positive']._count",
            "negativeCount": "post_count_per_bucket['negative']._count"
        },
        script="params.positiveCount > params.negativeCount"
    ).bucket(
        # Up to 10 sample posts for each remaining user.
        "posts",
        "top_hits",
        size=10
    )
    response = search_object.execute()
    return response
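When debugging a search_phase_execution_exception it can help to look at the raw request body rather than the chained calls. The sketch below writes out, as a plain Python dict, the body that the function above is expected to produce. The helper name build_happy_users_body is hypothetical and not part of any library, and the structure is my reading of the chained calls, so treat it as an assumption to compare against the "aggs" part of search_object.to_dict().

```python
import json


def build_happy_users_body(top_hits_size: int = 10) -> dict:
    """Hypothetical helper: hand-written request body for the aggregation above."""
    return {
        "aggs": {
            "users": {
                "terms": {"field": "user_id"},
                "aggs": {
                    # One range aggregation, two keyed buckets.
                    "post_count_per_bucket": {
                        "range": {
                            "field": "post_sentiment",
                            "ranges": [
                                {"from": 0, "key": "positive"},
                                {"to": 0, "key": "negative"},
                            ],
                        }
                    },
                    # Sibling of the range aggregation, so the paths are
                    # relative to the enclosing "users" bucket.
                    "happy_posts": {
                        "bucket_selector": {
                            "buckets_path": {
                                "positiveCount": "post_count_per_bucket['positive']._count",
                                "negativeCount": "post_count_per_bucket['negative']._count",
                            },
                            "script": "params.positiveCount > params.negativeCount",
                        }
                    },
                    "posts": {"top_hits": {"size": top_hits_size}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be diffed against search_object.to_dict().
    print(json.dumps(build_happy_users_body(), indent=2))
```

Note that bucket_selector sits next to the range aggregation inside "users", not inside it, which is why each path starts with the range aggregation's name.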