How to filter buckets based on the comparison of two sub-aggregation metrics in Elasticsearch (Python)?

Question

My index has documents with the following fields: user_id, user_name, post_text, post_sentiment,
where post_sentiment is of type double and represents the sentiment of the post. A post_sentiment greater than 0 indicates a happy post, while a post_sentiment less than 0 indicates a sad post.

I am trying to retrieve the users who have more happy posts than sad posts. I am using the Elasticsearch high-level Python library.

I have created the following function, which seems logically correct to me. However, running it raises the error TransportError(500, 'search_phase_execution_exception'). I have made sure the problem is not with the connection or the index, but with the query structure. What might I be doing wrong here?

def users_more_sentiment_posts(search_object: Search):
    a = search_object.aggs.bucket(
            "users",
            "terms",
            field="user_id"
        ).metric(
            "positive_post_count_per_bucket",
            "range", 
            field="post_sentiment", 
            ranges= [{'from': 0.0}]
        ).metric(
            "negative_post_count_per_bucket",
            "range", 
            field="post_sentiment", 
            ranges= [{'to': 0.0}]
        ).pipeline(
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "positive_post_count_per_bucket._count",
                "negativeCount": "negative_post_count_per_bucket._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            "posts",
            "top_hits",
            size=10
        )

    response = search_object.execute()

Answer 1

Score: 0

A range aggregation returns multiple buckets, and you need the document count from each bucket. I have merged both ranges into a single aggregation and added keys so that the positive and negative range buckets can be addressed individually in buckets_path.
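To make the point about multiple buckets concrete, here is a rough sketch of the shape a keyed range sub-aggregation takes inside one "users" bucket of the response. The user ID and counts are invented for illustration, and count_for_key is a hypothetical helper, not part of any Elasticsearch client.

```python
# Illustrative shape of one "users" bucket in the response (values are made up).
user_bucket = {
    "key": "user-1",
    "doc_count": 5,
    "post_count_per_bucket": {
        "buckets": [
            {"key": "negative", "to": 0.0, "doc_count": 2},
            {"key": "positive", "from": 0.0, "doc_count": 3},
        ]
    },
}


def count_for_key(range_agg: dict, key: str) -> int:
    """Return the doc_count of the range bucket with the given key (hypothetical helper)."""
    for bucket in range_agg["buckets"]:
        if bucket["key"] == key:
            return bucket["doc_count"]
    raise KeyError(key)


# The range aggregation has one count per range bucket, so each count
# has to be looked up by its key before the two can be compared.
positive = count_for_key(user_bucket["post_count_per_bucket"], "positive")
negative = count_for_key(user_bucket["post_count_per_bucket"], "negative")
print(positive > negative)
```

The keys in the buckets_path of the function below play the same role as the key argument here: they pick one bucket out of the several that the range aggregation produces.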

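# Assumes the elasticsearch-dsl package, which provides the Search class.
from elasticsearch_dsl import Search

def users_more_sentiment_posts(search_object: Search):
    # One bucket per user. metric() and pipeline() return the parent "users"
    # bucket, so each chained call attaches a sub-aggregation to it.
    search_object.aggs.bucket(
            "users",
            "terms",
            field="user_id"
        ).metric(
            # A single range aggregation with two keyed buckets.
            # "from" is inclusive and "to" is exclusive, so 0 counts as positive.
            "post_count_per_bucket",
            "range",
            field="post_sentiment",
            ranges=[{"from": 0, "key": "positive"}, {"to": 0, "key": "negative"}]
        ).pipeline(
            # Keep only the users whose positive count exceeds the negative count.
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "post_count_per_bucket['positive']._count",
                "negativeCount": "post_count_per_bucket['negative']._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            # Up to 10 sample posts for each remaining user.
            "posts",
            "top_hits",
            size=10
        )
    response = search_object.execute()
    return response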
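
As a cross-check on what the aggregation is meant to compute, the selection logic can be emulated in plain Python on a handful of in-memory documents. This is only a sketch for reasoning about expected results: the function name users_with_more_happy_posts and the sample documents are made up, and nothing here talks to Elasticsearch.

```python
from collections import defaultdict


def users_with_more_happy_posts(docs):
    """Return user_ids whose non-negative posts outnumber their negative posts."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for doc in docs:
        # Mirrors the ranges in the answer: "from": 0 includes 0, "to": 0 excludes it.
        key = "positive" if doc["post_sentiment"] >= 0 else "negative"
        counts[doc["user_id"]][key] += 1
    # Same condition as the bucket_selector script.
    return sorted(
        user for user, c in counts.items() if c["positive"] > c["negative"]
    )


# Made-up sample documents, for illustration only.
sample_docs = [
    {"user_id": "alice", "post_sentiment": 0.8},
    {"user_id": "alice", "post_sentiment": 0.3},
    {"user_id": "alice", "post_sentiment": -0.5},
    {"user_id": "bob", "post_sentiment": -0.9},
    {"user_id": "bob", "post_sentiment": 0.1},
    {"user_id": "carol", "post_sentiment": 0.0},
]

print(users_with_more_happy_posts(sample_docs))  # ['alice', 'carol']
```

Note that carol is selected on the strength of a single post with sentiment exactly 0, because the "positive" range starts at 0 inclusively. The question defines a happy post as strictly greater than 0, so if neutral posts should not count, the lower bound of that range would need adjusting.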

huangapple
  • Posted on March 12, 2023 at 16:54:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75712020.html