March 12, 2023 16:54:55


How to filter buckets based on the comparison of two sub-aggregation metrics in ElasticSearch (python)?

Question


My index has documents with the fields user_id, user_name, post_text and post_sentiment, where post_sentiment is a double representing the sentiment of the post. A post_sentiment greater than 0 indicates a happy post, and one less than 0 indicates a sad post.

I am trying to retrieve the users who have more happy posts than sad posts, using the high-level Elasticsearch Python library.

I have written the following function, which looks logically correct to me. However, running it fails with TransportError(500, 'search_phase_execution_exception'). I have confirmed that the problem is not the connection or the index but the query structure itself. What am I doing wrong?

def users_more_sentiment_posts(search_object: Search):
    a = search_object.aggs.bucket(
            "users",
            "terms",
            field="user_id"
        ).metric(
            "positive_post_count_per_bucket",
            "range",
            field="post_sentiment",
            ranges= [{'from': 0.0}]
        ).metric(
            "negative_post_count_per_bucket",
            "range",
            field="post_sentiment",
            ranges= [{'to': 0.0}]
        ).pipeline(
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "positive_post_count_per_bucket._count",
                "negativeCount": "negative_post_count_per_bucket._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            "posts",
            "top_hits",
            size=10
        )

    response = search_object.execute()
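For reference, the result the query is meant to produce can be stated in plain Python. This sketch is not Elasticsearch code; it assumes each post is a dict with "user_id" and "post_sentiment" keys (an assumption mirroring the index fields above), and the function name users_with_more_happy_posts is hypothetical. It applies the question's definitions: sentiment above 0 is happy, below 0 is sad.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Set


def users_with_more_happy_posts(posts: Iterable[Mapping]) -> Set[str]:
    """Hypothetical plain-Python reference for the intended query result.

    Assumes each post is a dict with "user_id" and "post_sentiment" keys.
    """
    happy = defaultdict(int)  # number of posts with sentiment > 0, per user
    sad = defaultdict(int)    # number of posts with sentiment < 0, per user
    for post in posts:
        sentiment = post["post_sentiment"]
        if sentiment > 0:
            happy[post["user_id"]] += 1
        elif sentiment < 0:
            sad[post["user_id"]] += 1
        # A sentiment of exactly 0 is neither happy nor sad under the question's definition.
    # Keep users whose happy count beats their sad count (missing sad count = 0).
    return {user for user, count in happy.items() if count > sad[user]}


# Hand-written sample posts (illustrative only).
posts = [
    {"user_id": "u1", "post_sentiment": 0.8},
    {"user_id": "u1", "post_sentiment": -0.2},
    {"user_id": "u1", "post_sentiment": 0.5},
    {"user_id": "u2", "post_sentiment": -0.9},
    {"user_id": "u2", "post_sentiment": 0.1},
    {"user_id": "u3", "post_sentiment": 0.0},
]
print(users_with_more_happy_posts(posts))  # {'u1'}
```

Note that in an Elasticsearch range aggregation the "from" bound is inclusive and the "to" bound is exclusive, so a range with "from": 0 also counts posts whose sentiment is exactly 0.0, which this sketch treats as neither happy nor sad.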

Answer 1

Score: 0


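A range aggregation returns multiple buckets, so you need the document count of a specific bucket rather than of the aggregation as a whole; a path such as positive_post_count_per_bucket._count points at the multi-bucket range aggregation without selecting a bucket, so it does not resolve to a single number. I have merged both ranges into a single aggregation and given each range a key, so the positive and negative buckets can be addressed by name in buckets_path.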

from elasticsearch_dsl import Search


def users_more_sentiment_posts(search_object: Search):
    # One terms bucket per user. .metric() and .pipeline() return the "users"
    # bucket, so every call in this chain attaches a sub-aggregation to it.
    search_object.aggs.bucket(
            "users",
            "terms",
            field="user_id"
        ).metric(
            # One range aggregation with two named buckets:
            # "positive" covers [0, +inf), "negative" covers (-inf, 0).
            "post_count_per_bucket",
            "range",
            field="post_sentiment",
            ranges=[{"from": 0, "key": "positive"}, {"to": 0, "key": "negative"}]
        ).pipeline(
            # Keep only users whose "positive" bucket holds more documents
            # than their "negative" bucket.
            "happy_posts",
            "bucket_selector",
            buckets_path={
                "positiveCount": "post_count_per_bucket['positive']._count",
                "negativeCount": "post_count_per_bucket['negative']._count"
            },
            script="params.positiveCount > params.negativeCount"
        ).bucket(
            # Up to 10 sample posts for each remaining user.
            "posts",
            "top_hits",
            size=10
        )
    response = search_object.execute()
    return response
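Once the query runs, the surviving user buckets can be read from the response. A minimal sketch, assuming the raw JSON body of the search response is available as a Python dict; the helper name happy_users_from_response and the sample body are hypothetical, hand-written to match the aggregation names used above rather than captured from a real cluster. Because the range aggregation is not keyed, its buckets come back as a list whose entries carry the "key" given to each range.

```python
from typing import Any, Dict, List


def happy_users_from_response(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: summarise the user buckets kept by bucket_selector.

    Assumes `body` is the raw search response for the aggregation above:
    terms "users" -> range "post_count_per_bucket" + top_hits "posts".
    """
    result = []
    for user_bucket in body["aggregations"]["users"]["buckets"]:
        # Map each range key ("positive" / "negative") to its document count.
        counts = {
            b["key"]: b["doc_count"]
            for b in user_bucket["post_count_per_bucket"]["buckets"]
        }
        # Source documents returned by the top_hits sub-aggregation.
        posts = [hit["_source"] for hit in user_bucket["posts"]["hits"]["hits"]]
        result.append({
            "user_id": user_bucket["key"],
            "positive": counts.get("positive", 0),
            "negative": counts.get("negative", 0),
            "posts": posts,
        })
    return result


# Hand-written sample body (illustrative only, not real cluster output).
sample = {
    "aggregations": {
        "users": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 3,
                    "post_count_per_bucket": {"buckets": [
                        {"key": "negative", "to": 0.0, "doc_count": 1},
                        {"key": "positive", "from": 0.0, "doc_count": 2},
                    ]},
                    "posts": {"hits": {"hits": [
                        {"_source": {"user_id": "u1", "post_sentiment": 0.8}},
                    ]}},
                }
            ]
        }
    }
}
print(happy_users_from_response(sample))
```

With elasticsearch-dsl, the same data is also reachable through attribute access on the returned Response object, for example response.aggregations.users.buckets.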
