TopN aggregations on spring-data-mongodb

Question

I am trying to implement a grouping with a TopN aggregation operator using spring-data-mongo and I am lost on how to do it.

I know what I want from the POV of MongoDB. It's something like this:

[
  {
    $match: {
      field000: { $regex: ".*MATCHTHIS.*" },
      created: { $lte: new Date("2030-05-25T00:00:00.000+00:00") }
    }
  },
  {
    $group: {
      _id: "$field001",
      field001s: {
        $topN: {
          output: ["$field002", "$created"],
          sortBy: { created: -1 },
          n: 1
        }
      }
    }
  }
]

In other words: take the documents that pass the $match stage, group them by field001, sort each group by created in descending order, and keep the top N entries (here N = 1). The result is the most recently created document in each group.

I am having trouble translating this pipeline into spring-data-mongo.
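
For reference, the grouping semantics described in the question can be sketched in plain Java, without MongoDB. This is only an in-memory illustration: the Doc record and the topN helper are hypothetical names, not part of any library, and the field names simply mirror the question.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TopNPerGroup {

    // Hypothetical stand-in for a stored document; field names follow the question.
    public record Doc(String field001, String field002, Instant created) {}

    // In-memory analogue of $group + $topN: group by field001, sort each
    // bucket by created descending, and keep at most n entries per bucket.
    public static Map<String, List<Doc>> topN(List<Doc> docs, int n) {
        return docs.stream().collect(Collectors.groupingBy(
                Doc::field001,
                TreeMap::new,
                Collectors.collectingAndThen(Collectors.toList(), bucket ->
                        bucket.stream()
                              .sorted(Comparator.comparing(Doc::created).reversed())
                              .limit(n)
                              .collect(Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("a", "old", Instant.parse("2023-01-01T00:00:00Z")),
                new Doc("a", "new", Instant.parse("2023-03-01T00:00:00Z")),
                new Doc("b", "only", Instant.parse("2023-02-01T00:00:00Z")));
        // Each key maps to its most recently created document(s).
        System.out.println(topN(docs, 1));
    }
}
```

With n = 1 each group keeps a single entry, which is why a sort followed by "take the first document per group" can stand in for $topN in that special case.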

Answer 1

Score: 1

With a MongoRepository, you can specify the pipeline using the @Aggregation annotation. Something like this:

@Aggregation(pipeline = {
    // Each pipeline stage is its own string; ?0 and ?1 bind to the method arguments.
    "{ $match: { field000: { $regex: '?0' }, created: { $lte: ?1 } } }",
    "{ $group: { _id: '$field001', field001s: { $topN: { output: ['$field002', '$created'], sortBy: { created: -1 }, n: 1 } } } }"
})
// java.util.Date maps to a BSON date out of the box; org.bson.Document is a placeholder result type.
List<Document> filterAndGroup(String regex, Date created);

Note that I have parameterized the search regex and date value. Please update it accordingly, along with the return type of the function. Using MongoTemplate, you can try something along these lines.

// Assumes imports from org.springframework.data.mongodb.core.aggregation,
// org.springframework.data.mongodb.core.query.Criteria and org.springframework.data.domain.Sort.
MatchOperation matchStage = Aggregation.match(
    Criteria.where("field000").regex(".*MATCHTHIS.*")
        .and("created").lte(YOUR_JAVA_DATE_OBJECT));
ProjectionOperation projectStage = Aggregation.project("field002", "created", "field001");
SortOperation sortByCreatedDesc = Aggregation.sort(Sort.by(Sort.Direction.DESC, "created"));
// After the descending sort, the first document in each group is the most recently created one.
GroupOperation groupStage = Aggregation.group("field001").first("$$ROOT").as("field001s");

Aggregation aggregation = Aggregation.newAggregation(
    matchStage, projectStage, sortByCreatedDesc, groupStage);
AggregationResults<XYZModel> result = mongoTemplate.aggregate(
    aggregation, "collectionName", XYZModel.class);

Note that I have added two extra stages, a projection and a sort, because $topN is not yet supported by Spring Data MongoDB. I project the necessary fields, sort by created in descending order, then group the documents and pick the first one in each group.

Note: I have not tested this answer, so you will have to try it out and adjust it.

Answer 2

Score: 0

After a lot of research, I realized that this is probably easy to solve in the most recent versions of spring-data-mongodb, since the "topN" aggregation operator is implemented there. In my case, however, upgrading the stack to support that version is not an option.

On the other hand, if you are using repositories, then @Charchit Kapoor's solution is probably the best one.

If you want to stick with the Mongo template, it can be done this way:

// Requires org.bson.Document, com.mongodb.client.AggregateIterable, java.util.Arrays and java.util.Date.
Document matchStage = new Document("$match",
    new Document("field000", new Document("$regex", ".*REGEXP_FIELD_000.*"))
        .append("created", new Document("$lte", new Date(1685318400000L))));

Document topN = new Document("$topN",
    new Document("output", Arrays.asList("$field002", "$created"))
        .append("sortBy", new Document("created", -1L))
        .append("n", 1L));

Document groupStage = new Document("$group",
    new Document("_id", "$field001").append("field001s", topN));

// Send the raw pipeline through the driver collection instead of the Spring aggregation DSL.
AggregateIterable<Document> result = mongoOps.getCollection(COLLECTION_NAME)
    .aggregate(Arrays.asList(matchStage, groupStage));

You can split this into different methods for better readability.
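
The answer above hard-codes the created bound as epoch milliseconds, while the question writes it as an ISO-8601 string. A minimal sketch of one way to derive the java.util.Date from such a string, using only java.time, could look like this. The class and method names (CreatedBound, fromIso) are hypothetical and exist only for illustration.

```java
import java.time.OffsetDateTime;
import java.util.Date;

public class CreatedBound {

    // Parses an ISO-8601 timestamp with a UTC offset and converts it to
    // java.util.Date, the type used for the $lte bound in the answer above.
    public static Date fromIso(String isoTimestamp) {
        return Date.from(OffsetDateTime.parse(isoTimestamp).toInstant());
    }

    public static void main(String[] args) {
        Date bound = fromIso("2023-05-29T00:00:00.000+00:00");
        // Prints 1685318400000, the literal used in the answer.
        System.out.println(bound.getTime());
    }
}
```

The resulting Date can be passed wherever the answers use YOUR_JAVA_DATE_OBJECT or new Date(1685318400000L).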



huangapple
  • Posted on May 25, 2023, 21:04:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76332591.html