2020年10月10日 03:49:13go评论95阅读模式

英文:

How to know the missing items from Spring Data JPA's findAllById method in an efficient way?

问题

考虑以下代码片段：

List<String> usersList = Arrays.asList("john", "jack", "jill", "xxxx", "yyyy");
List<User> userEntities = userRepo.findAllById(usersList);

User 类是一个简单的实体对象，带有 @Entity 注解，并且具有一个 String 数据类型的 @Id 字段。

假设在数据库中我有对应于 “john”、“jack” 和 “jill” 的行。即使我在 usersList 中传递了 5 个项（以及 “xxxx” 和 “yyyy”），findAllById 方法也只会返回与 “john”、“jack” 和 “jill” 相对应的 3 个项/实体。

现在在调用了 findAllById 方法之后，最佳、简便且高效的方法（可能比 O(n^2) 更好）是找出 findAllById 方法没有返回的缺失项？（在这种情况下，它将是 “xxxx” 和 “yyyy”）。

英文:

Consider this code snippet below:

List&lt;String&gt; usersList = Arrays.asList(&quot;john&quot;, &quot;jack&quot;, &quot;jill&quot;, &quot;xxxx&quot;, &quot;yyyy&quot;);
List&lt;User&gt; userEntities = userRepo.findAllById(usersList);

User class is a simple Entity object annotated with @Entity and has an @Id field which is of String datatype.

Assume that in db I have rows corresponding to "john", "jack" and "jill". Even though I passed 5 items in usersList(along with "xxxx" and "yyyy"), findAllById method would only return 3 items/entities corresponding to "john","jack",and "jill".

Now after the call to findAllById method, what's the best, easy and efficient(better than O(n^2) perhaps) way to find out the missing items which findAllById method did not return?(In this case, it would be "xxxx" and "yyyy").

答案1

得分: 3

使用Java Sets

您可以使用集合作为过滤的源：

Set<String> usersSet = new HashSet<>(Arrays.asList("john", "jack", "jill", "xxxx", "yyyy"));

现在，您可以创建一个谓词来过滤那些不存在的项：

Set<String> foundIds = userRepo.findAllById(usersSet)
     .stream()
     .map(User::getId)
     .collect(Collectors.toSet());

我假设过滤应该是O(n)，以遍历整个结果。

或者，您可以更改存储库以使用一种形式的distinct子句返回一组用户：

Set<String> foundIds = userRepo.findDistinctById(usersSet)
       .stream()
       .map(User::getId)
       .collect(Collectors.toSet());

然后，您可以只应用一个集合操作：

usersSet.removeAll(foundIds);

现在，usersSet 包含了在您的结果中未找到的用户。

集合具有O(1)的复杂度来查找项目。所以，我假设这应该是O(userSet的大小)来删除它们。

或者，您可以遍历foundIds，逐渐从userSet中删除项目。然后，在您意识到没有更多的userSet项目可删除时（即集合为空），您可以中断循环算法。

直接从数据库中进行过滤

现在，为了避免所有这些，您可以可能定义一个本地查询并在您的JPA存储库中运行它，以仅检索数据库中不存在的用户列表中的用户。查询可能如下所示，我在PostgreSQL中执行了类似的操作：

WITH my_users AS(
   SELECT 'john' AS id UNION SELECT 'jack' UNION SELECT 'jill'
) 
SELECT id FROM my_users mu 
WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id);

Spring Data: JDBC示例

由于查询是动态的（即过滤集合可能每次都不同），我们需要动态构建查询。我不认为JPA有一种方法可以做到这一点，但本地查询可能会奏效。

您可以直接将JdbcTemplate查询打包到您的存储库中，或者手动使用JPA本地查询。

@Repository
public class UserRepository {
    
    private final JdbcTemplate jdbcTemplate;

    public UserRepository(JdbcTemplate jdbcTemplate) {this.jdbcTemplate = jdbcTemplate;}

    public Set<String> getUserIdNotFound(Set<String> userIds) {
        StringBuilder sql = new StringBuilder();
        for(String userId : userIds) {
            if(sql.length() > 0) {
                sql.append(" UNION ");
            }
            sql.append("SELECT ? AS id");
        }

        String query = String.format("WITH my_users AS (%s)", sql) +
                "SELECT id FROM my_users mu WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id)";

        List<String> myUsers = jdbcTemplate.queryForList(query, userIds.toArray(), String.class);

        return new HashSet<>(myUsers);
    }

}

然后我们只需执行：

Set<String> usersIds = Set.of("john", "jack", "jill", "xxxx", "yyyy");
Set<String> notFoundIds = userRepo.getUserIdNotFound(usersIds);

可能有一种方法可以使用JPA本地查询来实现。让我看看是否可以使用其中一种方法，并稍后将其包含在答案中。

英文:

Using Java Sets

You could use a set as the source of filtering:

Set&lt;String&gt; usersSet = new HashSet&lt;&gt;(Arrays.asList(&quot;john&quot;, &quot;jack&quot;, &quot;jill&quot;, &quot;xxxx&quot;, &quot;yyyy&quot;));

And now you could create a predicate to filter those not present:

Set&lt;String&gt; foundIds = userRepo.findAllById(usersSet)
     .stream()
     .map(User::getId)
     .collect(Collectors.toSet());

I assume the filter should be O(n) to go over the entire results.

Or you could change your repository to return a set of users ideally using a form of distinct clause:

Set&lt;String&gt; foundIds = userRepo.findDistinctById(usersSet)
       .stream()
       .map(User::getId)
       .collect(Collectors.toSet());;

And then you can just apply a set operator:

usersSet.removeAll(foundIds);

And now usersSet contains the users not found in your result.

And a set has a O(1) complexity to find an item. So, I assume this should be O(sizeOf(userSet)) to remove them all.

Alternatively, you could iterate over the foundIds and gradually remove items from the userSet. Then you could short-circuit the loop algorithm in the event you realize that there are no more userSet items to remove (i.e. the set is empty).

Filtering Directly from Database

Now to avoid all this, you can probably define a native query and run it in your JPA repository to retrieve only users from your list which didn't exist in the database. The query would be somewhat as follows that I did in PostgreSQL:

WITH my_users AS(
   SELECT &#39;john&#39; AS id UNION SELECT &#39;jack&#39; UNION SELECT &#39;jill&#39;
) 
SELECT id FROM my_users mu 
WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id);

Spring Data: JDBC Example

Since the query is dynamic (i.e. the filtering set could be of different sizes every time), we need to build the query dynamically. And I don't believe JPA has a way to do this, but a native query might do the trick.

You could either pack a JdbcTemplate query directly into your repository or use JPA native queries manually.

@Repository
public class UserRepository {
    
    private final JdbcTemplate jdbcTemplate;

    public UserRepository(JdbcTemplate jdbcTemplate) {this.jdbcTemplate = jdbcTemplate;}

    public Set&lt;String&gt; getUserIdNotFound(Set&lt;String&gt; userIds) {
        StringBuilder sql = new StringBuilder();
        for(String userId : userIds) {
            if(sql.length() &gt; 0) {
                sql.append(&quot; UNION &quot;);
            }
            sql.append(&quot;SELECT ? AS id&quot;);
        }

        String query = String.format(&quot;WITH my_users AS (%sql)&quot;, sql) +
                &quot;SELECT id FROM my_users mu WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id)&quot;;

        List&lt;String&gt; myUsers = jdbcTemplate.queryForList(query, userIds.toArray(), String.class);

        return new HashSet&lt;&gt;(myUsers);
    }

}

Then we just do:

Set&lt;String&gt; usersIds = Set.of(&quot;john&quot;, &quot;jack&quot;, &quot;jill&quot;, &quot;xxxx&quot;, &quot;yyyy&quot;);
Set&lt;String&gt; notFoundIds = userRepo.getUserIdNotFound(usersIds);

There is probably a way to do it with JPA native queries. Let me see if I can do one of those and put it in the answer later on.

答案2

得分: 1

你可以编写自己的算法来查找缺失的用户。例如：

List<String> missing = new ArrayList<>(usersList);

for (User user : userEntities) {
    String userId = user.getId();
    missing.remove(userId);
}

最终结果将会得到一个包含缺失用户ID的列表：

"xxxx" 和 "yyyy"

英文:

You can write your own algorithm that finds missing users. For example:

List&lt;String&gt; missing = new ArrayList&lt;&gt;(usersList);

        for (User user : userEntities){
            String userId = user.getId();
            missing.remove(userId);
        }

In the result you will have a list of user-ids that are missing:

"xxxx" and "yyyy"

答案3

得分: 0

你可以只需在你的仓库中添加一个方法：

findByIdNotIn(Collection<String> ids)，Spring 将会生成查询语句：

详见这里：

https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#jpa.query-methods

注意（来自文档）：

In 和 NotIn 也可以接受 Collection 的任何子类作为参数，同时也支持数组或可变参数。

英文:

You can just add a method to your repo:

findByIdNotIn(Collection<String> ids) and Spring will make the query:

See here:

https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#jpa.query-methods

Note (from the docs):

>> In and NotIn also take any subclass of Collection as aparameter as well as arrays or varargs.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何以高效的方式从Spring Data JPA的findAllById方法中确定缺失的项目？

问题

答案1

Using Java Sets

Filtering Directly from Database

Spring Data: JDBC Example

答案2

答案3

这个二叉树的中序遍历有什么问题？

尝试从自动生成的电子邮件中提取密码。但在各种组合中都失败了。

不同的构造函数（LOMBOK）

Eclipse安装错误：发生错误。请参阅日志文件。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论