英文:
How to know the missing items from Spring Data JPA's findAllById method in an efficient way?
问题
考虑以下代码片段:
List<String> usersList = Arrays.asList("john", "jack", "jill", "xxxx", "yyyy");
List<User> userEntities = userRepo.findAllById(usersList);
User
类是一个简单的实体对象,带有 @Entity
注解,并且具有一个 String 数据类型的 @Id
字段。
假设在数据库中我有对应于 “john”、“jack” 和 “jill”
的行。即使我在 usersList
中传递了 5 个项(以及 “xxxx” 和 “yyyy”
),findAllById
方法也只会返回与 “john”、“jack” 和 “jill”
相对应的 3 个项/实体。
现在在调用了 findAllById
方法之后,最佳、简便且高效的方法(可能比 O(n^2) 更好)是找出 findAllById
方法没有返回的缺失项?(在这种情况下,它将是 “xxxx” 和 “yyyy”
)。
英文:
Consider this code snippet below:
List<String> usersList = Arrays.asList("john", "jack", "jill", "xxxx", "yyyy");
List<User> userEntities = userRepo.findAllById(usersList);
User
class is a simple Entity object annotated with @Entity
and has an @Id
field which is of String datatype.
Assume that in db I have rows corresponding to "john", "jack" and "jill"
. Even though I passed 5 items in usersList
(along with "xxxx" and "yyyy"
), findAllById
method would only return 3 items/entities corresponding to "john","jack",and "jill"
.
Now after the call to findAllById
method, what's the best, easy and efficient(better than O(n^2) perhaps) way to find out the missing items which findAllById
method did not return?(In this case, it would be "xxxx" and "yyyy"
).
答案1
得分: 3
使用Java Sets
您可以使用集合作为过滤的源:
Set<String> usersSet = new HashSet<>(Arrays.asList("john", "jack", "jill", "xxxx", "yyyy"));
现在,您可以创建一个谓词来过滤那些不存在的项:
Set<String> foundIds = userRepo.findAllById(usersSet)
.stream()
.map(User::getId)
.collect(Collectors.toSet());
我假设过滤应该是O(n),以遍历整个结果。
或者,您可以更改存储库以使用一种形式的distinct
子句返回一组用户:
Set<String> foundIds = userRepo.findDistinctById(usersSet)
.stream()
.map(User::getId)
.collect(Collectors.toSet());
然后,您可以只应用一个集合操作:
usersSet.removeAll(foundIds);
现在,usersSet
包含了在您的结果中未找到的用户。
集合具有O(1)的复杂度来查找项目。所以,我假设这应该是O(userSet的大小)来删除它们。
或者,您可以遍历foundIds
,逐渐从userSet
中删除项目。然后,在您意识到没有更多的userSet
项目可删除时(即集合为空),您可以中断循环算法。
直接从数据库中进行过滤
现在,为了避免所有这些,您可以可能定义一个本地查询并在您的JPA存储库中运行它,以仅检索数据库中不存在的用户列表中的用户。查询可能如下所示,我在PostgreSQL中执行了类似的操作:
WITH my_users AS(
SELECT 'john' AS id UNION SELECT 'jack' UNION SELECT 'jill'
)
SELECT id FROM my_users mu
WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id);
Spring Data: JDBC示例
由于查询是动态的(即过滤集合可能每次都不同),我们需要动态构建查询。我不认为JPA有一种方法可以做到这一点,但本地查询可能会奏效。
您可以直接将JdbcTemplate
查询打包到您的存储库中,或者手动使用JPA本地查询。
@Repository
public class UserRepository {
private final JdbcTemplate jdbcTemplate;
public UserRepository(JdbcTemplate jdbcTemplate) {this.jdbcTemplate = jdbcTemplate;}
public Set<String> getUserIdNotFound(Set<String> userIds) {
StringBuilder sql = new StringBuilder();
for(String userId : userIds) {
if(sql.length() > 0) {
sql.append(" UNION ");
}
sql.append("SELECT ? AS id");
}
String query = String.format("WITH my_users AS (%s)", sql) +
"SELECT id FROM my_users mu WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id)";
List<String> myUsers = jdbcTemplate.queryForList(query, userIds.toArray(), String.class);
return new HashSet<>(myUsers);
}
}
然后我们只需执行:
Set<String> usersIds = Set.of("john", "jack", "jill", "xxxx", "yyyy");
Set<String> notFoundIds = userRepo.getUserIdNotFound(usersIds);
可能有一种方法可以使用JPA本地查询来实现。让我看看是否可以使用其中一种方法,并稍后将其包含在答案中。
英文:
Using Java Sets
You could use a set as the source of filtering:
Set<String> usersSet = new HashSet<>(Arrays.asList("john", "jack", "jill", "xxxx", "yyyy"));
And now you could create a predicate to filter those not present:
Set<String> foundIds = userRepo.findAllById(usersSet)
.stream()
.map(User::getId)
.collect(Collectors.toSet());
I assume the filter should be O(n) to go over the entire results.
Or you could change your repository to return a set of users ideally using a form of distinct
clause:
Set<String> foundIds = userRepo.findDistinctById(usersSet)
.stream()
.map(User::getId)
.collect(Collectors.toSet());;
And then you can just apply a set operator:
usersSet.removeAll(foundIds);
And now usersSet
contains the users not found in your result.
And a set has a O(1) complexity to find an item. So, I assume this should be O(sizeOf(userSet)) to remove them all.
Alternatively, you could iterate over the foundIds
and gradually remove items from the userSet
. Then you could short-circuit the loop algorithm in the event you realize that there are no more userSet
items to remove (i.e. the set is empty).
Filtering Directly from Database
Now to avoid all this, you can probably define a native query and run it in your JPA repository to retrieve only users from your list which didn't exist in the database. The query would be somewhat as follows that I did in PostgreSQL:
WITH my_users AS(
SELECT 'john' AS id UNION SELECT 'jack' UNION SELECT 'jill'
)
SELECT id FROM my_users mu
WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id);
Spring Data: JDBC Example
Since the query is dynamic (i.e. the filtering set could be of different sizes every time), we need to build the query dynamically. And I don't believe JPA has a way to do this, but a native query might do the trick.
You could either pack a JdbcTemplate
query directly into your repository or use JPA native queries manually.
@Repository
public class UserRepository {
private final JdbcTemplate jdbcTemplate;
public UserRepository(JdbcTemplate jdbcTemplate) {this.jdbcTemplate = jdbcTemplate;}
public Set<String> getUserIdNotFound(Set<String> userIds) {
StringBuilder sql = new StringBuilder();
for(String userId : userIds) {
if(sql.length() > 0) {
sql.append(" UNION ");
}
sql.append("SELECT ? AS id");
}
String query = String.format("WITH my_users AS (%sql)", sql) +
"SELECT id FROM my_users mu WHERE NOT EXISTS(SELECT 1 FROM users u WHERE u.id = mu.id)";
List<String> myUsers = jdbcTemplate.queryForList(query, userIds.toArray(), String.class);
return new HashSet<>(myUsers);
}
}
Then we just do:
Set<String> usersIds = Set.of("john", "jack", "jill", "xxxx", "yyyy");
Set<String> notFoundIds = userRepo.getUserIdNotFound(usersIds);
There is probably a way to do it with JPA native queries. Let me see if I can do one of those and put it in the answer later on.
答案2
得分: 1
你可以编写自己的算法来查找缺失的用户。例如:
List<String> missing = new ArrayList<>(usersList);
for (User user : userEntities) {
String userId = user.getId();
missing.remove(userId);
}
最终结果将会得到一个包含缺失用户ID的列表:
"xxxx"
和 "yyyy"
英文:
You can write your own algorithm that finds missing users. For example:
List<String> missing = new ArrayList<>(usersList);
for (User user : userEntities){
String userId = user.getId();
missing.remove(userId);
}
In the result you will have a list of user-ids that are missing:
"xxxx"
and "yyyy"
答案3
得分: 0
你可以只需在你的仓库中添加一个方法:
findByIdNotIn(Collection<String> ids)
,Spring 将会生成查询语句:
详见这里:
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#jpa.query-methods
注意(来自文档):
In 和 NotIn 也可以接受 Collection 的任何子类作为参数,同时也支持数组或可变参数。
英文:
You can just add a method to your repo:
findByIdNotIn(Collection<String> ids)
and Spring will make the query:
See here:
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#jpa.query-methods
Note (from the docs):
>> In and NotIn also take any subclass of Collection as aparameter as well as arrays or varargs.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论