我应该在这个JPQL查询中包含DISTINCT吗?

huangapple go评论73阅读模式
英文:

Should I include distinct in this JPQL query?

问题

背景

我在Stack Overflow(SO)上看到了关于在JPQL JOIN FETCH查询中使用distinct关键字以及PASS_DISTINCT_THROUGH查询提示的多个答案和问题,以及在许多流行博客中的讨论。

例如,看看这两个问题:

以及这些博客文章:

我的疑问

我现在的问题是,我不能完全理解在JPQL查询中何时必须包含distinct关键字。更具体地说,我想知道这是否取决于使用哪种方法来执行查询(getResultList还是getSingleResult)。

以下是一个示例,以阐明我的意思。

从现在开始,我所写的一切都在Ubuntu Linux 18.04上进行测试,使用Java 8、Hibernate 5.4.13和一个内存中的H2数据库(版本为1.4.200)。

假设我有一个Department实体,它与一个DepartmentDirector实体具有延迟的双向一对多关系:

// Department.java
@Entity
public class Department {
    // ...
    private Set<DepartmentDirector> directors;

    @OneToMany(mappedBy = "department", fetch = FetchType.LAZY)
    public Set<DepartmentDirector> getDirectors() {
        return directors;
    }
    // ...
}

// DepartmentDirector.java
@Entity
public class DepartmentDirector {
    // ...
    private Department department;

    @ManyToOne
    @JoinColumn(name = "department_fk")
    public Department getDepartment() {
        return department;
    }
    // ...
}

假设我的数据库当前包含一个部门(department1)和两名与之关联的主管。

现在,我想通过其UUID(主键)检索部门,以及所有的主管。可以使用以下带有JOIN FETCH的JPQL查询来完成:

String query = "select department from Department department left join fetch "
             + "department.directors where department.uuid = :uuid";

由于上述查询在子集合上使用了join fetch,我预期在发出查询时会返回两个重复的部门:然而,仅当使用getResultList方法而不是使用getSingleResult方法时才会发生这种情况。这在某种程度上是合理的,但我发现Hibernate的getSingleResult实现在幕后使用了getResultList,因此我期望会抛出NonUniqueResultException异常。

我也简要地查阅了JPA 2.2规范,但没有在处理两种方法之间的重复项时提到区别,并且关于此问题的每个代码示例都使用了getResultList方法。

结论

在我的示例中,我发现使用getSingleResult执行的JOIN FETCH查询不会出现在我在背景部分链接的资源中解释的重复实体问题。

如果上述说法是正确的,那么同样的JOIN FETCH查询在使用getResultList执行时需要distinct,但在使用getSingleResult执行时不需要。

我需要有人解释一下这是否是预期行为,或者我是否误解了什么。


附录

两个查询的结果:

  1. 使用getResultList方法运行的查询。我得到了两个预期的重复部门(这仅是为了测试查询的行为,应该使用getSingleResult):

    List<Department> resultList = entityManager.createQuery(query, Department.class)
            .setParameter("uuid", department1.getUuid())
            .getResultList();
    
    assertThat(resultList).containsExactly(department1, department1); // 通过
    
  2. 使用getSingleResult方法运行的查询。我期望检索到相同的重复部门,从而抛出NonUniqueResultException异常。然而,实际上检索到一个部门,并且一切工作正常:

    Department singleResult = entityManager.createQuery(query, Department.class)
            .setParameter("uuid", department1.getUuid())
            .getSingleResult();
    
    assertThat(singleResult).isEqualTo(department1); // 通过
    
英文:

Background

I have seen multiple answers and questions here on SO and in many popular blogs about the necessity of the distinct keyword in JPQL JOIN FETCH queries and about the PASS_DISTINCT_THROUGH query hint.

For example, see these two questions

and these blog posts

What I am missing

Now my problem is that I cannot fully understand when exactly the distinct keyword must be included in the JPQL query. More specifically, if it depends on which method is used to execute the query (getResultList or getSingleResult).

The following is an example to clarify what I mean.

Everything I am writing from now on was tested on Ubuntu Linux 18.04, with Java 8, Hibernate 5.4.13 and an in-memory H2 database (version 1.4.200).

Suppose I have a Department entity which has a lazy bidirectional one to many relationship with a DepartmentDirector entity:

// Department.java
@Entity
public class Department {
    // ...
    private Set&lt;DepartmentDirector&gt; directors;

    @OneToMany(mappedBy = &quot;department&quot;, fetch = FetchType.LAZY)
    public Set&lt;DepartmentDirector&gt; getDirectors() {
        return directors;
    }
    // ...
}

// DepartmentDirector.java
@Entity
public class DepartmentDirector {
    // ...
    private Department department

    @ManyToOne
    @JoinColumn(name = &quot;department_fk&quot;)
    public Department getDepartment() {
        return department;
    }
    // ...
}

Suppose that my database currently contains one department (department1) and two directors associated with it.

Now I want to retrieve the department by its uuid (the primary key), along with all its directors. This can be done with the following JOIN FETCH JPQL query:

String query = &quot;select department from Department department left join fetch &quot;
             + &quot;department.directors where department.uuid = :uuid&quot;;

As the preceding query uses a join fetch with a child collection, I expected it to return two duplicated departments when issued: however this only happens when using the query with the getResultList method and not when using the getSingleResult method. This is somehow reasonable, but I have found that the Hibernate implementation of getSingleResult uses getResultList behind the curtains so I expected a NonUniqueResultException to be thrown.

I also briefly went through JPA 2.2 specification but no distinction in treating the duplicates between the two methods is mentioned, and every code sample concerning this issue uses the getResultList method.

Conclusion

In my example I found out that JOIN FETCH queries executed with getSingleResult do not suffer the duplicated entities problem explained in the resources I linked in the section Background.

If the above claim would be correct, it would mean that the same JOIN FETCH query would need distinct if executed with getResultList, but would not need it when executed with getSingleResult.

I need someone to explain me if this is expected or if I misunderstood something.


Appendix

Results of the two queries:

  1. Query ran with the getResultList method. I get two duplicated departments as expected (this was done just to test the behaviour of the query, getSingleResult should be used instead for this):

    List&lt;Department&gt; resultList = entityManager.createQuery(query, Department.class)
            .setParameter(&quot;uuid&quot;, department1.getUuid())
            .getResultList();
    
    assertThat(resultList).containsExactly(department1, department1); // passes
    
  2. Query ran with the getSingleResult method. I would expect the same duplicated departments to be retrieved, and thus a NonUniqueResultException to be thrown. Instead, a single department is retrieved and everything works nice:

    Department singleResult = entityManager.createQuery(query, Department.class)
            .setParameter(&quot;uuid&quot;, department1.getUuid())
            .getSingleResult();
    
    assertThat(singleResult).isEqualTo(department1); // passes
    

答案1

得分: 3

有趣的问题。

首先,我要指出getSingleResult()是针对那些由于其性质总是返回单个结果的查询而设计的(意思是:大多数聚合查询,如SELECT SUM(e.id) FROM Entity e)。基于某些业务特定规则,您认为应该返回单个结果的查询实际上并不符合条件。

话虽如此,JPA规范规定,当查询返回多于一个结果时,getSingleResult()应该抛出NonUniqueResultException异常:

当调用Query.getSingleResultTypedQuery.getSingleResult并且查询返回多个结果时,持久性提供程序会抛出NonUniqueResultException异常。此异常不会导致当前事务(如果存在)标记为回滚。

然而,看看Hibernate的实现:

@Override
public R getSingleResult() {
    try {
        final List<R> list = list();
        if ( list.size() == 0 ) {
            throw new NoResultException( "No entity found for query" );
        }
        return uniqueElement( list );
    }
    catch ( HibernateException e ) {
        if ( getProducer().getFactory().getSessionFactoryOptions().isJpaBootstrap() ) {
            throw getExceptionConverter().convert( e );
        }
        else {
            throw e;
        }
    }
}

public static <R> R uniqueElement(List<R> list) throws NonUniqueResultException {
    int size = list.size();
    if ( size == 0 ) {
        return null;
    }
    R first = list.get( 0 );
    for ( int i = 1; i < size; i++ ) {
        if ( list.get( i ) != first ) {
            throw new NonUniqueResultException( list.size() );
        }
    }
    return first;
}

事实证明,Hibernate对“多个结果”的解释似乎是“多个唯一结果”。

实际上,我对所有JPA提供程序进行了测试,结果表明:

  • Hibernate确实从getResultList()返回重复项,但由于getSingleResult()的实现方式特殊,不会抛出异常。
  • EclipseLink是唯一一个在getResultList()中不会出现重复结果问题的提供程序,因此getSingleResult()也不会抛出异常(对我来说,这种行为是合乎逻辑的,但事实证明,这完全是一种解释方式)。
  • OpenJPA和DataNucleus都会从getResultList()返回重复结果,并且会从getSingleResult()抛出异常。

总之,我需要有人向我解释这是否是预期的情况,或者我是否误解了某些内容。

这实际上归结为您如何解释规范。

英文:

Interesting question.

First of all let me point out that getSingleResult() was meant for queries that due to their nature always return a single result (meaning: mostly aggregate queries like SELECT SUM(e.id) FROM Entity e). A query that you think, based on some business domain-specific rule, should return a single result, does not really qualify.

That being said, the JPA Spec states that getSingleResult() should throw NonUniqueResultException when the query returns more than one result:

> The NonUniqueResultException is thrown by the persistence provider when Query.getSingleResult or TypedQuery.getSingleResult is invoked and there is more than one result from the query. This exception will not cause the current transaction, if one is active, to be marked for rollback.

However, looking at the Hibernate implementation:

    @Override
	public R getSingleResult() {
		try {
			final List&lt;R&gt; list = list();
			if ( list.size() == 0 ) {
				throw new NoResultException( &quot;No entity found for query&quot; );
			}
			return uniqueElement( list );
		}
		catch ( HibernateException e ) {
			if ( getProducer().getFactory().getSessionFactoryOptions().isJpaBootstrap() ) {
				throw getExceptionConverter().convert( e );
			}
			else {
				throw e;
			}
		}
	}

	public static &lt;R&gt; R uniqueElement(List&lt;R&gt; list) throws NonUniqueResultException {
		int size = list.size();
		if ( size == 0 ) {
			return null;
		}
		R first = list.get( 0 );
		for ( int i = 1; i &lt; size; i++ ) {
			if ( list.get( i ) != first ) {
				throw new NonUniqueResultException( list.size() );
			}
		}
		return first;
	}

it turns out Hibernate's interpretation of 'more than one result' seems to be 'more than one unique result'.

In fact, I tested your scenario with all JPA providers, and it turns out that:

  • Hibernate does indeed return duplicates from getResultList(), but does not throw the exception due to the peculiar way getSingleResult() is implemented
  • EclipseLink is the only one that does not suffer from the duplicate result bug in getResultList() and consequently, getSingleResult() does not throw an exception, either (to me, this behaviour is only logical, but as it turns out, it is all a matter of interpretation)
  • OpenJPA and DataNucleus both return duplicate results from getResultList() and throw an exception from getSingleResult()

Tl;DR

> I need someone to explain me if this is expected or if I misunderstood something.

It really boils down to how you interpret the specification

huangapple
  • 本文由 发表于 2020年8月14日 18:05:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/63410722.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定