Hibernate search sorting with collation

Question

I upgraded Hibernate Search from version 4.3.0.Final to the latest stable version, 5.11.5.Final. Everything works except sorting of Norwegian words. In the old version of Hibernate Search, SortField had a constructor that took a locale:

```java
/** Creates a sort, possibly in reverse, by terms in the given field sorted
 * according to the given locale.
 * @param field Name of field to sort by, cannot be <code>null</code>.
 * @param locale Locale of values in the field.
 */
public SortField(String field, Locale locale, boolean reverse) {
    initFieldType(field, STRING);
    this.locale = locale;
    this.reverse = reverse;
}
```

But in the new Hibernate Search, SortField no longer takes a locale. According to the Hibernate Search reference documentation (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_analysis), words in foreign languages should be sorted using CollationKeyFilterFactory in a normalizer. However, there is no such class in this version of Hibernate Search. Maven POM:

```xml
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.11.5.Final</version>
</dependency>
```

The question: what should I use or create in Hibernate Search to sort Norwegian words?

Currently I get this sort order:

> atest, btest, ctest, ztest, åtest, ætest, øtest

The correct order:

> atest, btest, ctest, ztest, ætest, øtest, åtest
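The two orders above can be reproduced with the JDK alone. String.compareTo compares UTF-16 code units, so å (U+00E5) sorts before æ (U+00E6) and ø (U+00F8), while a java.text.Collator for Norwegian places æ, ø, å after z. This is a minimal sketch that assumes the JDK's locale data includes Norwegian collation rules for new Locale("no", "NO"), the same locale used in the third answer; the class name NorwegianOrderDemo is illustrative.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class NorwegianOrderDemo {
    // Sample values from the question, deliberately shuffled.
    static final List<String> WORDS =
            Arrays.asList("øtest", "btest", "åtest", "ztest", "atest", "ætest", "ctest");

    // Plain String order (UTF-16 code units): no locale-aware collation at all.
    static List<String> binaryOrder() {
        List<String> sorted = new ArrayList<>(WORDS);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Locale-aware order; assumes the JDK ships Norwegian collation rules.
    static List<String> norwegianOrder() {
        Collator collator = Collator.getInstance(new Locale("no", "NO"));
        List<String> sorted = new ArrayList<>(WORDS);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println("binary:    " + binaryOrder());
        System.out.println("norwegian: " + norwegianOrder());
    }
}
```

The old SortField(String, Locale, boolean) did this comparison with a collator at search time; the newer approaches discussed below move the collator to indexing time instead.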

There is a CollationKeyAnalyzer class, but I do not know how to use it for sorting:

```java
public final class CollationKeyAnalyzer extends Analyzer {
    private final CollationAttributeFactory factory;

    /**
     * Create a new CollationKeyAnalyzer, using the specified collator.
     *
     * @param collator CollationKey generator
     */
    public CollationKeyAnalyzer(Collator collator) {
        this.factory = new CollationAttributeFactory(collator);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        KeywordTokenizer tokenizer = new KeywordTokenizer(factory, KeywordTokenizer.DEFAULT_BUFFER_SIZE);
        return new TokenStreamComponents(tokenizer, tokenizer);
    }
}
```
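The idea behind an analyzer like CollationKeyAnalyzer is that the collator is applied once, at indexing time: each value is replaced by a binary collation key, and a plain unsigned byte-by-byte comparison of those keys reproduces the locale-aware order, so the sort itself needs no locale. The JDK-only sketch below illustrates that property with java.text.CollationKey.toByteArray(), whose Javadoc states that comparing the byte arrays gives the same result as comparing the keys. The helper names (compareUnsigned, buildKeys, sortByKeys) are hypothetical and this is not Lucene's code; it assumes Norwegian collation data is available in the JDK.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CollationKeyDemo {
    // Hypothetical helper: compares byte arrays as unsigned bytes, most significant byte first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // "Index time": compute each value's binary collation key once (assumes distinct values).
    static Map<String, byte[]> buildKeys(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        Map<String, byte[]> keys = new LinkedHashMap<>();
        for (String value : values) {
            keys.put(value, collator.getCollationKey(value).toByteArray());
        }
        return keys;
    }

    // "Search time": sort using only the stored keys; no Collator is involved here.
    static List<String> sortByKeys(Map<String, byte[]> keys) {
        List<String> sorted = new ArrayList<>(keys.keySet());
        sorted.sort((x, y) -> compareUnsigned(keys.get(x), keys.get(y)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("øtest", "btest", "åtest", "ztest", "atest", "ætest", "ctest");
        System.out.println(sortByKeys(buildKeys(words, new Locale("no", "NO"))));
    }
}
```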

A very similar question, without an answer: https://stackoverflow.com/questions/39264308/how-to-do-case-insensitive-sorting-of-norwegian-characters-%c3%86-%c3%98-and-%c3%85-using-h

Answer 1

Score: 1

I'm not sure how much this helps, but CollationKeyFilterFactory was deprecated and has indeed been removed.

The class's Javadoc says:

>Deprecated.
>use CollationKeyAnalyzer instead.

You can find the Javadoc here.

Answer 2

Score: 1

> But there is no such class in this version of hibernate search.

This part of the documentation looks obsolete; I'll look into updating it.

I found CollationKeyAnalyzer, but the Javadoc states that it's obsolete and that ICUCollationKeyAnalyzer should be used instead.

Try adding this dependency to your POM:

```xml
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-icu</artifactId>
    <version>5.5.5</version>
</dependency>
```

Then create your own analyzer class that re-implements ICUCollationKeyAnalyzer with a hard-coded locale:

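```java
import com.ibm.icu.text.Collator;
import java.util.Locale;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.collation.ICUCollationAttributeFactory;
import org.apache.lucene.util.Version;

public class MyCollationKeyAnalyzer extends Analyzer {
    private final ICUCollationAttributeFactory factory;

    public MyCollationKeyAnalyzer(Version luceneVersion) {
        // ICU4J collator with Norwegian Bokmål rules (luceneVersion is not used here)
        this.factory = new ICUCollationAttributeFactory( Collator.getInstance( new Locale( "nb", "NO" ) ) );
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // The whole value becomes a single token whose term is its collation key
        KeywordTokenizer tokenizer = new KeywordTokenizer(factory, KeywordTokenizer.DEFAULT_BUFFER_SIZE);
        return new TokenStreamComponents(tokenizer, tokenizer);
    }
}
```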

Then create your field:

```java
@Entity
@Indexed
public class MyEntity {
    // ...
    @Field(name = "title_sort", index = Index.NO, normalizer = @Normalizer(impl = MyCollationKeyAnalyzer.class))
    @SortableField(forField = "title_sort")
    private String title;
    // ...
}
```

Then sort on that field like this:

```java
FullTextEntityManager ftEm = Search.getFullTextEntityManager( entityManager );
QueryBuilder qb = ...; // The usual
Query luceneQuery = ...; // The usual
FullTextQuery ftQuery = ftEm.createFullTextQuery( luceneQuery, MyEntity.class );
ftQuery.setSort( qb.sort().byField( "title_sort" ).createSort() );
ftQuery.setMaxResults( 20 );
List<MyEntity> hits = ftQuery.getResultList();
```

I didn't try this though, so let us know if it worked for you.

Answer 3

Score: 1

To fix sorting, I created my own NorwegianCollationFactory. It is not a perfect solution, since I copied code from the old version of Hibernate Search (IndexableBinaryStringTools.class), but it works fine.

NorwegianCollationFactory class:

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

import java.text.Collator;
import java.util.Locale;
import java.util.Map;

public final class NorwegianCollationFactory extends TokenFilterFactory {

    public NorwegianCollationFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
        return new CollationKeyFilter(input, norwegianCollator);
    }
}
```

CollationKeyFilter class:

```java
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.text.Collator;
import java.util.Objects;

public final class CollationKeyFilter extends TokenFilter {

    // This code is copied from IndexableBinaryStringTools.class from the old version of hibernate search 4.3.0.Final
    private static final CollationKeyFilter.CodingCase[] CODING_CASES = {
            new CollationKeyFilter.CodingCase(7, 1),
            new CollationKeyFilter.CodingCase(14, 6, 2),
            new CollationKeyFilter.CodingCase(13, 5, 3),
            new CollationKeyFilter.CodingCase(12, 4, 4),
            new CollationKeyFilter.CodingCase(11, 3, 5),
            new CollationKeyFilter.CodingCase(10, 2, 6),
            new CollationKeyFilter.CodingCase(9, 1, 7),
            new CollationKeyFilter.CodingCase(8, 0)
    };

    private final Collator collator;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public CollationKeyFilter(TokenStream input, Collator collator) {
        super(input);
        this.collator = (Collator) collator.clone();
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            byte[] collationKey = collator.getCollationKey(termAtt.toString()).toByteArray();
            int encodedLength = getBinaryStringEncodedLength(collationKey.length);
            termAtt.resizeBuffer(encodedLength);
            termAtt.setLength(encodedLength);
            encodeToBinaryString(collationKey, collationKey.length, termAtt.buffer());
            return true;
        } else {
            return false;
        }
    }

    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
    private void encodeToBinaryString(byte[] inputArray, int inputLength, char[] outputArray) {
        if (inputLength > 0) {
            int inputByteNum = 0;
            int caseNum = 0;
            int outputCharNum = 0;
            CollationKeyFilter.CodingCase codingCase;
            for (; inputByteNum + CODING_CASES[caseNum].numBytes <= inputLength; ++outputCharNum) {
                codingCase = CODING_CASES[caseNum];
                if (codingCase.numBytes == 2) {
                    outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                            + (((inputArray[inputByteNum + 1] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                } else {
                    outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                            + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)
                            + (((inputArray[inputByteNum + 2] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                }
                inputByteNum += codingCase.advanceBytes;
                if (++caseNum == CODING_CASES.length) {
                    caseNum = 0;
                }
            }
            codingCase = CODING_CASES[caseNum];
            if (inputByteNum + 1 < inputLength) {
                outputArray[outputCharNum++] = (char) ((((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                        + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)) & (short) 0x7FFF);
                outputArray[outputCharNum] = (char) 1;
            } else if (inputByteNum < inputLength) {
                outputArray[outputCharNum++] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift) & (short) 0x7FFF);
                outputArray[outputCharNum] = caseNum == 0 ? (char) 1 : (char) 0;
            } else {
                outputArray[outputCharNum] = (char) 1;
            }
        }
    }

    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
    private int getBinaryStringEncodedLength(int inputLength) {
        return (int) ((8L * inputLength + 14L) / 15L) + 1;
    }

    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
    private static class CodingCase {
        int numBytes;
        int initialShift;
        int middleShift;
        int finalShift;
        int advanceBytes = 2;
        short middleMask;
        short finalMask;

        CodingCase(int initialShift, int middleShift, int finalShift) {
            this.numBytes = 3;
            this.initialShift = initialShift;
            this.middleShift = middleShift;
            this.finalShift = finalShift;
            this.finalMask = (short) ((short) 0xFF >>> finalShift);
            this.middleMask = (short) ((short) 0xFF << middleShift);
        }

        CodingCase(int initialShift, int finalShift) {
            this.numBytes = 2;
            this.initialShift = initialShift;
            this.finalShift = finalShift;
            this.finalMask = (short) ((short) 0xFF >>> finalShift);
            if (finalShift != 0) {
                advanceBytes = 1;
            }
        }
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        if (!super.equals(o)) {
            return false;
        }
        CollationKeyFilter that = (CollationKeyFilter) o;
        return Objects.equals(collator, that.collator) &&
                Objects.equals(termAtt, that.termAtt);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), collator, termAtt);
    }
}
```
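The filter above writes its output through CharTermAttribute, that is, as characters, so the binary collation key has to be turned into a string whose ordinary String order matches the unsigned byte order of the key. The copied IndexableBinaryStringTools code does this compactly, packing 15 bits into each char. The sketch below uses a deliberately simpler and less compact scheme (one char per byte, values 0 to 255) that has the same order-preserving property; the class and method names are hypothetical, it is an illustration rather than a drop-in replacement for the copied code, and it assumes the JDK provides Norwegian collation rules.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class KeyEncodingDemo {
    // Hypothetical encoding: each unsigned byte becomes one char in 0..255.
    // String.compareTo compares chars numerically, so it matches unsigned byte order.
    static String encode(byte[] key) {
        char[] chars = new char[key.length];
        for (int i = 0; i < key.length; i++) {
            chars[i] = (char) (key[i] & 0xFF);
        }
        return new String(chars);
    }

    // What a collation filter would store as the term for a value.
    static String sortTerm(Collator collator, String value) {
        return encode(collator.getCollationKey(value).toByteArray());
    }

    // Sorts with plain String comparison on the encoded terms;
    // the collator is only used to build the terms.
    static List<String> sortWithEncodedTerms(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((a, b) -> sortTerm(collator, a).compareTo(sortTerm(collator, b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("øtest", "btest", "åtest", "ztest", "atest", "ætest", "ctest");
        System.out.println(sortWithEncodedTerms(words, new Locale("no", "NO")));
    }
}
```

Packing 15 bits per char, as IndexableBinaryStringTools does, roughly halves the term length compared with this one-byte-per-char scheme, at the cost of the more involved bit shuffling seen above.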

Entity mapping example:

```java
@Entity
@NormalizerDef(name = "textSortNormalizer",
        filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                        @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
                        @Parameter(name = "replacement", value = " "),
                        @Parameter(name = "replace", value = "all")
                }),
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                        @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
                        @Parameter(name = "replacement", value = ""),
                        @Parameter(name = "replace", value = "all")
                }),
                @TokenFilterDef(factory = NorwegianCollationFactory.class)
        }
)
public class Entity {
    @Field(name = "name_for_sort", normalizer = @Normalizer(definition = "textSortNormalizer"))
    @SortableField(forField = "name_for_sort")
    private String name;
}
```
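To see what reaches the collation step, the first three filters of textSortNormalizer can be approximated in plain Java. This is a sketch under assumptions: String.toLowerCase(Locale.ROOT) stands in for LowerCaseFilterFactory, and String.replaceAll stands in for PatternReplaceFilterFactory with replace=all; the class name SortNormalizationDemo is hypothetical. As written, the first pattern is a group rather than a character class, so it only matches the literal sequence '-&.,() and individual punctuation characters are removed by the second pattern instead.

```java
import java.util.Locale;

public class SortNormalizationDemo {
    // Same regexes as in the @NormalizerDef above.
    static final String PUNCTUATION_SEQUENCE = "('-&\\.,\\(\\))";
    static final String NOT_DIGIT_LETTER_OR_SPACE = "([^0-9\\p{L} ])";

    // Approximates: lowercase, replace pattern 1 with a space, then delete pattern 2.
    // The result would then be turned into a collation key by NorwegianCollationFactory.
    static String normalize(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        String spaced = lower.replaceAll(PUNCTUATION_SEQUENCE, " ");
        return spaced.replaceAll(NOT_DIGIT_LETTER_OR_SPACE, "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Øst-Test (B)"));
        System.out.println(normalize("a'-&.,()b"));
    }
}
```

With this sketch, normalize("Øst-Test (B)") returns "østtest b", and normalize("a'-&.,()b") returns "a b".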

huangapple
  • Posted on March 16, 2020 at 17:38:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/60703488.html