2020-03-16 17:38:54

Hibernate search sorting with collation

Question

Related: a very similar question that has no answers: https://stackoverflow.com/questions/39264308/how-to-do-case-insensitive-sorting-of-norwegian-characters-%c3%86-%c3%98-and-%c3%85-using-h

I upgraded Hibernate Search from version 4.3.0.Final to the latest stable version, 5.4.12.Final. Everything works except sorting Norwegian words. In the old version there was a SortField constructor that took a locale:

/** Creates a sort, possibly in reverse, by terms in the given field sorted
   * according to the given locale.
   * @param field  Name of field to sort by, cannot be <code>null</code>.
   * @param locale Locale of values in the field.
   */
  public SortField (String field, Locale locale, boolean reverse) {
    initFieldType(field, STRING);
    this.locale = locale;
    this.reverse = reverse;
  }

But in the new Hibernate Search, SortField no longer has a locale parameter. According to the Hibernate Search reference documentation (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_analysis), to sort words in foreign languages we should use CollationKeyFilterFactory with a normalizer. But there is no such class in this version of Hibernate Search. Maven POM:

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>5.11.5.Final</version>
</dependency>

The question: what should I use or create in Hibernate Search to sort Norwegian words?

Currently I get this sort order:

> atest, btest, ctest, ztest, åtest, ætest, øtest

The correct order:

> atest, btest, ctest, ztest, ætest, øtest, åtest

There is a CollationKeyAnalyzer class, but I do not know how to use it for sorting:

public final class CollationKeyAnalyzer extends Analyzer {
  private final CollationAttributeFactory factory;
  
  /**
   * Create a new CollationKeyAnalyzer, using the specified collator.
   *
   * @param collator CollationKey generator
   */
  public CollationKeyAnalyzer(Collator collator) {
    this.factory = new CollationAttributeFactory(collator);
  }
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    KeywordTokenizer tokenizer = new KeywordTokenizer(factory, KeywordTokenizer.DEFAULT_BUFFER_SIZE);
    return new TokenStreamComponents(tokenizer, tokenizer);
  }
}

Answer 1

Score: 1

I'm not sure how much this helps, but CollationKeyFilterFactory was deprecated and has indeed been removed.

The class's Javadoc says:

>Deprecated.
>use CollationKeyAnalyzer instead.

You can find the Javadoc here.
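
For context, the idea behind a collation key analyzer is that each value is indexed as its collation key, so that a plain byte-wise comparison of the indexed terms reproduces the collator's ordering. The following is only a minimal sketch of that idea using the JDK's java.text.Collator, not Lucene's implementation. The class name CollationKeyDemo and its helper methods are made up for illustration, and it assumes Java 9 or later for Arrays.compareUnsigned:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class, not part of Lucene or Hibernate Search.
public class CollationKeyDemo {

    /** Returns the raw collation key bytes of a string under the given collator. */
    static byte[] keyBytes(Collator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    /** Compares two strings by their key bytes (unsigned, most significant byte first). */
    static int compareByKey(Collator collator, String a, String b) {
        return Arrays.compareUnsigned(keyBytes(collator, a), keyBytes(collator, b));
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        String a = "ztest";
        String b = "\u00e6test"; // "ætest"
        // Both lines should report the same sign, whatever the locale's rules are.
        System.out.println("Collator.compare sign: " + Integer.signum(collator.compare(a, b)));
        System.out.println("Key byte compare sign: " + Integer.signum(compareByKey(collator, a, b)));
    }
}
```

Because the comparison happens on precomputed bytes, the index does not need to know anything about the locale at query time.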

Answer 2

Score: 1

> But there is no such class in this version of hibernate search.

This part of the documentation looks obsolete; I'll look into updating it.

I found CollationKeyAnalyzer, but its Javadoc states that it is obsolete and that ICUCollationKeyAnalyzer should be used instead.

Try adding this dependency to your POM:

<dependency>
   <groupId>org.apache.lucene</groupId>
   <artifactId>lucene-analyzers-icu</artifactId>
   <version>5.5.5</version>
</dependency>

Then create your own analyzer class that re-implements ICUCollationKeyAnalyzer with a hard-coded locale:

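import java.util.Locale;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.collation.ICUCollationAttributeFactory;
import org.apache.lucene.util.Version;
import com.ibm.icu.text.Collator; // ICU4J collator, not java.text.Collator

public class MyCollationKeyAnalyzer extends Analyzer {
    private final ICUCollationAttributeFactory factory;
    public MyCollationKeyAnalyzer(Version luceneVersion) {
        // Hard-coded locale: Norwegian Bokmål, Norway
        this.factory = new ICUCollationAttributeFactory( Collator.getInstance( new Locale( "nb", "NO" ) ) );
    }
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Emit the whole field value as a single token, encoded as its collation key
        KeywordTokenizer tokenizer = new KeywordTokenizer(factory, KeywordTokenizer.DEFAULT_BUFFER_SIZE);
        return new TokenStreamComponents(tokenizer, tokenizer);
    }
}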

Then create your field:

@Entity
@Indexed
public class MyEntity {
    // ...
    @Field(name = "title_sort", index = Index.NO, normalizer = @Normalizer(impl = MyCollationKeyAnalyzer.class))
    @SortableField(forField = "title_sort")
    private String title;
   // ...
}

Then sort on that field like this:

FullTextEntityManager ftEm = Search.getFullTextEntityManager( entityManager );
QueryBuilder qb = ...; // The usual
Query luceneQuery = ...; // The usual
FullTextQuery ftQuery = ftEm.createFullTextQuery( luceneQuery, MyEntity.class );
ftQuery.setSort( qb.sort().byField( "title_sort" ).createSort() );
ftQuery.setMaxResults( 20 );
List<MyEntity> hits = ftQuery.getResultList();

I haven't tried this, though, so let us know whether it works for you.

Answer 3

Score: 1

To fix sorting, I created my own NorwegianCollationFactory. It is not a perfect solution, since I copied code from the old version of Hibernate Search (IndexableBinaryStringTools.class), but it works fine.

NorwegianCollationFactory class:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
public final class NorwegianCollationFactory extends TokenFilterFactory {
    public NorwegianCollationFactory(Map<String, String> args) {
        super(args);
    }
    @Override
    public TokenStream create(TokenStream input) {
        Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
        return new CollationKeyFilter(input, norwegianCollator);
    }
}
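
The collator this factory creates can be checked on its own, outside Lucene and Hibernate Search. Below is a minimal sketch, assuming the JDK in use ships collation rules for Norwegian; the class name NorwegianSortDemo is made up for illustration, and Locale.forLanguageTag("no-NO") is used as an equivalent of new Locale("no", "NO"):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: sorts the words from the question with a JDK collator.
public class NorwegianSortDemo {

    /** Returns a sorted copy of the given words, ordered by the Norwegian collator. */
    static List<String> sortNorwegian(List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding:
        // \u00e5 = å, \u00e6 = æ, \u00f8 = ø
        List<String> words = Arrays.asList(
                "\u00e5test", "ztest", "\u00f8test", "btest", "\u00e6test", "atest", "ctest");
        System.out.println(sortNorwegian(words));
    }
}
```

If the printed order does not end with ætest, øtest, åtest, the problem lies with the collator or locale data rather than with the Lucene filter.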

CollationKeyFilter class:

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.IOException;
import java.text.Collator;
import java.util.Objects;
public final class CollationKeyFilter extends TokenFilter {
    // This code is copied from IndexableBinaryStringTools.class from the old version of hibernate search  4.3.0.Final
    private static final CollationKeyFilter.CodingCase[] CODING_CASES = {
            new CollationKeyFilter.CodingCase(7, 1),
            new CollationKeyFilter.CodingCase(14, 6, 2),
            new CollationKeyFilter.CodingCase(13, 5, 3),
            new CollationKeyFilter.CodingCase(12, 4, 4),
            new CollationKeyFilter.CodingCase(11, 3, 5),
            new CollationKeyFilter.CodingCase(10, 2, 6),
            new CollationKeyFilter.CodingCase(9, 1, 7),
            new CollationKeyFilter.CodingCase(8, 0)
    };
    private final Collator collator;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    public CollationKeyFilter(TokenStream input, Collator collator) {
        super(input);
        this.collator = (Collator) collator.clone();
    }
    @Override
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            byte[] collationKey = collator.getCollationKey(termAtt.toString()).toByteArray();
            int encodedLength = getBinaryStringEncodedLength(collationKey.length);
            termAtt.resizeBuffer(encodedLength);
            termAtt.setLength(encodedLength);
            encodeToBinaryString(collationKey, collationKey.length, termAtt.buffer());
            return true;
        } else {
            return false;
        }
    }
    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search  4.3.0.Final
    private void encodeToBinaryString(byte[] inputArray, int inputLength, char[] outputArray) {
        if (inputLength > 0) {
            int inputByteNum = 0;
            int caseNum = 0;
            int outputCharNum = 0;
            CollationKeyFilter.CodingCase codingCase;
            for (; inputByteNum + CODING_CASES[caseNum].numBytes <= inputLength; ++outputCharNum) {
                codingCase = CODING_CASES[caseNum];
                if (codingCase.numBytes == 2) {
                    outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                            + (((inputArray[inputByteNum + 1] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                } else {
                    outputArray[outputCharNum] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                            + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)
                            + (((inputArray[inputByteNum + 2] & 0xFF) >>> codingCase.finalShift) & codingCase.finalMask) & (short) 0x7FFF);
                }
                inputByteNum += codingCase.advanceBytes;
                if (++caseNum == CODING_CASES.length) {
                    caseNum = 0;
                }
            }
            codingCase = CODING_CASES[caseNum];
            if (inputByteNum + 1 < inputLength) {
                outputArray[outputCharNum++] = (char) ((((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift)
                        + ((inputArray[inputByteNum + 1] & 0xFF) << codingCase.middleShift)) & (short) 0x7FFF);
                outputArray[outputCharNum] = (char) 1;
            } else if (inputByteNum < inputLength) {
                outputArray[outputCharNum++] = (char) (((inputArray[inputByteNum] & 0xFF) << codingCase.initialShift) & (short) 0x7FFF);
                outputArray[outputCharNum] = caseNum == 0 ? (char) 1 : (char) 0;
            } else {
                outputArray[outputCharNum] = (char) 1;
            }
        }
    }
    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
    private int getBinaryStringEncodedLength(int inputLength) {
        return (int) ((8L * inputLength + 14L) / 15L) + 1;
    }
    // This code is copied from IndexableBinaryStringTools class from the old version of hibernate search 4.3.0.Final
    private static class CodingCase {
        int numBytes;
        int initialShift;
        int middleShift;
        int finalShift;
        int advanceBytes = 2;
        short middleMask;
        short finalMask;
        CodingCase(int initialShift, int middleShift, int finalShift) {
            this.numBytes = 3;
            this.initialShift = initialShift;
            this.middleShift = middleShift;
            this.finalShift = finalShift;
            this.finalMask = (short) ((short) 0xFF >>> finalShift);
            this.middleMask = (short) ((short) 0xFF << middleShift);
        }
        CodingCase(int initialShift, int finalShift) {
            this.numBytes = 2;
            this.initialShift = initialShift;
            this.finalShift = finalShift;
            this.finalMask = (short) ((short) 0xFF >>> finalShift);
            if (finalShift != 0) {
                advanceBytes = 1;
            }
        }
    }
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        if (!super.equals(o)) {
            return false;
        }
        CollationKeyFilter that = (CollationKeyFilter) o;
        return Objects.equals(collator, that.collator) &&
                Objects.equals(termAtt, that.termAtt);
    }
    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), collator, termAtt);
    }
}
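
The length formula in getBinaryStringEncodedLength can be read off from the encoder: every output char is masked with 0x7FFF, so it carries 15 bits of the collation key, and one extra char is written at the end. A minimal sketch that reproduces only the arithmetic follows; the class name EncodedLengthDemo is made up for illustration:

```java
// Hypothetical demo class: reproduces only the length arithmetic of the filter above.
public class EncodedLengthDemo {

    /**
     * Number of chars needed for inputLength bytes:
     * ceil(8 * inputLength / 15) payload chars of 15 bits each, plus one trailing char.
     */
    static int encodedLength(int inputLength) {
        return (int) ((8L * inputLength + 14L) / 15L) + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 15}) {
            System.out.println(n + " bytes -> " + encodedLength(n) + " chars");
        }
    }
}
```

For example, 15 bytes are 120 bits, which fill exactly 8 chars of 15 bits, so the encoded form takes 9 chars including the trailing one.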

Entity mapping example:

@Entity
@NormalizerDef(name = "textSortNormalizer",
        filters = {
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                        @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
                        @Parameter(name = "replacement", value = " "),
                        @Parameter(name = "replace", value = "all")
                }),
                @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                        @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
                        @Parameter(name = "replacement", value = ""),
                        @Parameter(name = "replace", value = "all")
                }),
                @TokenFilterDef(factory = NorwegianCollationFactory.class)
        }
)
public class Entity {
    @Field(name = "name_for_sort", normalizer = @Normalizer(definition = "textSortNormalizer"))
    @SortableField(forField = "name_for_sort")
    private String name;
}
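
To see what the first three filters of this normalizer do to a value before it reaches the collation filter, here is a rough approximation using plain java.lang.String methods. It is only a sketch and not the Lucene filters themselves: the class name SortNormalizerSketch is made up, and String.toLowerCase(Locale.ROOT) is assumed to be close enough to LowerCaseFilterFactory for this purpose. The two regular expressions are copied from the mapping.

```java
import java.util.Locale;

// Hypothetical sketch: approximates the lower-case and pattern-replace steps with JDK calls.
public class SortNormalizerSketch {

    static String normalize(String value) {
        // Step 1: lower-case the whole value
        String lower = value.toLowerCase(Locale.ROOT);
        // Step 2: first pattern from the mapping, replaced by a space
        String spaced = lower.replaceAll("('-&\\.,\\(\\))", " ");
        // Step 3: second pattern, dropping everything except digits, letters and spaces
        return spaced.replaceAll("([^0-9\\p{L} ])", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("O'Brien & Sons (Ltd.)") + "]");
        System.out.println("[" + normalize("a'-&.,()b") + "]");
    }
}
```

Read as a regular expression, the first pattern is a group that matches the literal character sequence '-&.,() rather than any single one of those characters, so in practice most punctuation is removed by the second pattern, not replaced by a space.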
