2023年6月1日 13:25:57go评论48阅读模式

英文:

Is there a Lucene Analyzer that will ignore the difference between greek symbol and phonetic english name?

问题

Ideally, I'd like something that otherwise acts like a StandardAnalyzer but treats all Greek Symbols as equivalent with their English phonetic spelling ("beta" == "β", "omega" == "ω"). I looked at the ICU analyzer but it doesn't go quite that far. If it doesn't exist, might you have a suggestion about the most efficient way to design such an analyzer?

英文:

Ideally, I'd like something that otherwise acts like a StandardAnalyzers but treats all Greek Symbols as equivalent with their English phonetic spelling ("beta" == "β", "omega" == "ω"). I looked at the ICU analyzer but it doesn't go quite that far. If it doesn't exist, might you have a suggestion about the most efficient way to design such an analyzer?

答案1

得分: 1

Here is the translated code portion:

进行了关于@Val建议的研究后，我整理了这个代码。我不确定它是否完全正确，但将其保存在这里，以防有人将其视为有用的起点。

private static Analyzer GetGreekSymbolAgnosticAnalyzer()
{
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.Add("α", "alpha");
    builder.Add("β", "beta");
    builder.Add("ω", "omega");

    NormalizeCharMap norm = builder.Build();
    Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
    {
        Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
        return new TokenStreamComponents(tokenizer, new StandardFilter(LuceneVersion.LUCENE_48, tokenizer));
    }, initReader: (fieldName, reader) => new MappingCharFilter(norm, reader));

    return analyzer;
}

英文:

After doing research on @Val suggestion. I put this together. I'm not sure if it's quite right but saving here in case anyone finds as a useful starting point.

    private static Analyzer GetGreekSymbolAgnosticAnalyzer()
    {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.Add(&quot;α&quot;, &quot;alpha&quot;);
        builder.Add(&quot;β&quot;, &quot;beta&quot;);
        builder.Add(&quot;ω&quot;, &quot;omega&quot;);

        NormalizeCharMap norm = builder.Build();
        Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =&gt;
        {
            Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
            return new TokenStreamComponents(tokenizer, new StandardFilter(LuceneVersion.LUCENE_48, tokenizer));
        }, initReader: (fieldName, reader) =&gt; new MappingCharFilter(norm, reader));

        return analyzer;
    }

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

有没有一种Lucene分析器可以忽略希腊符号和英语音标名称之间的差异？

问题

答案1

Hibernate搜索与排序和排序

搜索字段在List @IndexedEmbedded中

在Lucene中对整数字段的范围分面

使用 ASCIIFoldingFilter 中的静态 “foldToAscii” 方法。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论