Is there a Lucene Analyzer that will ignore the difference between a Greek symbol and its phonetic English name?
Question
Ideally, I'd like something that otherwise acts like a StandardAnalyzer but treats every Greek symbol as equivalent to its English phonetic spelling ("beta" == "β", "omega" == "ω"). I looked at the ICU analyzer, but it doesn't go quite that far. If no such analyzer exists, do you have a suggestion for the most efficient way to design one?
Answer 1
Score: 1
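After researching @Val's suggestion, I put this together. I'm not sure it's quite right, but I'm posting it here in case anyone finds it a useful starting point.
// Namespaces used: Lucene.Net.Analysis, Lucene.Net.Analysis.CharFilters,
// Lucene.Net.Analysis.Core, Lucene.Net.Analysis.Standard, Lucene.Net.Util
private static Analyzer GetGreekSymbolAgnosticAnalyzer()
{
    // Map each Greek letter to its English name before tokenization.
    // Only three letters are shown; extend the map (including uppercase forms) as needed.
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.Add("α", "alpha");
    builder.Add("β", "beta");
    builder.Add("ω", "omega");
    NormalizeCharMap norm = builder.Build();
    Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
    {
        Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
        // Same filter chain as StandardAnalyzer: standard filter, lowercase, then English stop words.
        TokenStream stream = new StandardFilter(LuceneVersion.LUCENE_48, tokenizer);
        stream = new LowerCaseFilter(LuceneVersion.LUCENE_48, stream);
        stream = new StopFilter(LuceneVersion.LUCENE_48, stream, StandardAnalyzer.STOP_WORDS_SET);
        return new TokenStreamComponents(tokenizer, stream);
    }, initReader: (fieldName, reader) => new MappingCharFilter(norm, reader));
    return analyzer;
}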
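One way to check whether the analyzer above does what the question asks is to dump the terms it emits for a piece of text. The following is a minimal sketch, assuming Lucene.NET 4.8; the class name `AnalyzerDebug`, the method name `Tokenize`, and the field name "field" are hypothetical and used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class AnalyzerDebug
{
    // Hypothetical helper: returns the terms an analyzer emits for the given text.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "field" is an arbitrary placeholder; the analyzer above ignores the field name.
        using (TokenStream stream = analyzer.GetTokenStream("field", new StringReader(text)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

If the mapping works, `Tokenize(analyzer, "β-catenin")` and `Tokenize(analyzer, "beta-catenin")` should return the same terms, because the character filter rewrites the symbol before the tokenizer sees it. One caveat to test for: a symbol embedded in a word, such as "TGFβ", becomes "TGFbeta", which will likely not match the two-word form "TGF beta".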