Lucene | How to find prefix matches at beginning of field?
Question
I want to match prefixes near the start of a field. I have the code below, but it doesn't match on the prefix; it only matches when the search term is an entire word. It seems like there's no way to combine SpanTermQuery and PrefixQuery.
var nameTerm = new Term("name", searchTerm);
var prefixName = new PrefixQuery(nameTerm);
var prefixAtStart = new BooleanQuery
{
    { prefixName, Occur.MUST },
    { new SpanFirstQuery(new SpanTermQuery(nameTerm), 0), Occur.MUST }
};
For example:
- Search term: "Comp"
- Want to find: "Computer science class" and "Comp Sci"
- Only finding: "Comp Sci"
- Don't want to find: "Apple's latest computer"
Can the RegexpQuery be made to understand positions?
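For reference, Lucene.Net 4.8 (like Java Lucene 4.x) has `SpanMultiTermQueryWrapper<Q>`, which adapts a multi-term query such as `PrefixQuery` into a `SpanQuery`, so a prefix can be placed inside `SpanFirstQuery`. The sketch below assumes the Lucene.Net 4.8 NuGet packages and a field analyzed with `StandardAnalyzer`; `PrefixNearStart`, `Build`, `Search`, and `maxPosition` are hypothetical names. Because `PrefixQuery` is not analyzed, the prefix is lowercased by hand. `SpanFirstQuery` keeps spans whose end position is at most its `end` argument, and the first token's span ends at position 1, so `end = 1` limits matches to the first token.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Spans;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class PrefixNearStart
{
    // "Some term starting with `prefix` within the first `maxPosition` tokens of `field`".
    public static SpanQuery Build(string field, string prefix, int maxPosition = 1)
    {
        // Assumes the index-time analyzer lowercases (StandardAnalyzer does).
        var prefixQuery = new PrefixQuery(new Term(field, prefix.ToLowerInvariant()));
        // Wrap the multi-term query so it can be used where a SpanQuery is required.
        var spanPrefix = new SpanMultiTermQueryWrapper<PrefixQuery>(prefixQuery);
        // Span end positions are exclusive: the first token spans [0, 1).
        return new SpanFirstQuery(spanPrefix, maxPosition);
    }

    // Indexes `names` in memory and returns the names matched by Build(...).
    public static List<string> Search(IEnumerable<string> names, string prefix, int maxPosition = 1)
    {
        var matches = new List<string>();
        using (var dir = new RAMDirectory())
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48));
            using (var writer = new IndexWriter(dir, config))
            {
                foreach (string name in names)
                {
                    var doc = new Document();
                    doc.Add(new TextField("name", name, Field.Store.YES));
                    writer.AddDocument(doc);
                }
            } // disposing the writer commits the added documents
            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.Search(Build("name", prefix, maxPosition), 10);
                foreach (ScoreDoc hit in hits.ScoreDocs)
                    matches.Add(searcher.Doc(hit.Doc).Get("name"));
            }
        }
        return matches;
    }
}
```

With the default `maxPosition` of 1, the three sample titles should yield "Computer science class" and "Comp Sci"; a value of 3 would also admit "Apple's latest computer", where "computer" is the third token.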
Answer 1
Score: 1
If you only want to match a prefix of the whole field value, you can give the field a type with the analyzer below, which indexes the entire value as a single lowercased token:
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
The query would then look like this:
field:comp*
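A minimal Lucene.Net sketch of the same idea, assuming the Lucene.Net 4.8 packages; `KeywordLowerCaseAnalyzer` and `KeywordPrefixDemo` are hypothetical names. The analyzer mirrors the `KeywordTokenizerFactory` + `LowerCaseFilterFactory` chain, so each value becomes one lowercased token, and a plain `PrefixQuery` (the Lucene counterpart of `field:comp*`) can therefore only match at the start of the value.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Mirrors KeywordTokenizerFactory + LowerCaseFilterFactory: the whole value is one lowercased token.
public class KeywordLowerCaseAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer tokenizer = new KeywordTokenizer(reader);
        TokenStream lowered = new LowerCaseFilter(LuceneVersion.LUCENE_48, tokenizer);
        return new TokenStreamComponents(tokenizer, lowered);
    }
}

public static class KeywordPrefixDemo
{
    // Indexes `names` in memory and runs the equivalent of name:<prefix>* against them.
    public static List<string> Search(IEnumerable<string> names, string prefix)
    {
        var matches = new List<string>();
        using (var dir = new RAMDirectory())
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, new KeywordLowerCaseAnalyzer());
            using (var writer = new IndexWriter(dir, config))
            {
                foreach (string name in names)
                {
                    var doc = new Document();
                    doc.Add(new TextField("name", name, Field.Store.YES));
                    writer.AddDocument(doc);
                }
            } // disposing the writer commits the added documents
            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);
                // PrefixQuery is not analyzed, so lowercase the prefix to match the indexed token.
                var query = new PrefixQuery(new Term("name", prefix.ToLowerInvariant()));
                foreach (ScoreDoc hit in searcher.Search(query, 10).ScoreDocs)
                    matches.Add(searcher.Doc(hit.Doc).Get("name"));
            }
        }
        return matches;
    }
}
```

Because the indexed term for "Apple's latest computer" is the full string "apple's latest computer", the prefix "comp" cannot match inside it.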
If you also have a second field that needs an n-gram filter, you can use the field type below for it:
<field name="text_prefix" type="text_prefix" indexed="true" stored="false"/>
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
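At index time this field type splits the text on non-letters, lowercases it, and stores the leading 3 to 15 characters of every token; at query time "Comp" is only lowercased, so it matches by plain term equality against a stored gram. The sketch below imitates that index-time chain in plain C# to show which grams each sample title produces. `EdgeNGramSketch`, `LowerCaseTokenize`, and `EdgeGrams` are hypothetical helpers and only approximate the Solr factories. Since grams are built per token, "computer" in "Apple's latest computer" also yields "comp".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical plain-C# imitation of the index-time chain above; not a Solr or Lucene API.
public static class EdgeNGramSketch
{
    // Approximates solr.LowerCaseTokenizerFactory: split on every non-letter and lowercase the letters.
    public static List<string> LowerCaseTokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
            tokens.Add(current.ToString());
        return tokens;
    }

    // Approximates EdgeNGramFilterFactory with side="front": each token yields its prefixes of
    // length minGram..maxGram; tokens shorter than minGram yield nothing.
    public static List<string> EdgeGrams(string text, int minGram = 3, int maxGram = 15)
    {
        var grams = new List<string>();
        foreach (string token in LowerCaseTokenize(text))
        {
            int longest = Math.Min(maxGram, token.Length);
            for (int len = minGram; len <= longest; len++)
                grams.Add(token.Substring(0, len));
        }
        return grams;
    }
}
```

For example, `EdgeNGramSketch.EdgeGrams("Comp Sci")` returns "com", "comp", "sci".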
Answer 2
Score: 0
Translating Abhijit's response, here is the Lucene.Net way to set up the EdgeNGramFilter:
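using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Index-time analyzer: tokenize, lowercase (as the Solr field type above does), then emit front edge n-grams of 3 to 10 chars.
public class CustomAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
        TokenStream lowered = new LowerCaseFilter(LuceneVersion.LUCENE_48, tokenizer);
        TokenFilter filter = new EdgeNGramTokenFilter(LuceneVersion.LUCENE_48, lowered, 3, 10);
        return new TokenStreamComponents(tokenizer, filter);
    }
}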
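To check what an analyzer such as `CustomAnalyzer` actually emits for a given title, a small helper can walk its token stream. This is a sketch assuming the Lucene.Net 4.8 packages; `TokenDump` and `Tokens` are hypothetical names. It follows the Reset / IncrementToken / End / Dispose sequence that Lucene token streams require.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class TokenDump
{
    // Runs `text` through `analyzer` as if it were the value of `field` and returns the emitted terms.
    public static List<string> Tokens(Analyzer analyzer, string field, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream(field, new StringReader(text)))
        {
            ICharTermAttribute termAttr = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                   // required before the first IncrementToken()
            while (stream.IncrementToken())
                terms.Add(termAttr.ToString());
            stream.End();                     // finish the stream before it is disposed
        }
        return terms;
    }
}
```

Calling `TokenDump.Tokens(new CustomAnalyzer(), "name", "Comp Sci")` lists the grams that would be indexed for that title, which makes it easy to see why a given prefix does or does not match.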