June 8, 2023 07:35:45

Lucene | How to find prefix matches at beginning of field?

Question

I want to match prefixes near the start of a field. I have the following, but it doesn't match prefixes; it only matches a whole word that is identical to the search term. There seems to be no way to combine SpanTermQuery and PrefixQuery.

        var nameTerm = new Term("name", searchTerm);

        var prefixName = new PrefixQuery(nameTerm);

        var prefixAtStart = new BooleanQuery
        {
            { prefixName, Occur.MUST },
            { new SpanFirstQuery(new SpanTermQuery(nameTerm), 0), Occur.MUST }
        };

For example:

Search term: "Comp"
Want to find: "Computer science class" and "Comp Sci"
Only finding: "Comp Sci"
Don't want to find: "Apple's latest computer"

Can the RegexpQuery be made to understand positions?
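The gap between what the code above does and what is wanted can be modeled in a few lines of Python (a sketch, not Lucene code). The field is treated as a list of lowercased word tokens; `span_first_exact` imitates `SpanFirstQuery(SpanTermQuery(...))`, where the whole token must equal the search term, and `span_first_prefix` is the hypothetical prefix-aware behavior the question is after. Both helper names are invented for illustration, and the tokenizer is a crude approximation of a real analyzer. The `end` argument follows SpanFirstQuery's documented rule that a span matches only if it ends at or before `end`; under that rule a term at position 0 ends at position 1, so `end=1` restricts matching to the first token.

```python
import re

def tokenize(text: str) -> list:
    # Crude analyzer stand-in (assumption): lowercase, keep runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())

def span_first_exact(text: str, term: str, end: int) -> bool:
    # A token at position p spans [p, p + 1), so it ends at or before `end`
    # exactly when p < end; tokens[:end] are those positions.
    # The whole token must equal the term, as with SpanTermQuery.
    return any(tok == term.lower() for tok in tokenize(text)[:end])

def span_first_prefix(text: str, prefix: str, end: int) -> bool:
    # Hypothetical prefix-aware variant: the token only has to start with the prefix.
    return any(tok.startswith(prefix.lower()) for tok in tokenize(text)[:end])

titles = ["Computer science class", "Comp Sci", "Apple's latest computer"]
print([t for t in titles if span_first_exact(t, "Comp", end=1)])   # ['Comp Sci']
print([t for t in titles if span_first_prefix(t, "Comp", end=1)])  # ['Computer science class', 'Comp Sci']
```

Raising `end` widens the window to the first `end` tokens, which corresponds to the "near the start" part of the question.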

Answer 1

Score: 1

If you only need to match a prefix of the whole field value, you can give the field the following analyzer:

<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

The query would then look like this:

field:comp*
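The following Python sketch (a model, not Solr code) shows why this works: `KeywordTokenizerFactory` keeps the entire field value as a single token and `LowerCaseFilterFactory` lowercases it, so a prefix query such as `comp*` can only match at the very start of the field. The helper names are made up for illustration, and the sketch assumes the prefix is already lowercase, as in `field:comp*` above.

```python
def keyword_lowercase(text: str) -> list:
    # Model of KeywordTokenizerFactory + LowerCaseFilterFactory:
    # the whole field value becomes one lowercased token.
    return [text.lower()]

def prefix_query(tokens: list, prefix: str) -> bool:
    # Model of a prefix query like comp*: true if any indexed token starts with the prefix.
    return any(tok.startswith(prefix) for tok in tokens)

titles = ["Computer science class", "Comp Sci", "Apple's latest computer"]
matches = [t for t in titles if prefix_query(keyword_lowercase(t), "comp")]
print(matches)  # ['Computer science class', 'Comp Sci']
```

Because each value is a single term, "computer" inside "Apple's latest computer" is never a term of its own, so that title is excluded. Under this model the trade-off is that the field cannot match individual words later in the value.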

If you also have a second field that needs an n-gram filter, you can use the following field type for it:

<field name="text_prefix" type="text_prefix" indexed="true" stored="false"/>

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
</fieldType>
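To see how this second field type behaves, the Python sketch below (again a model, not Solr) approximates `LowerCaseTokenizerFactory` as "lowercase and split on anything that is not an ASCII letter" and the edge n-gram filter with `minGramSize="3"` and `maxGramSize="15"` as "emit the first 3 to 15 characters of each token". The query-side analyzer only tokenizes and lowercases, so a one-word query matches when its term equals one of the indexed grams. All function names are hypothetical.

```python
import re

def lowercase_tokenizer(text: str) -> list:
    # Approximation of LowerCaseTokenizerFactory: lowercase, split on non-letters.
    return re.findall(r"[a-z]+", text.lower())

def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 15) -> list:
    # Leading substrings of length min_gram..max_gram; shorter tokens yield nothing.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def index_terms(text: str) -> set:
    # Index-time chain: tokenizer followed by the edge n-gram filter.
    return {g for tok in lowercase_tokenizer(text) for g in edge_ngrams(tok)}

def matches(text: str, word: str) -> bool:
    # Query-time chain only tokenizes; the query term must be an indexed gram.
    return any(q in index_terms(text) for q in lowercase_tokenizer(word))

titles = ["Computer science class", "Comp Sci", "Apple's latest computer"]
print([t for t in titles if matches(t, "Comp")])
# ['Computer science class', 'Comp Sci', "Apple's latest computer"]
print([t for t in titles if matches(t, "Co")])  # []
```

Every word gets its own grams, so this field matches a prefix of any word in the value, which is what distinguishes it from the keyword-based field; queries shorter than `minGramSize` find nothing.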

Answer 2

Score: 0

Translating Abhijit's response into Lucene.Net, here is how to set up the EdgeNGramFilter:

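using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

public class CustomAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // Split the text into words; this tokenizer does not lowercase.
        Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);

        // Emit the leading 3- to 10-character n-grams of every token.
        TokenFilter filter = new EdgeNGramTokenFilter(LuceneVersion.LUCENE_48, tokenizer, 3, 10);

        return new TokenStreamComponents(tokenizer, filter);
    }
}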
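As a rough check of what this analyzer emits, the Python sketch below models `StandardTokenizer` with a simple word regex (an approximation; the real tokenizer follows Unicode text-segmentation rules) and `EdgeNGramTokenFilter(..., 3, 10)` as the first 3 to 10 characters of each token. Since the chain above has no lowercase step, the grams keep their original case. The helper names are hypothetical.

```python
import re

def standard_tokenize(text: str) -> list:
    # Rough stand-in for StandardTokenizer: word runs with inner apostrophes kept
    # ("Apple's" stays one token). Case is left unchanged.
    return re.findall(r"\w+(?:'\w+)*", text)

def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 10) -> list:
    # Leading substrings of length min_gram..max_gram; shorter tokens yield nothing.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def index_terms(text: str) -> set:
    # All grams produced for one field value.
    return {g for tok in standard_tokenize(text) for g in edge_ngrams(tok)}

print(edge_ngrams("Computer"))
# ['Com', 'Comp', 'Compu', 'Comput', 'Compute', 'Computer']
print(edge_ngrams("Comprehensiveness")[-1])  # Comprehens
print("Comp" in index_terms("Computer science class"))  # True
print("comp" in index_terms("Computer science class"))  # False
```

Under this model matching is case-sensitive. Lucene.Net also provides `LowerCaseFilter` (in `Lucene.Net.Analysis.Core`), which could be placed between the tokenizer and the n-gram filter if case-insensitive grams are wanted; that is a suggestion, not part of the original answer.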
