Use a Python library for a custom filter analyzer in Elasticsearch

Question

I want to create an index for Persian-language text and add a stemmer for it. This is the English stemming setup I currently use for the description field:

PUT my_index
{
    "mappings": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }, 
    "settings": {
      "analysis":{
        "filter": {
          "english_stemmer": {
            "type":       "stemmer",
            "language":   "english"
          }
        }
      }
    }
}

How can I use the PersianStemmer Python library in an Elasticsearch analyzer?

Answer 1

Score: 1

You need to create a custom analyzer for that:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "persian_stemmer": {
          "type": "stemmer",
          "language": "persian"
        }
      },
      "analyzer": {
        "persian_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "persian_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "persian_analyzer"
      }
    }
  }
}
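
As a rough client-side sketch, the request body above can be built as a Python dict and sanity-checked before it is sent. The helper names `build_index_body` and `undefined_filters` and the small `BUILTIN_FILTERS` set are hypothetical and exist only for this illustration. The sketch does not load PersianStemmer into Elasticsearch, and it does not verify that your Elasticsearch version accepts `persian` as a stemmer language; it only checks that every filter the analyzer references is either defined in the settings or listed as built-in.

```python
import json

# Assumption: a small, partial set of built-in token filter names, listed only
# for this illustration; Elasticsearch ships many more.
BUILTIN_FILTERS = {"lowercase", "stop", "asciifolding"}


def build_index_body(language: str) -> dict:
    """Build the index-creation body from the answer for one stemmer language."""
    filter_name = f"{language}_stemmer"
    analyzer_name = f"{language}_analyzer"
    return {
        "settings": {
            "analysis": {
                "filter": {
                    filter_name: {"type": "stemmer", "language": language}
                },
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "standard",
                        "filter": ["lowercase", filter_name],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "description": {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


def undefined_filters(body: dict) -> list:
    """List filters referenced by analyzers that are neither defined nor built-in."""
    analysis = body["settings"]["analysis"]
    defined = set(analysis.get("filter", {}))
    missing = []
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in defined and name not in BUILTIN_FILTERS:
                missing.append(name)
    return missing


body = build_index_body("persian")
print(json.dumps(body, indent=2))
print("undefined filters:", undefined_filters(body))
```

The printed JSON can be used as the body of the `PUT my_index` request. An empty list from `undefined_filters` only means the names are wired together consistently; Elasticsearch itself still decides whether the filter settings are valid.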

huangapple
  • Published on February 26, 2023, 20:41:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75572044.html