ElasticSearch: check how analyzers/tokenizers/filters applied to an index split text into tokens?


Question


I'm quite new to ElasticSearch, so if I overlook something obvious/basic, please forgive me.

I'm now using ElasticSearch at work and want to see how the complex analyzer/tokenizer/filter settings, which were set up by my predecessors, split text into tokens.

I did some research and found a way to do it:

GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}

However, as I said, the analyzer/tokenizer/filter settings are so complicated that writing out the details every time I test them would slow me down horribly.

So I want to analyze text with the analyzer/tokenizer/filter settings already applied to an index. Is there a way to do that?

I would appreciate it if anyone could shed some light on this.

Answer 1

Score: 1


You don't have to supply the complete analyzer definition to the analyze API every time. You can simply call the _analyze API on the index, like this:

GET <your-index-name>/_analyze
{
  "analyzer" : "standard",
  "text" : "Quick Brown Foxes!"
}

So instead of using the analyze API at the cluster level, you use it at the index level, where the analyzer definition already exists. You only need to provide the analyzer name, not its definition (filters and so on), to get the tokens that analyzer produces.
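As a rough sketch of how this might look with a custom analyzer: the index name `my-index` and the analyzer name `my_custom_analyzer` below are placeholders, not names from the question. The analyzers actually defined on an index appear under `settings.index.analysis` in the response to the get-settings request, and setting `explain` to `true` asks the analyze API for a more detailed breakdown of the output of the tokenizer and each token filter.

```
# Look up which analyzers the index defines (placeholder index name)
GET /my-index/_settings

# Run text through one of those analyzers by name (placeholder analyzer name)
GET /my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "this is a test"
}

# Same request, with per-stage details in the response
GET /my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "this is a test",
  "explain": true
}
```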

Refer to the official Elasticsearch documentation for examples of using it on a specific index or on a specific field.
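For the field-based variant, a minimal sketch could look like the following. Here `my-index` and `title` are assumed placeholder names. With the `field` parameter, the analyzer is derived from that field's mapping, so you don't need to know the analyzer name at all; if an `analyzer` is also given in the request, it takes precedence over `field`.

```
# Analyze text the way the (placeholder) "title" field of this index would
GET /my-index/_analyze
{
  "field": "title",
  "text": "this is a test"
}
```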

Hope this helps.

huangapple
  • Posted on January 9, 2023, 14:08:23
  • When reposting, please keep this link: https://go.coder-hub.com/75053693.html