January 9, 2023 14:08:23


ElasticSearch: check how analyzers/tokenizers/filters applied to an index split text into tokens?

Question


I'm quite new to ElasticSearch, so if I overlook something obvious/basic, please forgive me.

Now I'm using ElasticSearch at work, and I want to see how the complex analyzer/tokenizer/filter settings that my predecessors set up split text into tokens.

I did some research and found a way to do it:

GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}

However, as I said, the analyzer/tokenizer/filter settings are so complicated that writing out the details every time I test them would slow me down horribly.

So I want to analyze a text with the analyzer/tokenizer/filter settings already applied to an index. Is there a way to do that?

I would appreciate it if anyone could shed some light on it.

Answer 1

Score: 1


You don't have to supply the complete analyzer definition to the analyze API every time; you can simply call the _analyze API on the index, like the following:

GET <your-index-name>/_analyze
{
  "analyzer" : "standard",
  "text" : "Quick Brown Foxes!"
}

So instead of using the analyze API at the cluster level, you use it at the index level, where the analyzer definition already exists. You only need to provide the analyzer's name, not its definition (tokenizer, filters, etc.), to get the tokens that analyzer produces. This also works for custom analyzers defined in that index's settings, which the cluster-level call cannot resolve.
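As a sketch of how this fits the chain from the question: suppose, hypothetically, the index had been created with the whitespace/lowercase/stop chain stored under a custom analyzer. The names `my-index`, `my_custom_analyzer` and `my_stop` are placeholders; on a real index, `GET <index>/_settings` lists the analyzers, tokenizers and filters that are actually defined (under `index.analysis`).

```console
# Hypothetical index whose settings define the analyzer chain once
PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["a", "is", "this"]
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  }
}

# Show the analysis settings already stored on the index
GET my-index/_settings

# Reuse the stored chain by name instead of restating it
GET my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "this is a test"
}
```

With this chain, "this is a test" should come back as the single token `test`, the same result as the inline request in the question, but without repeating the tokenizer and filter details.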

Refer to the Elasticsearch official documentation for examples of using it on a specific index or on a specific field.
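A minimal sketch of the field-based variant, assuming a hypothetical index `my-index` with a text field `title` (both placeholders). Passing `field` instead of `analyzer` makes Elasticsearch use whatever analyzer that field is mapped with, so you don't even need to know the analyzer's name; `GET my-index/_mapping` shows which analyzer each field uses. Adding `"explain": true` returns per-step details (char filters, tokenizer, each token filter), which helps when the chain is complicated.

```console
# Show which analyzer each field is mapped to
GET my-index/_mapping

# Analyze text with the analyzer mapped to the "title" field
GET my-index/_analyze
{
  "field": "title",
  "text": "this is a test"
}

# Same request, but include the tokens produced at each step of the chain
GET my-index/_analyze
{
  "field": "title",
  "text": "this is a test",
  "explain": true
}
```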

Hope this helps.
