How to update the tokens of the standard tokenizer

Posted August 5, 2020, 21:08:22


Question

I am using the standard tokenizer in my Elasticsearch plugin. I need to iterate over each token the standard tokenizer produces and replace it with encrypted text before it is written to the Lucene index. Is there any way to update the tokens of the standard tokenizer? Can anyone help?

Answer 1

Score: 1

It's an interesting use case, but in my opinion the tokenizer is not the right place to do this. The Elasticsearch analysis process consists of the following three phases:

char filter
tokenizer
token filter

If you want to change some characters before the text reaches the tokenizer, do it in a char filter. If you want to change the tokens themselves, do it in a token filter. These two phases allow more kinds of transformation than the tokenizer phase does.
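
The three phases above can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: it uses no Lucene or Elasticsearch classes, the class and method names (`AnalysisPipelineSketch`, `charFilter`, `tokenize`, `tokenFilter`, `analyze`) are hypothetical, whitespace splitting stands in for the standard tokenizer, and HMAC-SHA256 (a keyed one-way digest, not reversible encryption) stands in for the encryption step because it maps equal tokens to equal terms. In a real plugin, the per-token step would typically live in a Lucene `TokenFilter` subclass that rewrites the `CharTermAttribute` inside `incrementToken()`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Plain-Java model of the three analysis phases (hypothetical names, no Lucene classes).
final class AnalysisPipelineSketch {

    // Phase 1, char filter: rewrite raw characters before tokenization.
    // Assumed example rule: treat hyphens as spaces.
    static String charFilter(String text) {
        return text.replace('-', ' ');
    }

    // Phase 2, tokenizer: split the text into tokens.
    // Simplified stand-in for the standard tokenizer: split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Phase 3, token filter: replace every token with a keyed digest.
    // HMAC-SHA256 is deterministic, so equal tokens produce equal output terms.
    static List<String> tokenFilter(List<String> tokens, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                // doFinal resets the Mac, so it can be reused for the next token.
                byte[] digest = mac.doFinal(token.getBytes(StandardCharsets.UTF_8));
                out.add(Base64.getUrlEncoder().withoutPadding().encodeToString(digest));
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is unavailable or the key is invalid", e);
        }
    }

    // Runs the three phases in order: char filter, tokenizer, token filter.
    static List<String> analyze(String text, byte[] key) {
        return tokenFilter(tokenize(charFilter(text)), key);
    }
}
```

Calling `AnalysisPipelineSketch.analyze("quick-brown fox", key)` yields three opaque terms. The same key and the same transformation would have to be applied at query time so that search terms match the indexed ones.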


This article was posted by huangapple on August 5, 2020, 21:08:22.
When reposting, please keep the link to this article: https://go.coder-hub.com/63265899.html

elasticsearch
elasticsearch-analyzers
elasticsearch-plugin
java
