How to update the tokens of standard tokenizer
Question
I am using the standard tokenizer in my Elasticsearch plugin. I need to iterate over each token produced by the standard tokenizer and replace it with encrypted text before it is written to the Lucene index. Is there any way to update the tokens of the standard tokenizer? Can anyone help?
Answer 1

Score: 1
It's an interesting use case, but a tokenizer, IMHO, is not the correct place to do this. The Elasticsearch analysis process consists of the following three phases:

- char filter
- tokenizer
- token filter

If you want to change some characters before the text is sent to the tokenizer, do it in a char filter; to change the tokens themselves, do it in a token filter. As you can see, these phases give you more room for transformation than the tokenizer phase does, as sketched below.
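Since you are already writing a plugin, the token-filter route usually means implementing a custom Lucene `TokenFilter` that rewrites each term as it flows through the analysis chain. Below is a minimal sketch, assuming a hypothetical class name `EncryptingTokenFilter`; the SHA-256 hex digest stands in as a placeholder for your real encryption routine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Sketch of a token filter that replaces each token produced by the
 * upstream tokenizer (e.g. the standard tokenizer) with a transformed
 * value before it reaches the Lucene index.
 */
public final class EncryptingTokenFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final MessageDigest digest;

    public EncryptingTokenFilter(TokenStream input) {
        super(input);
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // no more tokens from the upstream tokenizer
        }
        // Read the current token text, transform it, and write it back
        // into the term attribute so downstream consumers see the new value.
        String transformed = encrypt(termAtt.toString());
        termAtt.setEmpty().append(transformed);
        return true;
    }

    // Placeholder transformation: a hex-encoded SHA-256 digest.
    // Swap in your actual encryption here.
    private String encrypt(String token) {
        byte[] hash = digest.digest(token.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

To expose the filter in index settings, your plugin would also need to register a token filter factory for it (the exact registration hook depends on your Elasticsearch version); you could then combine it with the standard tokenizer in a custom analyzer.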