Question

We are trying to split Japanese sentences into words using BreakIterator, following the code in this question. The code works only for the text given in that question. When we supply a different text, e.g. "速い茶色のキツネは怠惰な犬を飛び越えます", it fails to break the text into words.

What could be the issue?

Answer 1

Score: 1

BreakIterator.getSentenceInstance(Locale.JAPAN), as used in that question, splits Japanese text into sentences rather than words. Japanese is normally written without spaces or other delimiters between words, so a sentence iterator has nothing to split on inside a single sentence.
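To check this behaviour directly, here is a minimal sketch that lists the segments a BreakIterator reports for the sample text. The class name SentenceBreakDemo and the helper segments are made up for illustration; only the java.text.BreakIterator calls come from the JDK.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceBreakDemo {
    // Hypothetical helper: collects the substrings between consecutive
    // boundaries reported by the given iterator.
    static List<String> segments(BreakIterator it, String text) {
        List<String> parts = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // The sample text from the question: one sentence, no punctuation.
        String text = "速い茶色のキツネは怠惰な犬を飛び越えます";
        List<String> parts = segments(BreakIterator.getSentenceInstance(Locale.JAPAN), text);
        // Prints how many segments the sentence iterator found, then the segments.
        System.out.println(parts.size() + " segment(s): " + String.join(" | ", parts));
    }
}
```

Because the sample contains no sentence terminator, the sentence iterator should hand back the whole string as one segment. Passing BreakIterator.getWordInstance(Locale.JAPAN) to the same helper shows the boundaries the JDK proposes for words, which do not necessarily match dictionary words.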

You have to use a morphological analyzer to break a sentence into words. For example, you can use a Java port of TinySegmenter.

import java.util.List;
import jp.toastkid.libs.tinysegmenter.TinySegmenter;

public class Test {
    public static void main(String[] args) {
        TinySegmenter ts = TinySegmenter.getInstance();
        List<String> list = ts.segment("速い茶色のキツネは怠惰な犬を飛び越えます。");
        System.out.println(String.join(" | ", list));
        // You will get "速い | 茶色 | の | キツネ | は | 怠惰 | な | 犬 | を | 飛び越え | ます"
    }
}
