Java wrong value when reading from website

Question

I'm trying to write a program that fetches a word from a website, which returns a random (German) word on each request, and counts how often each character occurs in that word. When I test the program with a stream built from a list, it works fine. When I read from the website, the word is printed correctly with System.out, but the letter counts are wrong. Here is my code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.stream.Stream;

public class WordCount {

  public static String charStat(String urlString) throws IOException {

/*    List<String> list = new ArrayList<>();
    list.add("word");
    Stream<String> characterStream = list.stream();*/ //works totally fine every time

    URL url = new URL(urlString);
    Stream<String> characterStream = new BufferedReader(new InputStreamReader(url.openStream())).lines();

    BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
    System.out.println(br.readLine());//BufferedReader only used to print the word so I can check if everything is working

    int[] charNumber = new int[26];//size is 26 because the alphabet has 26 characters

    Runnable func = () -> {
      characterStream
          .map(String::toLowerCase)
          .flatMapToInt(CharSequence::chars)
          .filter(c -> c != ' ')
          .map(c -> c - (int) 'a')//subtracting 'a' (97 in ASCII) so a is in position 0 of the array
          .forEach(i -> {charNumber[i]++;});
    };

    func.run();
    characterStream.close();
    return "a: " + charNumber[0];//returning how many times the letter a is present, could be any letter
  }

  public static void main(String[] args) throws IOException {//I know that main shouldn't throw an exception
    System.out.println(charStat("https://randomeword.azurewebsites.net/api/word"));//the website I'm getting the word from
  }
}

Example of a failure:

word:

Klavierkonzert

the array:

[1, 0, 0, 0, 3, 0, 2, 0, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1]

what it should be:

[1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 0, 0, 2, 0, 1, 0, 1, 0, 0, 0, 1]

I don't know why this is happening, since the word is shown correctly by System.out.println(). If I've done anything wrong, let me know.

Answer 1

Score: 2

Calling URL.openStream() twice sends two separate requests to the website, so you retrieve two different words: the one you print is not the one you count.

Call URL.openStream() only once and store the result in a variable. Use that variable for all your operations and you should get the desired outcome.
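
A minimal sketch of that suggestion, not the asker's final code: the class name SingleRequestWordCount and the helpers readWord and countLetters are hypothetical names chosen for illustration, and UTF-8 is an assumed encoding. The URL is opened exactly once, the word is stored in a variable, and that same variable is both printed and counted. Non-letters and anything outside a-z are simply skipped here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical class name, used only for this sketch.
class SingleRequestWordCount {

    // Reads the first line from an already opened source; nothing is requested again.
    static String readWord(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line = reader.readLine();
        return line == null ? "" : line.trim();
    }

    // Counts only the ASCII letters a-z and ignores every other character.
    static int[] countLetters(String word) {
        int[] counts = new int[26];
        word.toLowerCase(Locale.ROOT).chars()
            .filter(c -> c >= 'a' && c <= 'z')
            .forEach(c -> counts[c - 'a']++);
        return counts;
    }

    // Opens the URL exactly once, so the printed word is the counted word.
    // UTF-8 is an assumption about the response encoding.
    static int[] charStat(String urlString) throws IOException {
        try (Reader source = new InputStreamReader(
                new URL(urlString).openStream(), StandardCharsets.UTF_8)) {
            String word = readWord(source);
            System.out.println(word);
            return countLetters(word);
        }
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument the word is fetched once; without one, an in-memory word is used.
        int[] counts = args.length > 0
                ? charStat(args[0])
                : countLetters(readWord(new StringReader("Klavierkonzert\n")));
        System.out.println("a: " + counts[0]);
    }
}
```

Keeping the counting step separate from the network call also makes it easy to check the logic against a known word such as the one from the question.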

Answer 2

Score: 1

An additional answer covering some remaining pitfalls:

  • The German alphabet consists of more than the 26 ASCII letters.
  • German nouns start with an uppercase letter, which may be non-ASCII (like Ü).
  • The site sends the text in some encoding, in this case UTF-8.

So:

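    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.Charset;
    import java.text.Collator;
    import java.util.Locale;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.stream.Stream;

    public class GermanWordCount {
        public static void main(String[] args) throws IOException {
            String urlString = "https://randomeword.azurewebsites.net/api/word";
            URL url = new URL(urlString);
            URLConnection conn = url.openConnection();
            String contentType = conn.getContentType(); // "text/plain; charset=utf-8"
            // Fall back to UTF-8 when the header is missing or names no charset.
            String charsetName = contentType == null || !contentType.contains("charset=") ? "UTF-8"
                    : contentType.replaceFirst("^.*charset=([^;]*).*$", "$1");
            Charset charset = Charset.forName(charsetName); // Also checks validity.
            // Read from the same connection; url.openStream() would send a second request.
            try (Stream<String> lineStream = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), charset)).lines()) {
                lineStream.findFirst().ifPresent(word -> {
                    System.out.println("Word: " + word);
                    Map<String, Integer> frequencies
                            = new TreeMap<>(Collator.getInstance(Locale.GERMANY));
                    word.codePoints()
                            .mapToObj(Character::toString) // Character.toString(int) needs Java 11+
                            .map(s -> s.toLowerCase(Locale.GERMANY))
                            .forEach(s -> frequencies.merge(s, 1, Integer::sum));
                    System.out.println("Frequencies: " + frequencies);
                });
            }
        }
    }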
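
To make the first two pitfalls concrete, here is a small network-free sketch. The class name GermanLetterCount and the helpers arrayIndex and frequencies are hypothetical names for illustration only. It shows the index that the question's 26-slot array would compute for an umlaut, which lies far outside the array, and a map-based count that assumes no fixed alphabet size. Unicode escapes are used for Ü so the example does not depend on the source file encoding.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this sketch.
class GermanLetterCount {

    // Index that an int[26] indexed by (c - 'a') would use for this character.
    static int arrayIndex(char c) {
        return Character.toLowerCase(c) - 'a';
    }

    // Counts every code point in a map, so umlauts and ß need no special handling.
    static Map<String, Integer> frequencies(String word) {
        Map<String, Integer> result = new TreeMap<>();
        word.toLowerCase(Locale.GERMANY).codePoints()
            .mapToObj(cp -> new String(Character.toChars(cp)))
            .forEach(s -> result.merge(s, 1, Integer::sum));
        return result;
    }

    public static void main(String[] args) {
        // 'z' maps to the last valid slot of a 26-element array.
        System.out.println(arrayIndex('z'));
        // '\u00DC' is Ü; its lowercase form is U+00FC, so the index is out of range.
        System.out.println(arrayIndex('\u00DC'));
        // "\u00DCbung" is the word Übung.
        System.out.println(frequencies("\u00DCbung"));
    }
}
```

With an array, such an index would raise an ArrayIndexOutOfBoundsException, which is one reason a map keyed by the letter itself is the more forgiving choice for German words.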

huangapple
  • Posted on July 30, 2020, 14:20:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63167268.html