August 17, 2020 14:55:29

Counting characters, a Java program and wc yield inconsistent results

Question

I wrote a Java program that counts the number of characters in a file. To check that the program works correctly, I run this on the command line (Linux) to get the character count:

wc -m fileName

From the man page for wc, I know that newline characters are included in the count.

Here is my Java program:

import java.io.IOException;
import java.io.File;
import java.util.Scanner;
public class NumOfChars {
  /** The main method. */
  public static void main(String[] args) throws IOException {
    // Check that command is entered correctly
    if (args.length != 1) {
      System.out.println("Usage: java NumOfChars fileName");
    }
    // Check that source file exists
    File file = new File(args[0]);
    if (!file.exists()) {
      System.out.printf("File %s does not exist\n", file);
    }
    // Create Scanner object
    Scanner input = new Scanner(file);
    int characters = 0;
    while (input.hasNext()) {
      String line = input.nextLine();
      // The number of characters is the length of the line plus the newline character
      characters += line.length() + 1;
    }
    input.close();
    // Print results
    System.out.printf("File %s has\n", args[0]);
    System.out.printf("%d characters\n", characters);
  }
}

The issue I'm having is that the character count reported by the Java program sometimes differs from the count reported by wc.

Here are two examples:

One that works. The contents of the file text.txt are:

This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text

The command wc -m text.txt tells me that this file has 144 characters. This is good, because when I run the Java program with java NumOfChars text.txt, I am also told that the file has 144 characters.

One that doesn't work. The contents of the file Exercise06.java are:

import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/** Converts a hexadecimal to a decimal. */
public class Exercise06 {
  /** Main method */
  public static void main(String[] args) {
    // Create a Scanner
    Scanner input = new Scanner(System.in);
    // Prompt the user to enter a string
    System.out.print("Enter a hex number: ");
    String hex = input.nextLine();

    // Display result
    System.out.println("The decimal value for hex number "
      + hex + " is " + hexToDecimal(hex.toUpperCase()));
  }

  /** Converts hexadecimal to decimal.
      @param hex The hexadecimal
      @return The deciaml value of hex
      @throws NumberFormatException if hex is not a hexadecimal
    */
  public static int hexToDecimal(String hex) throws NumberFormatException {
    // Check if hex is a hexadecimal. Throw Exception if not.
    boolean patternMatch = Pattern.matches("[0-9A-F]+", hex);
    if (!patternMatch)
      throw new NumberFormatException();
    // Convert hex to a decimal
    int decimalValue = 0;
    for (int i = 0; i < hex.length(); i++) {
      char hexChar = hex.charAt(i);
      decimalValue = decimalValue * 16 + hexCharToDecimal(hexChar);
    }
    // Return the decimal
    return decimalValue;
  }

  /** Converts a hexadecimal Char to a deciaml.
      @param ch The hexadecimal Char
      @return The decimal value of ch
    */
  public static int hexCharToDecimal(char ch) {
    if (ch >= 'A' && ch <= 'F')
      return 10 + ch - 'A';
    else // ch is '0', '1', ..., or '9'
      return ch - '0';
  }
}

The command wc -m Exercise06.java tells me that this file has 1650 characters. However, when I run the Java program with java NumOfChars Exercise06.java, I am told that the file has 1596 characters.

I can't figure out what I'm doing wrong. Can anyone give me some feedback?

EDIT: Here is what I get when typing in head -5 Exercise06.java | od -c

Answer 1

Score: 4

There are several possible explanations:

- Each line may end with more than one character. On Windows, for example, each line ends with CR + LF, whereas your program always counts exactly one line-ending character per line.
- wc may assume a different character encoding than your program, which can lead to different character counts for multi-byte characters.
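
To illustrate the first explanation, here is a minimal sketch rather than a definitive fix. The class name CharCountSketch and both method names are made up for this example. countByLines mirrors the loop from the question, and countCodePoints counts every character a Reader delivers, so CR and LF are each counted and a character outside the Basic Multilingual Plane is counted once.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

public class CharCountSketch {
  /** Mirrors the question's loop: line length plus one per line read. */
  public static int countByLines(String content) {
    int characters = 0;
    try (Scanner input = new Scanner(content)) {
      while (input.hasNext()) {
        // nextLine() strips the whole line terminator, whether "\n" or "\r\n"
        characters += input.nextLine().length() + 1;
      }
    }
    return characters;
  }

  /** Counts every character the reader delivers, line terminators included. */
  public static long countCodePoints(Reader reader) {
    long count = 0;
    try {
      int c;
      while ((c = reader.read()) != -1) {
        // A supplementary character arrives as a surrogate pair;
        // skipping the low surrogate counts the pair once.
        if (!Character.isLowSurrogate((char) c)) {
          count++;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return count;
  }

  public static void main(String[] args) {
    // Two lines with Windows-style CR + LF endings
    String crlf = "ab\r\ncd\r\n";
    System.out.println(countByLines(crlf));                       // 6
    System.out.println(countCodePoints(new StringReader(crlf)));  // 8
  }
}
```

For a real file, one option is to pass Files.newBufferedReader(Path.of(fileName), StandardCharsets.UTF_8) to countCodePoints. UTF-8 is an assumption here; the charset should match the locale that wc -m runs under, otherwise the two counts can still differ for multi-byte characters.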
