Counting characters, a Java program and wc yield inconsistent results

Question

I wrote a Java program that counts the number of characters in a file. To check that the program works correctly, I run this on the command line (Linux) to get the character count:

wc -m fileName

From the man page for wc, I know that newline characters are included in the count.

Here is my Java program:

import java.io.IOException;
import java.io.File;
import java.util.Scanner;

public class NumOfChars {
  /** The main method. */
  public static void main(String[] args) throws IOException {
    // Check that command is entered correctly
    if (args.length != 1) {
      System.out.println("Usage: java NumOfChars fileName");
    }

    // Check that source file exists
    File file = new File(args[0]);
    if (!file.exists()) {
      System.out.printf("File %s does not exist\n", file);
    }

    // Create Scanner object
    Scanner input = new Scanner(file);

    int characters = 0;
    while (input.hasNext()) {
      String line = input.nextLine();

      // The number of characters is the length of the line plus the newline character
      characters += line.length() + 1;
    }
    input.close();

    // Print results
    System.out.printf("File %s has\n", args[0]);
    System.out.printf("%d characters\n", characters);
  }
}

The issue I'm having is that the number of characters reported by the Java program sometimes differs from the number I get from the wc command.

Here are two examples:

One that works. The contents of the file text.txt are:

This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text

The command wc -m text.txt tells me that this file has 144 characters. This is good, because when I run the Java program with java NumOfChars text.txt, I am also told that the file has 144 characters.

One that doesn't work. The contents of the file Exercise06.java are:

import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

/** Converts a hexadecimal to a decimal. */
public class Exercise06 {
  /** Main method */
  public static void main(String[] args) {
    // Create a Scanner
    Scanner input = new Scanner(System.in);

    // Prompt the user to enter a string
    System.out.print("Enter a hex number: ");
    String hex = input.nextLine();

    // Display result
    System.out.println("The decimal value for hex number "
      + hex + " is " + hexToDecimal(hex.toUpperCase()));
  }

  /** Converts hexadecimal to decimal.
      @param hex The hexadecimal
      @return The decimal value of hex
      @throws NumberFormatException if hex is not a hexadecimal
    */
  public static int hexToDecimal(String hex) throws NumberFormatException {
    // Check if hex is a hexadecimal. Throw Exception if not.
    boolean patternMatch = Pattern.matches("[0-9A-F]+", hex);
    if (!patternMatch)
      throw new NumberFormatException();

    // Convert hex to a decimal
    int decimalValue = 0;
    for (int i = 0; i < hex.length(); i++) {
      char hexChar = hex.charAt(i);
      decimalValue = decimalValue * 16 + hexCharToDecimal(hexChar);
    }
    // Return the decimal
    return decimalValue;
  }

  /** Converts a hexadecimal Char to a decimal.
      @param ch The hexadecimal Char
      @return The decimal value of ch
    */
  public static int hexCharToDecimal(char ch) {
    if (ch >= 'A' && ch <= 'F')
      return 10 + ch - 'A';
    else // ch is '0', '1', ..., or '9'
      return ch - '0';
  }
}

The command wc -m Exercise06.java tells me that this file has 1650 characters. However, when I run the Java program with java NumOfChars Exercise06.java, I am told that the file has 1596 characters.

I can't seem to figure out what I'm doing wrong. Can anyone provide me with some feedback?

EDIT: Here is what I get when typing in head -5 Exercise06.java | od -c:

[Image: output of od -c; not preserved]

Answer 1

Score: 4

There are several possible explanations:

  • Each line may end with more than one character. On Windows, for example, each line ends with CR + LF, whereas your program always counts exactly one line-ending character.

  • wc may assume a different character encoding than your program, possibly leading to different character counts for multi-byte characters.
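
Both explanations can be checked with a small experiment. The following is a minimal sketch, not a definitive fix: the class name CharCountDemo and its methods are made up for illustration, and UTF-8 is only an assumed encoding (wc -m uses whatever the current locale specifies). countWithScanner mirrors the loop from the question, while countCharacters reads through a Reader with an explicit charset and counts every character, including CR and LF, treating a surrogate pair as one character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

public class CharCountDemo {
  /** Mirrors the question's loop: adds exactly one character per line read. */
  public static int countWithScanner(String text) {
    int characters = 0;
    try (Scanner input = new Scanner(text)) {
      while (input.hasNext()) {
        // nextLine() strips the whole line separator, whether "\n" or "\r\n"
        characters += input.nextLine().length() + 1;
      }
    }
    return characters;
  }

  /** Counts every character the reader delivers, line endings included. */
  public static long countCharacters(Reader reader) {
    try {
      long count = 0;
      int c;
      while ((c = reader.read()) != -1) {
        // Skip low surrogates so that a surrogate pair counts once
        if (!Character.isLowSurrogate((char) c)) {
          count++;
        }
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Counts the characters of a file decoded with the given charset. */
  public static long countCharacters(Path file, Charset charset) throws IOException {
    try (Reader reader = Files.newBufferedReader(file, charset)) {
      return countCharacters(reader);
    }
  }

  public static void main(String[] args) throws IOException {
    String crlf = "ab\r\ncd\r\n"; // two lines with Windows line endings
    System.out.println(countWithScanner(crlf));                  // prints 6
    System.out.println(countCharacters(new StringReader(crlf))); // prints 8

    // Optional: count a real file, assuming it is encoded in UTF-8
    if (args.length == 1) {
      System.out.println(countCharacters(Paths.get(args[0]), StandardCharsets.UTF_8));
    }
  }
}
```

With CR + LF line endings, the Scanner-based count falls short by one character per line. The 54-character gap reported in the question would fit that pattern if Exercise06.java has 54 lines ending in CR + LF, which the od -c output should confirm or rule out.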

huangapple
  • Posted on August 17, 2020 at 14:55:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63445961.html