Counting characters, a Java program and wc yield inconsistent results
Question
I wrote a Java program that counts the number of characters in a file. To check that the program works correctly, I run this on the command line (Linux) to get the character count:
wc -m fileName
From the man page for wc, I know that the newline character is included in the count.
Here is my Java program:
import java.io.IOException;
import java.io.File;
import java.util.Scanner;
public class NumOfChars {
    /** The main method. */
    public static void main(String[] args) throws IOException {
        // Check that command is entered correctly
        if (args.length != 1) {
            System.out.println("Usage: java NumOfChars fileName");
        }
        // Check that source file exists
        File file = new File(args[0]);
        if (!file.exists()) {
            System.out.printf("File %s does not exist\n", file);
        }
        // Create Scanner object
        Scanner input = new Scanner(file);
        int characters = 0;
        while (input.hasNext()) {
            String line = input.nextLine();
            // The number of characters is the length of the line plus the newline character
            characters += line.length() + 1;
        }
        input.close();
        // Print results
        System.out.printf("File %s has\n", args[0]);
        System.out.printf("%d characters\n", characters);
    }
}
The issue I'm having is that the character count reported by the Java program sometimes differs from the count I get from the wc command.
Here are two examples:
One that works. The contents of the file text.txt are:
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
This is some text
The command wc -m text.txt tells me that this file has 144 characters. This is good, because when I run java NumOfChars text.txt, I am also told that the file has 144 characters.
One that doesn't work. The contents of the file Exercise06.java are:
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/** Converts a hexadecimal to a decimal. */
public class Exercise06 {
    /** Main method */
    public static void main(String[] args) {
        // Create a Scanner
        Scanner input = new Scanner(System.in);
        // Prompt the user to enter a string
        System.out.print("Enter a hex number: ");
        String hex = input.nextLine();
        // Display result
        System.out.println("The decimal value for hex number "
            + hex + " is " + hexToDecimal(hex.toUpperCase()));
    }
    /** Converts hexadecimal to decimal.
        @param hex The hexadecimal
        @return The decimal value of hex
        @throws NumberFormatException if hex is not a hexadecimal
    */
    public static int hexToDecimal(String hex) throws NumberFormatException {
        // Check if hex is a hexadecimal. Throw Exception if not.
        boolean patternMatch = Pattern.matches("[0-9A-F]+", hex);
        if (!patternMatch)
            throw new NumberFormatException();
        // Convert hex to a decimal
        int decimalValue = 0;
        for (int i = 0; i < hex.length(); i++) {
            char hexChar = hex.charAt(i);
            decimalValue = decimalValue * 16 + hexCharToDecimal(hexChar);
        }
        // Return the decimal
        return decimalValue;
    }
    /** Converts a hexadecimal Char to a decimal.
        @param ch The hexadecimal Char
        @return The decimal value of ch
    */
    public static int hexCharToDecimal(char ch) {
        if (ch >= 'A' && ch <= 'F')
            return 10 + ch - 'A';
        else // ch is '0', '1', ..., or '9'
            return ch - '0';
    }
}
The command wc -m Exercise06.java tells me that this file has 1650 characters. However, when I run java NumOfChars Exercise06.java, I am told that the file has 1596 characters.
I can't seem to figure out what I'm doing wrong. Can anyone provide me with some feedback?
EDIT: Here is what I get when typing in head -5 Exercise06.java | od -c
Answer 1
Score: 4
There are several possible explanations:
- It is possible that each line ends with more than one character, for example on Windows each line ends with CR + LF, whereas your program always counts exactly 1 line ending character.
- wc may assume a different character encoding than your program, possibly leading to different character counts for multi-byte characters.
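One way to check both explanations is to stop going through Scanner.nextLine(), which discards the line terminator, and instead read the file one character at a time with an explicit encoding. The sketch below is only an illustration: the class name CharCount and the helper count are made up for this example, and it assumes the file is UTF-8 and that wc runs in a UTF-8 locale. It counts Unicode code points (which is roughly what wc -m reports in that setting) and tallies CR and LF separately. The gap in the question (1650 versus 1596) is 54, which would fit one uncounted CR per line if the file has 54 CRLF-terminated lines, but the question does not confirm that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for illustration; not part of the original question.
class CharCount {
    /**
     * Reads every character from the reader, including line terminators.
     * Returns {codePoints, carriageReturns, lineFeeds}.
     */
    static long[] count(Reader reader) throws IOException {
        long codePoints = 0;
        long cr = 0;
        long lf = 0;
        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            if (ch == '\r') cr++;
            if (ch == '\n') lf++;
            // A low surrogate is the second half of a surrogate pair,
            // so skipping it counts each supplementary character once.
            if (!Character.isLowSurrogate(ch)) codePoints++;
        }
        return new long[] {codePoints, cr, lf};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java CharCount fileName");
            return;
        }
        // Assumption: the file is encoded as UTF-8.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            long[] counts = count(reader);
            System.out.printf("characters: %d%n", counts[0]);
            System.out.printf("CR: %d, LF: %d%n", counts[1], counts[2]);
        }
    }
}
```

If the CR count comes out equal to the LF count, the file most likely uses Windows line endings, and the original program is short by one character per line. If the CR count is zero and the totals still disagree with wc -m, the encoding assumption is the next thing to look at.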