Unexpected special character at the start of the first line when reading a CSV file with OpenCSV
Question
I have a CSV file that I need to process. When I read it, an unknown special character appears as the first character of the first line. I am not sure why this happens or how to resolve it.
Here is the code snippet I am using:
CSVReader reportTypesReader = new CSVReader(new FileReader(Paths.get(filePath.concat("/ReportTypes.csv")).toFile()));
String[] nextLine;
// reads one line at a time
while ((nextLine = reportTypesReader.readNext()) != null) {
    for (String token : nextLine) {
        System.out.print(token);
    }
    System.out.print("\n");
}
Here is the sample output:
ï»¿Report Type, Icon URL
What I expect is:
Report Type, Icon URL
Answer 1
Score: 2
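BOM
Yes, that is a BOM (byte order mark).

ï»¿ are the characters you see when the three octets that make up a UTF-8 BOM are misinterpreted as individual single-byte code points (for example, as ISO-8859-1). In decimal, those three octets are 239 187 191; in hex, EF BB BF. Neither FileReader nor Java's UTF-8 decoder strips the BOM, so it reaches OpenCSV as part of the first field.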
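To make the byte values above concrete, here is a minimal sketch that decodes the same three octets two ways using only the JDK. The class and method names (BomBytesDemo, decodeAsLatin1, decodeAsUtf8, hex) are made up for illustration, and ISO-8859-1 stands in for whatever single-byte charset the platform happens to use.

```java
import java.nio.charset.StandardCharsets;

final class BomBytesDemo {
    // The three octets of the UTF-8 BOM: 239 187 191 in decimal, EF BB BF in hex.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // One character per byte, as a single-byte charset would decode them.
    static String decodeAsLatin1() {
        return new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
    }

    // Decoded as UTF-8, the same bytes form the single code point U+FEFF.
    static String decodeAsUtf8() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    // Formats each char as a hex code unit so the result does not depend on the console.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(decodeAsLatin1())); // 00EF 00BB 00BF
        System.out.println(hex(decodeAsUtf8()));   // FEFF
    }
}
```

The first line corresponds to the three visible characters ï, » and ¿. The second shows that decoding as UTF-8 still leaves one (usually invisible) character, U+FEFF, at the start of the text.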
You can try BOMInputStream from Apache Commons IO - https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html
CSVReader takes a Reader rather than an InputStream, so wrap the BOMInputStream in an InputStreamReader with an explicit charset.
You can use
new InputStreamReader(new BOMInputStream(new FileInputStream(file)), StandardCharsets.UTF_8)
This wraps the FileInputStream, and BOMInputStream makes sure you get the content without the BOM.
It works whether or not the file starts with a BOM. By default, BOMInputStream detects only the UTF-8 BOM.
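If adding Commons IO is not an option, roughly the same effect can be sketched with the JDK alone. This is a different technique from BOMInputStream: it decodes the stream as UTF-8 and then drops a leading U+FEFF character instead of filtering bytes. It assumes the file really is UTF-8, and the names BomSkippingReaders, utf8WithoutBom and firstLine are hypothetical helpers, not part of OpenCSV or Commons IO.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BomSkippingReaders {
    // Decodes the stream as UTF-8 and drops a leading U+FEFF (the decoded BOM), if any.
    static BufferedReader utf8WithoutBom(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);
        if (reader.read() != '\uFEFF') {
            reader.reset(); // no BOM: rewind so the first character is not lost
        }
        return reader;
    }

    // Demo helper: returns the first line of the given UTF-8 bytes with any BOM removed.
    static String firstLine(byte[] utf8Bytes) {
        try (BufferedReader reader = utf8WithoutBom(new ByteArrayInputStream(utf8Bytes))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Encoding U+FEFF as UTF-8 produces the bytes EF BB BF, i.e. a BOM-prefixed file.
        byte[] withBom = "\uFEFFReport Type, Icon URL\n".getBytes(StandardCharsets.UTF_8);
        byte[] plain = "Report Type, Icon URL\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(withBom)); // Report Type, Icon URL
        System.out.println(firstLine(plain));   // Report Type, Icon URL
    }
}
```

In the question's code, the reader returned by utf8WithoutBom(new FileInputStream(...)) could be passed to new CSVReader(...) in place of the FileReader.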