Getting a special character at the start of the first line when reading a CSV file with OpenCSV

Question

I have a CSV file that I need to process. When I read it, an unknown special character appears as the first character of the first line. I'm not sure why this happens or how to fix it.

Here is the code snippet I am using:

CSVReader reportTypesReader = new CSVReader(new FileReader(Paths.get(filePath.concat("/ReportTypes.csv")).toFile()));

String[] nextLine;
// reads one line at a time
while ((nextLine = reportTypesReader.readNext()) != null) {
    for (String token : nextLine) {
        System.out.print(token);
    }
    System.out.print("\n");
}

Here is the sample output,

ï»¿Report Type, Icon URL

My expectation is,

Report Type, Icon URL

Answer 1

Score: 2

BOM

Yes, that is a BOM (Byte Order Mark).

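ï»¿ are the characters you see when the three octets that make up the UTF-8 BOM are misinterpreted as individual code points. This happens when the file is decoded with a single-byte charset such as ISO-8859-1 or Windows-1252, which FileReader may pick up as the platform default. In decimal, those three octets are 239 187 191; in hex, EF BB BF.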
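To make the misinterpretation concrete, here is a minimal sketch that decodes the same three octets twice, once as ISO-8859-1 and once as UTF-8. The class and method names are made up for illustration; only JDK classes are used.

```java
import java.nio.charset.StandardCharsets;

class BomBytesDemo {
    // The three octets of the UTF-8 BOM: EF BB BF (239 187 191 in decimal).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // ISO-8859-1 maps every octet to its own code point,
    // so the BOM turns into three visible characters.
    static String decodedAsLatin1() {
        return new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
    }

    // Decoded as UTF-8, the same octets form the single code point U+FEFF.
    // Java's UTF-8 decoder keeps it; it does not strip the BOM for you.
    static String decodedAsUtf8() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print the code points rather than the characters to avoid console encoding issues.
        decodedAsLatin1().chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
        decodedAsUtf8().chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

The first line printed lists U+00EF, U+00BB and U+00BF (the characters ï, » and ¿); the second lists only U+FEFF. Either way, something extra ends up in front of the first header unless the BOM is removed.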

You can try using BOMInputStream - https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html

CSVReader takes a Reader rather than an InputStream, so wrap the stream in an InputStreamReader.
You can use
new InputStreamReader(new BOMInputStream(new FileInputStream(file)), StandardCharsets.UTF_8)
BOMInputStream wraps the FileInputStream and, by default, detects and skips a leading UTF-8 BOM, so you get the content without it.
This should work with any UTF-8 file, with or without a BOM.
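If adding Commons IO is not an option, the same idea can be sketched with the JDK alone. This is a different technique from BOMInputStream: it decodes the file as UTF-8 and then drops a leading U+FEFF character if there is one. The class name BomSkippingReader and its open method are hypothetical, and the sketch assumes the file really is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomSkippingReader {
    // Opens a UTF-8 text file and positions the reader after a leading BOM, if any.
    static BufferedReader open(Path path) throws IOException {
        BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        reader.mark(1);                 // remember the start of the file
        if (reader.read() != 0xFEFF) {  // U+FEFF is the decoded form of EF BB BF
            reader.reset();             // no BOM: rewind so nothing is lost
        }
        return reader;
    }
}
```

Since OpenCSV's CSVReader accepts a java.io.Reader, the returned reader could be passed to it in place of the FileReader from the question, e.g. new CSVReader(BomSkippingReader.open(path)). Naming the charset explicitly also avoids depending on the platform default.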

huangapple
  • Posted on April 17, 2023
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035876.html