Apache POI doesn't correctly read seconds from Excel cells
Question
I am currently debugging someone else's code base. The aim is to import data from Excel files into a database. Each row in the Excel file contains a timestamp in column 0 and some label values in the remaining columns.
The timestamps contain year, month, day of month, hour, minute, and second. To parse the Excel files and read individual cells, the following API and code are used:
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
...
// some loop
LocalDateTime dateCellValue = cell.getLocalDateTimeCellValue();
Parsing the dates works properly in most cases, but not in all of them. Sometimes the value Java returns carries a fractional-second (nanosecond) part. For example, the timestamp "12.09.2018 12:39:11" is read by Java as "2018-09-12T12:39:10.995". This carries over into the database: second 11 from the file is saved as second 10, because the fractional part is dropped.
To understand the cause of the problem, I read the following source: https://github.com/apache/poi/blob/trunk/src/java/org/apache/poi/hssf/usermodel/HSSFCell.java
From this I concluded that the cause of my problem is a rounding error: the API reads each date value as a double. In Excel you can see this double by converting a date-time value to text or a number; for example, "20.03.2019 08:36:39" becomes "43544,3587847222". My assumption is that, for certain values, this data type is not precise enough, so Java does not parse the value correctly.
My question is what the quickest accurate fix for this problem would look like. I have thought about two possibilities:
- Instead of using Apache POI, I could parse the dates through a DateFormatter. The disadvantage is that user-defined dates would always need to have the same format, which they currently do not.
- In the code, I could round to whole seconds and cut off the fractional part. However, I am not sure whether I always need to round up to the next second (as in the example above), or whether there are cases (e.g. a fractional part below 0.5 seconds) in which I need to round down. My problem is that I don't fully understand the exact mechanism behind the rounding error.
Does anyone have a suggestion? Help is very much appreciated!
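To make the double representation concrete, here is a minimal sketch that converts an Excel serial number into a LocalDateTime using only the JDK. It is not POI's implementation; the class name ExcelSerialDemo, the helper fromSerial, and the choice to round to whole milliseconds are assumptions made for illustration, and it only covers the 1900 date system for dates on or after 1900-03-01.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical demo class, not part of Apache POI.
class ExcelSerialDemo {
    // In the 1900 date system, serial N (for dates from 1900-03-01 on)
    // is N days after 1899-12-30.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Integer part = days, fractional part = fraction of a day.
    // Rounding to whole milliseconds is an assumption of this sketch.
    static LocalDateTime fromSerial(double serial) {
        long wholeDays = (long) Math.floor(serial);
        long millisOfDay = Math.round((serial - wholeDays) * MILLIS_PER_DAY);
        return EPOCH.atStartOfDay()
                .plusDays(wholeDays)
                .plusNanos(millisOfDay * 1_000_000L);
    }

    public static void main(String[] args) {
        // The value quoted in the question.
        System.out.println(fromSerial(43544.3587847222)); // 2019-03-20T08:36:39

        // A serial that really stores 12:39:10.995 on 2018-09-12.
        double stored = 43355 + 45_550_995 / 86_400_000.0;
        System.out.println(fromSerial(stored)); // 2018-09-12T12:39:10.995
    }
}
```

The quoted value 43544,3587847222 is off from 08:36:39 by only about two microseconds, so the imprecision of the double is far too small to turn second 11 into 10.995. A result ending in .995 suggests the fraction is actually stored in the cell.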
Edit:
The cause of the problem was that the nanoseconds were already present in the Excel sheets but not visible to the naked eye, because the cell format did not display them.
Answer 1
Score: 3
I cannot reproduce the issue. If Apache POI returns the LocalDateTime 2018-09-12T12:39:10.995, then the Excel cell has stored exactly that date-time. Of course, Excel might not display it exactly, because the date format already rounds. For example, the date format DD.MM.YYYY hh:mm:ss shows 12.09.2018 12:39:11 for 2018-09-12T12:39:10.995. But the stored value is the exact date-time.
If the LocalDateTime is only needed with second precision, one can add 0.5 seconds (500 milliseconds) and then truncate to seconds. That rounds the LocalDateTime to the nearest second.
LocalDateTime dateCellValue = cell.getLocalDateTimeCellValue(); //got directly from Excel
dateCellValue = dateCellValue.plusNanos(500000000).truncatedTo(ChronoUnit.SECONDS); //round to seconds
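The rounding step can also be isolated in a small helper so its edge cases are easy to check without a workbook. This is only a sketch of the same add-then-truncate idea; the class name SecondRounding and the method roundToSeconds are hypothetical names, not part of Apache POI.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class, not part of Apache POI.
class SecondRounding {
    // Rounds to the nearest whole second; exactly .5 rounds up.
    static LocalDateTime roundToSeconds(LocalDateTime value) {
        return value.plusNanos(500_000_000L).truncatedTo(ChronoUnit.SECONDS);
    }

    public static void main(String[] args) {
        // Fraction of 0.5 s or more: rounds up to second 11.
        System.out.println(roundToSeconds(LocalDateTime.parse("2018-09-12T12:39:10.995")));
        // Fraction below 0.5 s: rounds down to second 10.
        System.out.println(roundToSeconds(LocalDateTime.parse("2018-09-12T12:39:10.495")));
        // Rounding up carries into the minute, hour, day, and year.
        System.out.println(roundToSeconds(LocalDateTime.parse("2018-12-31T23:59:59.500")));
    }
}
```

Because plusNanos performs the carry itself, a value such as 23:59:59.500 on the last day of the year moves to midnight of the next day, so no special handling of minute or day boundaries is needed.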
Complete example:
In the Excel sheet, the cell values in column B are date-time values. The cell number format is TT.MM.YYYY hh:mm:ss.000.
Code:
import java.io.FileInputStream;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import org.apache.poi.ss.usermodel.*;

class ExcelReadLocalDateTime {
    public static void main(String[] args) throws Exception {
        // For the binary format, open "Workbook.xls" instead.
        Workbook workbook = WorkbookFactory.create(new FileInputStream("Workbook.xlsx"));
        Sheet sheet = workbook.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                switch (cell.getCellType()) {
                    case STRING:
                        System.out.println(cell.getRichStringCellValue().getString());
                        break;
                    case NUMERIC:
                        if (DateUtil.isCellDateFormatted(cell)) {
                            // value exactly as stored in Excel
                            LocalDateTime dateCellValue = cell.getLocalDateTimeCellValue();
                            System.out.println(dateCellValue);
                            // add half a second, then truncate: rounds to the nearest second
                            dateCellValue = dateCellValue.plusNanos(500000000).truncatedTo(ChronoUnit.SECONDS);
                            System.out.println(dateCellValue);
                        } else {
                            System.out.println(cell.getNumericCellValue());
                        }
                        break;
                    default:
                        System.out.println();
                }
            }
        }
        workbook.close();
    }
}
Result:
Text
DateTime
DT 1
2018-09-12T12:39:10
2018-09-12T12:39:10
DT 2
2018-09-12T12:39:10.123
2018-09-12T12:39:10
DT 3
2018-09-12T12:39:10.245
2018-09-12T12:39:10
DT 4
2018-09-12T12:39:10.370
2018-09-12T12:39:10
DT 5
2018-09-12T12:39:10.495
2018-09-12T12:39:10
DT 6
2018-09-12T12:39:10.500
2018-09-12T12:39:11
DT 7
2018-09-12T12:39:10.620
2018-09-12T12:39:11
DT 8
2018-09-12T12:39:10.745
2018-09-12T12:39:11
DT 9
2018-09-12T12:39:10.870
2018-09-12T12:39:11
DT 10
2018-09-12T12:39:10.995
2018-09-12T12:39:11