英文:
Apache POI - Parsing Dates with HSSFListener
问题
First off, I'm working on using Apache POI's Event API using HSSFListener and XSSFSheetXMLHandler.SheetContentsHandler to read large files. So I'm not using a workbook where I can use some of the handy methods. I've already implemented it with that approach and got OOM errors.
For XLS files, I'm having trouble figuring out how to determine from a NumberRecord if the format is a Date format or Number format. I've looked at this old link (https://stackoverflow.com/questions/14835339/using-apache-poi-hssflistener-how-to-identify-date-type) but only the Tika link works as the other no longer exists. It seems some modifications would work with the FormatTrackingHSSFListener but I'm missing something there.
I saw some people mentioning using the DateUtil to check if the value is in an InternalDateFormat like below, but that didn't work.
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
if (DateUtil.isInternalDateFormat(numrec.getXFIndex())) {
System.out.println("Cell found with date value " + DateUtil.getJavaDate(numrec.getValue())
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
} else {
System.out.println("Cell found with value " + numrec.getValue()
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
}
I wonder if the isADateFormat method would work (https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DateUtil.html#isADateFormat-int-java.lang.String-) but I can't figure out what the formatString would be.
Thoughts? Much appreciated. The documentation is somewhat there but it's "fun" to get thru.
英文:
First off, I'm working on using Apache POI's Event API using HSSFListener and XSSFSheetXMLHandler.SheetContentsHandler to read large files. So I'm not using a workbook where I can use some of the handy methods. I've already implemented it with that approach and got OOM errors.
For XLS files, I'm having trouble figuring out how to determine from a NumberRecord if the format is a Date format or Number format. I've looked at this old link (https://stackoverflow.com/questions/14835339/using-apache-poi-hssflistener-how-to-identify-date-type) but only the Tika link works as the other no longer exists. It seems some modifications would work with the FormatTrackingHSSFListener but I'm missing something there.
I saw some people mentioning using the DateUtil to check if the value is in an InternalDateFormat like below, but that didn't work.
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
if (DateUtil.isInternalDateFormat(numrec.getXFIndex())) {
System.out.println("Cell found with date value " + DateUtil.getJavaDate(numrec.getValue())
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
} else {
System.out.println("Cell found with value " + numrec.getValue()
+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
}
I wonder if the isADateFormat method would work (https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DateUtil.html#isADateFormat-int-java.lang.String-) but I can't figure out what the formatString would be.
Thoughts? Much appreciated. The documentation is somewhat there but it's "fun" to get thru.
答案1
得分: 0
以下是您要翻译的内容:
如果其他人遇到类似问题,我已经通过此处的TIKA提取器链接解决了这个问题。我建议实现一个HSSFListener,它本身具有FormatTrackingHSSFListener作为成员变量。这将是您的初始侦听器,并将捕获所有格式(如果我理解正确的话),它本身将具有您实现的子侦听器。
public class YourListener implements HSSFListener {
private SSTRecord sstRecord;
private FormatTrackingHSSFListener formatListener;
private List<String> sheetNames = new ArrayList<>();
private Integer currentSheetIndex = -1;
private Integer rowCount = 0;
public YourListener() {
this.formatListener = new FormatTrackingHSSFListener(this);
}
}
当您覆盖processRecord时,您的NumberRecord.sid的switch语句可以调用formatListener的formatNumberDateCell。所以类似这样:
case NumberRecord.sid: // 包含数字单元格值
NumberRecord number = (NumberRecord) record;
System.out.println(record, formatListener.formatNumberDateCell(number));
break;
英文:
In case anyone else has issues with this, I've worked through this with the help from the TIKA extractor link in the other question. I would suggest implementing an HSSFListener which itself has a FormatTrackingHSSFListener as a member variable. This will your initial listener and will capture all the formats (if I understand correctly), and it itself will have a child listener of your implementation.
public class YourListener implements HSSFListener {
private SSTRecord sstRecord;
private FormatTrackingHSSFListener formatListener;
private List<String> sheetNames = new ArrayList<>();
private Integer currentSheetIndex = -1;
private Integer rowCount = 0;
public YourListener() {
this.formatListener = new FormatTrackingHSSFListener(this);
}
}
When you override processRecord, your switch for the NumberRecord.sid can call the formatListener's formatNumberDateCell. So something like:
case NumberRecord.sid: // Contains a numeric cell value
NumberRecord number = (NumberRecord) record;
System.out.println(record, formatListener.formatNumberDateCell(number));
break;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论