Apache POI – 使用HSSFListener解析日期

huangapple go评论88阅读模式
英文:

Apache POI - Parsing Dates with HSSFListener

问题

First off, I'm working on using Apache POI's Event API using HSSFListener and XSSFSheetXMLHandler.SheetContentsHandler to read large files. So I'm not using a workbook where I can use some of the handy methods. I've already implemented it with that approach and got OOM errors.

For XLS files, I'm having trouble figuring out how to determine from a NumberRecord if the format is a Date format or Number format. I've looked at this old link (https://stackoverflow.com/questions/14835339/using-apache-poi-hssflistener-how-to-identify-date-type) but only the Tika link works as the other no longer exists. It seems some modifications would work with the FormatTrackingHSSFListener but I'm missing something there.

I saw some people mentioning using the DateUtil to check if the value is in an InternalDateFormat like below, but that didn't work.

case NumberRecord.sid:
    NumberRecord numrec = (NumberRecord) record;
    if (DateUtil.isInternalDateFormat(numrec.getXFIndex())) {
        System.out.println("Cell found with date value " + DateUtil.getJavaDate(numrec.getValue())
                + " at row " + numrec.getRow() + " and column " + numrec.getColumn());
    } else {
        System.out.println("Cell found with value " + numrec.getValue()
                + " at row " + numrec.getRow() + " and column " + numrec.getColumn());
    }

I wonder if the isADateFormat method would work (https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DateUtil.html#isADateFormat-int-java.lang.String-) but I can't figure out what the formatString would be.

Thoughts? Much appreciated. The documentation is somewhat there but it's "fun" to get thru.

英文:

First off, I'm working on using Apache POI's Event API using HSSFListener and XSSFSheetXMLHandler.SheetContentsHandler to read large files. So I'm not using a workbook where I can use some of the handy methods. I've already implemented it with that approach and got OOM errors.

For XLS files, I'm having trouble figuring out how to determine from a NumberRecord if the format is a Date format or Number format. I've looked at this old link (https://stackoverflow.com/questions/14835339/using-apache-poi-hssflistener-how-to-identify-date-type) but only the Tika link works as the other no longer exists. It seems some modifications would work with the FormatTrackingHSSFListener but I'm missing something there.

I saw some people mentioning using the DateUtil to check if the value is in an InternalDateFormat like below, but that didn't work.

case NumberRecord.sid:
	NumberRecord numrec = (NumberRecord) record;
	if (DateUtil.isInternalDateFormat(numrec.getXFIndex())) {
		System.out.println("Cell found with date value " + DateUtil.getJavaDate(numrec.getValue())
				+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
	} else {
		System.out.println("Cell found with value " + numrec.getValue()
				+ " at row " + numrec.getRow() + " and column " + numrec.getColumn());
	}

I wonder if the isADateFormat method would work (https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DateUtil.html#isADateFormat-int-java.lang.String-) but I can't figure out what the formatString would be.

Thoughts? Much appreciated. The documentation is somewhat there but it's "fun" to get thru.

答案1

得分: 0

以下是您要翻译的内容:

如果其他人遇到类似问题,我已经通过此处的TIKA提取器链接解决了这个问题。我建议实现一个HSSFListener,它本身具有FormatTrackingHSSFListener作为成员变量。这将是您的初始侦听器,并将捕获所有格式(如果我理解正确的话),它本身将具有您实现的子侦听器。

public class YourListener implements HSSFListener {

    private SSTRecord sstRecord;
    private FormatTrackingHSSFListener formatListener;
    private List<String> sheetNames = new ArrayList<>();
    private Integer currentSheetIndex = -1;
    private Integer rowCount = 0;

    public YourListener() {
        this.formatListener = new FormatTrackingHSSFListener(this);
    }
}

当您覆盖processRecord时,您的NumberRecord.sid的switch语句可以调用formatListener的formatNumberDateCell。所以类似这样:

case NumberRecord.sid: // 包含数字单元格值
    NumberRecord number = (NumberRecord) record;
    System.out.println(record, formatListener.formatNumberDateCell(number));
    break;
英文:

In case anyone else has issues with this, I've worked through this with the help from the TIKA extractor link in the other question. I would suggest implementing an HSSFListener which itself has a FormatTrackingHSSFListener as a member variable. This will your initial listener and will capture all the formats (if I understand correctly), and it itself will have a child listener of your implementation.

public class YourListener implements HSSFListener {

    private SSTRecord sstRecord;
    private FormatTrackingHSSFListener formatListener;
    private List&lt;String&gt; sheetNames = new ArrayList&lt;&gt;();
    private Integer currentSheetIndex = -1;
    private Integer rowCount = 0;

    public YourListener() {
        this.formatListener = new FormatTrackingHSSFListener(this);
    }
}

When you override processRecord, your switch for the NumberRecord.sid can call the formatListener's formatNumberDateCell. So something like:

case NumberRecord.sid: // Contains a numeric cell value
     NumberRecord number = (NumberRecord) record;
     System.out.println(record, formatListener.formatNumberDateCell(number));
     break;

huangapple
  • 本文由 发表于 2020年8月6日 04:41:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/63273197.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定