Java – 使用POI过滤Excel文件的行

huangapple go评论76阅读模式
英文:

Java - Filtering rows of an Excel file using POI

问题

我有一个包含许多行的Excel文件(超过60000行),我想对它们应用筛选器,以便只读取我正在寻找的行。

我在Java中使用POI库,但我没有找到如何过滤值的方法。

例如,在我的Excel文件中有以下数据:

名字     | 姓氏     | 年龄
--------+----------+----
约翰     | 杜       |  25
富        | 巴       |  20
阿娇     | 杜       |  22

我怎样才能选择每一行的姓氏等于的行呢?

这是我目前的代码:

public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);

    // 要筛选的单元格范围
    CellRangeAddress data = new CellRangeAddress(
            1,
            worksheet.getLastRowNum(),
            0,
            worksheet.getRow(0).getPhysicalNumberOfCells());

    worksheet.setAutoFilter(data);
}

我尝试使用了AutoFilter,但我不知道它是如何工作的。

我正在寻找一个类似于这样的功能:

Filter filter = new Filter();
filter.setRange(myRange);
filter.addFilter(
    0, // 列索引
    "杜" // 我要搜索的值
)
filter.apply()

这只是一个假设的代码。

谢谢你的帮助!

英文:

I have an Excel file with many rows (more than 60,000) and I want to apply a filter on them in order to read only the rows that I'm looking for.

I'm using the POI library in Java but I didn't find how to filter values.

For example, with the following data in my Excel file:

First name | Last name | Age
-----------+-----------+----
Jhon       | Doe       |  25
Foo        | Bar       |  20
Aaa        | Doe       |  22

How could I select every row with the last name equal to Doe ?

This is my code so far:

public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);

    // Cell range to filter
    CellRangeAddress data = new CellRangeAddress(
            1,
            worksheet.getLastRowNum(),
            0,
            worksheet.getRow(0).getPhysicalNumberOfCells());

    worksheet.setAutoFilter(data);
}

I tried to use the AutoFilter but I don't know how it works.

I'm looking for a feature that would look like this:

Filter filter = new Filter();
filter.setRange(myRange);
filter.addFilter(
    0, // The column index
    "Doe" // The value that I'm searching for
)
filter.apply()

This is purely hypothetical code.

Thank you for your help !

答案1

得分: 2

如果您的问题是如何为姓氏设置AutoFilter条件为“Doe”,那么这只能通过使用底层的ooxml-schemas类来实现。XSSFAutoFilter直到现在都是无用的。它到现在为止没有提供任何方法。

使用您的示例数据的完整示例:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STFilterOperator;

import java.io.FileOutputStream;

class AutoFilterSetTest {

 private static void setCellData(Sheet sheet) {

  Object[][] data = new Object[][] {
   new Object[] {"First name", "Last name", "Age"},
   new Object[] {"John", "Doe", 25},
   new Object[] {"Foo", "Bar", 20},
   new Object[] {"Jane", "Doe", 22},
   new Object[] {"Ruth", "Moss", 42},
   new Object[] {"Manuel", "Doe", 32},
   new Object[] {"Axel", "Richter", 56},
  };

  Row row = null;
  Cell cell = null;
  int r = 0;
  int c = 0;
  for (Object[] dataRow : data) {
   row = sheet.createRow(r);
   c = 0;
   for (Object dataValue : dataRow) {
    cell = row.createCell(c);
    if (dataValue instanceof String) {
     cell.setCellValue((String)dataValue);
    } else if (dataValue instanceof Number) {
     cell.setCellValue(((Number)dataValue).doubleValue());
    }
    c++;
   }
   r++;
  }
 }

 private static void setCriteriaFilter(XSSFSheet sheet, int colId, int firstRow, int lastRow, String[] criteria) throws Exception {
  CTAutoFilter ctAutoFilter = sheet.getCTWorksheet().getAutoFilter();
  CTFilterColumn ctFilterColumn = null;
  for (CTFilterColumn filterColumn : ctAutoFilter.getFilterColumnList()) {
   if (filterColumn.getColId() == colId) ctFilterColumn = filterColumn;
  }
  if (ctFilterColumn == null) ctFilterColumn = ctAutoFilter.addNewFilterColumn();
  ctFilterColumn.setColId(colId);
  if (ctFilterColumn.isSetFilters()) ctFilterColumn.unsetFilters();

  CTFilters ctFilters = ctFilterColumn.addNewFilters();
  for (int i = 0; i < criteria.length; i++) {
   ctFilters.addNewFilter().setVal(criteria[i]);
  }

  //hiding the rows not matching the criterias
  DataFormatter dataformatter = new DataFormatter();
  for (int r = firstRow; r <= lastRow; r++) {
   XSSFRow row = sheet.getRow(r);
   boolean hidden = true;
   for (int i = 0; i < criteria.length; i++) {
    String cellValue = dataformatter.formatCellValue(row.getCell(colId));
    if (criteria[i].equals(cellValue)) hidden = false;
   }
   if (hidden) {
    row.getCTRow().setHidden(hidden);
   } else {
    if (row.getCTRow().getHidden()) row.getCTRow().unsetHidden();
   }
  }
 }

 public static void main(String[] args) throws Exception {

  XSSFWorkbook wb = new XSSFWorkbook();
  XSSFSheet sheet = wb.createSheet();

  //create rows of data
  setCellData(sheet);

  for (int c = 0; c < 2; c++) sheet.autoSizeColumn(c);

  int lastRow = sheet.getLastRowNum();
  XSSFAutoFilter autofilter = sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, 2));
  //XSSFAutoFilter is useless until now

  //set filter criteria 
  setCriteriaFilter(sheet, 1, 1, lastRow, new String[]{"Doe"});

  //get only visible rows after filtering
  XSSFRow row = null;
  for (int r = 1; r <= lastRow; r++) {
   row = sheet.getRow(r);
   if (row.getCTRow().getHidden()) continue;
   for (int c = 0; c < 3; c++) {
    System.out.print(row.getCell(c) + "\t");
   }
   System.out.println();
  }

  FileOutputStream out = new FileOutputStream("AutoFilterSetTest.xlsx");
  wb.write(out);
  out.close();
  wb.close();
 }
}

它会打印出:

John	Doe	  25.0	
Jane	Doe	  22.0	
Manuel	Doe	  32.0	

生成的AutoFilterSetTest.xlsx文件如下图所示:

Java – 使用POI过滤Excel文件的行

英文:

If your question is how to set AutoFilter criteria "Doe" for last name, then this only can be achieved using underlying low level ooxml-schemas classes. The XSSFAutoFilter is useless until now. It does not provide any methods until now.

Complete example using your example data:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STFilterOperator;
import java.io.FileOutputStream;
class AutoFilterSetTest {
private static void setCellData(Sheet sheet) {
Object[][] data = new Object[][] {
new Object[] {&quot;First name&quot;, &quot;Last name&quot;, &quot;Age&quot;},
new Object[] {&quot;John&quot;, &quot;Doe&quot;, 25},
new Object[] {&quot;Foo&quot;, &quot;Bar&quot;, 20},
new Object[] {&quot;Jane&quot;, &quot;Doe&quot;, 22},
new Object[] {&quot;Ruth&quot;, &quot;Moss&quot;, 42},
new Object[] {&quot;Manuel&quot;, &quot;Doe&quot;, 32},
new Object[] {&quot;Axel&quot;, &quot;Richter&quot;, 56},
};
Row row = null;
Cell cell = null;
int r = 0;
int c = 0;
for (Object[] dataRow : data) {
row = sheet.createRow(r);
c = 0;
for (Object dataValue : dataRow) {
cell = row.createCell(c);
if (dataValue instanceof String) {
cell.setCellValue((String)dataValue);
} else if (dataValue instanceof Number) {
cell.setCellValue(((Number)dataValue).doubleValue());
}
c++;
}
r++;
}
}
private static void setCriteriaFilter(XSSFSheet sheet, int colId, int firstRow, int lastRow, String[] criteria) throws Exception {
CTAutoFilter ctAutoFilter = sheet.getCTWorksheet().getAutoFilter();
CTFilterColumn ctFilterColumn = null;
for (CTFilterColumn filterColumn : ctAutoFilter.getFilterColumnList()) {
if (filterColumn.getColId() == colId) ctFilterColumn = filterColumn;
}
if (ctFilterColumn == null) ctFilterColumn = ctAutoFilter.addNewFilterColumn();
ctFilterColumn.setColId(colId);
if (ctFilterColumn.isSetFilters()) ctFilterColumn.unsetFilters();
CTFilters ctFilters = ctFilterColumn.addNewFilters();
for (int i = 0; i &lt; criteria.length; i++) {
ctFilters.addNewFilter().setVal(criteria[i]);
}
//hiding the rows not matching the criterias
DataFormatter dataformatter = new DataFormatter();
for (int r = firstRow; r &lt;= lastRow; r++) {
XSSFRow row = sheet.getRow(r);
boolean hidden = true;
for (int i = 0; i &lt; criteria.length; i++) {
String cellValue = dataformatter.formatCellValue(row.getCell(colId));
if (criteria[i].equals(cellValue)) hidden = false;
}
if (hidden) {
row.getCTRow().setHidden(hidden);
} else {
if (row.getCTRow().getHidden()) row.getCTRow().unsetHidden();
}
}
}
public static void main(String[] args) throws Exception {
XSSFWorkbook wb = new XSSFWorkbook();
XSSFSheet sheet = wb.createSheet();
//create rows of data
setCellData(sheet);
for (int c = 0; c &lt; 2; c++) sheet.autoSizeColumn(c);
int lastRow = sheet.getLastRowNum();
XSSFAutoFilter autofilter = sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, 2));
//XSSFAutoFilter is useless until now
//set filter criteria 
setCriteriaFilter(sheet, 1, 1, lastRow, new String[]{&quot;Doe&quot;});
//get only visible rows after filtering
XSSFRow row = null;
for (int r = 1; r &lt;= lastRow; r++) {
row = sheet.getRow(r);
if (row.getCTRow().getHidden()) continue;
for (int c = 0; c &lt; 3; c++) {
System.out.print(row.getCell(c) + &quot;\t&quot;);
}
System.out.println();
}
FileOutputStream out = new FileOutputStream(&quot;AutoFilterSetTest.xlsx&quot;);
wb.write(out);
out.close();
wb.close();
}
}

It prints:

John	Doe	  25.0	
Jane	Doe	  22.0	
Manuel	Doe	  32.0	

Resulting AutoFilterSetTest.xlsx looks like:

Java – 使用POI过滤Excel文件的行

答案2

得分: 0

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.util.ArrayList;
import java.util.List;

public class ExcelWorksheetFilter {

    private List<FilterRule> ruleList = new ArrayList<>();
    private CellRangeAddress cellRange;
    private XSSFSheet worksheet;
    private XSSFWorkbook workbook;

    public ExcelWorksheetFilter(XSSFWorkbook workbook, int worksheetId) {
        this.workbook = workbook;
        this.worksheet = workbook.getSheetAt(worksheetId);
    }

    public void apply(){
        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(true);
            for(FilterRule rule : ruleList){
                if(rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(false);
                    break;
                }
            }
        }
    }

    public void applyStrict(){
        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(false);
            for(FilterRule rule : ruleList){
                if(!rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(true);
                    break;
                }
            }
        }
    }

    public List<Row> getRowList(){
        List<Row> rowList = new ArrayList<>();

        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            if(!worksheet.getRow(rowId).getCTRow().getHidden()){
                rowList.add(worksheet.getRow(rowId));
            }
        }

        return rowList;
    }

    public void addRule(FilterRule rule) {
        this.ruleList.add(rule);
    }

    // Getters and setters omitted...
}

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.XSSFRow;

public class FilterRule {

    private final static DataFormatter df = new DataFormatter();

    private Integer columnId;
    private String[] values;
    private FilterRuleOperation operator;

    public FilterRule(Integer columnId, FilterRuleOperation operator, String[] values){
        this.columnId = columnId;
        this.operator = operator;
        this.values = values;
    }

    public boolean match(XSSFRow row){
        for(String value : values){
            if(operator.match(df.formatCellValue(row.getCell(columnId)), value)){
                return true;
            };
        }
        return false;
    }
}

public enum FilterRuleOperation {

    DIFFERENT("!="){
        @Override public boolean match(String x, String y){
            return !x.equals(y);
        }
    },
    EQUAL("=="){
        @Override public boolean match(String x, String y){
            return x.equals(y);
        }
    };

    private final String text;

    private FilterRuleOperation(String text) {
        this.text = text;
    }

    public abstract boolean match(String x, String y);

    @Override public String toString() {
        return text;
    }
}

public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);

    ExcelWorksheetFilter excelWorksheetFilter = new ExcelWorksheetFilter(myExcelFile, 0);
    excelWorksheetFilter.setCellRange(new CellRangeAddress(
        1,
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells()-1
    ));

    excelWorksheetFilter.addRule(new FilterRule(
            1,
            FilterRuleOperation.EQUAL,
            new String[]{"Doe"}
            ));

    excelWorksheetFilter.addRule(new FilterRule(
            0,
            FilterRuleOperation.EQUAL,
            new String[]{"Jhon"}
    ));

    excelWorksheetFilter.applyStrict();

    excelWorksheetFilter.getRowList().forEach(row -> {
        for(int i = 0; i < 3; i++) {
            System.out.print(df.formatCellValue(row.getCell(i)) + '\t');
        }
        System.out.println();
    });

    FileOutputStream out = new FileOutputStream("filter_test.xlsx");
    excelWorksheetFilter.getWorkbook().write(out);
    out.close();
    excelWorksheetFilter.getWorkbook().close();
}
英文:

Maybe this could help other people, so this the solution I came up with before this answer.
Take in consideration that I am not very good with Java so the following code can surely be optimized.

I implemented a filter by myself, to do so, I created 3 classes :

  • ExcelWorksheetFilter
  • FilterRule
  • FilterRuleOperation

ExcelWorksheetFilter

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.util.ArrayList;
import java.util.List;

public class ExcelWorksheetFilter {

    private List&lt;FilterRule&gt; ruleList = new ArrayList&lt;&gt;();
    private CellRangeAddress cellRange;
    private XSSFSheet worksheet;
    private XSSFWorkbook workbook;

    public ExcelWorksheetFilter(XSSFWorkbook workbook, int worksheetId) {
        this.workbook = workbook;
        this.worksheet = workbook.getSheetAt(worksheetId);
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * The row is put in the result if at least one rule match.
     */
    public void apply(){

        for(int rowId = cellRange.getFirstRow(); rowId &lt;= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(true);
            for(FilterRule rule : ruleList){
                if(rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(false);
                    break;
                }
            }
        }
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * The row is put in the result if every rules match.
     */
    public void applyStrict(){
        for(int rowId = cellRange.getFirstRow(); rowId &lt;= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(false);
            for(FilterRule rule : ruleList){
                if(!rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(true);
                    break;
                }
            }
        }
    }

    public List&lt;Row&gt; getRowList(){
        List&lt;Row&gt; rowList = new ArrayList&lt;&gt;();

        for(int rowId = cellRange.getFirstRow(); rowId &lt;= cellRange.getLastRow(); rowId++){
            if(!worksheet.getRow(rowId).getCTRow().getHidden()){
                rowList.add(worksheet.getRow(rowId));
            }
        }

        return rowList;
    }

    public void addRule(FilterRule rule) {
        this.ruleList.add(rule);
    }

    // Getters and setters omitted...
}

FilterRule

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.XSSFRow;

public class FilterRule {

    private final static DataFormatter df = new DataFormatter();

    private Integer columnId;
    private String[] values;
    private FilterRuleOperation operator;

    public FilterRule(Integer columnId, FilterRuleOperation operator, String[] values){
        this.columnId = columnId;
        this.operator = operator;
        this.values = values;
    }

    /**
     * If at least one of the value matches return true.
     * @param row The row to match
     * @return a boolean
     */
    public boolean match(XSSFRow row){
        for(String value : values){
            if(operator.match(df.formatCellValue(row.getCell(columnId)), value)){
                return true;
            };
        }
        return false;
    }
}

FilterRuleOperation

public enum FilterRuleOperation {

    DIFFERENT(&quot;!=&quot;){
        @Override public boolean match(String x, String y){
            return !x.equals(y);
        }
    },
    EQUAL(&quot;==&quot;){
        @Override public boolean match(String x, String y){
            return x.equals(y);
        }
    };

    private final String text;

    private FilterRuleOperation(String text) {
        this.text = text;
    }

    public abstract boolean match(String x, String y);

    @Override public String toString() {
        return text;
    }
}

Then you can use it almost like described the OP.
For example with this Excel file:
Java – 使用POI过滤Excel文件的行

And this code:

public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1);

    // Create the filter
    ExcelWorksheetFilter excelWorksheetFilter = new ExcelWorksheetFilter(myExcelFile, 0);
    excelWorksheetFilter.setCellRange(new CellRangeAddress(
        1, // Exclude the row with columns titles
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells()-1
    ));

    // Create rules for filtering
    excelWorksheetFilter.addRule(new FilterRule(
            1, // Last name column
            FilterRuleOperation.EQUAL,
            new String[]{&quot;Doe&quot;}
            ));

    excelWorksheetFilter.addRule(new FilterRule(
            0, // First name column
            FilterRuleOperation.EQUAL,
            new String[]{&quot;Jhon&quot;}
    ));

    // Apply with applyStrict function puts a AND condition between rules
    excelWorksheetFilter.applyStrict();
    // You can also use apply function it puts a OR condition between rules
    // excelWorksheetFilter.apply();
    
    excelWorksheetFilter.getRowList().forEach(row -&gt; {
        for(int i = 0; i &lt;3; i++) {
            System.out.print(df.formatCellValue(row.getCell(i)) + &#39;\t&#39;);
        }
        System.out.println();
    });

    // Save the file
    FileOutputStream out = new FileOutputStream(&quot;filter_test.xlsx&quot;);
    excelWorksheetFilter.getWorkbook().write(out);
    out.close();
    excelWorksheetFilter.getWorkbook().close();
}

This will print:

Jhon	Doe	25

Java – 使用POI过滤Excel文件的行

And if you use excelWorksheetFilter.apply() it will print:

Jhon	Doe	   25	
Aaa	    Doe	   22	
Jhon	Smith  30

Java – 使用POI过滤Excel文件的行

The two main downside are:

  • It doesn't use the Excel filter so it is harder to use the Excel file afterward.
  • Not memory efficient as the ExcelWorksheetFilter.getRowList() function return a list and not an Iterator.

Also it only works with strings but I suppose that it could be adapted to work with other type of data.

huangapple
  • 本文由 发表于 2020年9月2日 18:23:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/63703540.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定