2016-11-30 342 views

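Apache POI SAX parsing - how to get the actual value of a cell

I need to parse very large Excel files with Apache POI while using limited memory. After searching on Google, I learned that POI offers a SAX-based parser that can process large files efficiently without consuming a lot of memory.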

Apache POI SAX Parser example

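// Inner class from POI's XLSX2CSV example: `output` and `minColumns` are fields of the enclosing class. 
// Imports needed: org.apache.poi.ss.util.CellAddress, org.apache.poi.ss.util.CellReference, 
// org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler, org.apache.poi.xssf.usermodel.XSSFComment 
private class SheetToCSV implements SheetContentsHandler { 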
    private boolean firstCellOfRow = false; 
    private int currentRow = -1; 
    private int currentCol = -1; 

    private void outputMissingRows(int number) { 
     for (int i=0; i<number; i++) { 
      for (int j=0; j<minColumns; j++) { 
       output.append(','); 
      } 
      output.append('\n'); 
     } 
    } 

    @Override 
    public void startRow(int rowNum) { 
     // If there were gaps, output the missing rows 
     outputMissingRows(rowNum-currentRow-1); 
     // Prepare for this row 
     firstCellOfRow = true; 
     currentRow = rowNum; 
     currentCol = -1; 
    } 

    @Override 
    public void endRow(int rowNum) { 
     // Ensure the minimum number of columns 
     for (int i=currentCol; i<minColumns; i++) { 
      output.append(','); 
     } 
     output.append('\n'); 
    } 

    @Override 
    public void cell(String cellReference, String formattedValue, 
      XSSFComment comment) { 
     if (firstCellOfRow) { 
      firstCellOfRow = false; 
     } else { 
      output.append(','); 
     } 

     // gracefully handle missing CellRef here in a similar way as XSSFCell does 
     if(cellReference == null) { 
      cellReference = new CellAddress(currentRow, currentCol).formatAsString(); 
     } 

     // Did we miss any cells? 
     int thisCol = (new CellReference(cellReference)).getCol(); 
     int missedCols = thisCol - currentCol - 1; 
     for (int i=0; i<missedCols; i++) { 
      output.append(','); 
     } 
     currentCol = thisCol; 

     // Number or string? 
     try { 
      Double.parseDouble(formattedValue); 
      output.append(formattedValue); 
     } catch (NumberFormatException e) { 
      output.append('"'); 
      // Escape embedded double quotes by doubling them, as CSV requires 
      output.append(formattedValue.replace("\"", "\"\"")); 
      output.append('"'); 
     } 
    } 

    @Override 
    public void headerFooter(String text, boolean isHeader, String tagName) { 
     // Skip, no headers or footers in CSV 
    } 
} 

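In the example at the link above, the `cell` method only receives the formatted value, but I need access to the cell's actual (unformatted) value.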
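To illustrate why the formatted value is not enough, here is a minimal sketch using only the JDK, not POI. It assumes a cell whose number format is `#,##0.00`; the class name `FormattedVsRaw` and its helper methods are hypothetical, and `DecimalFormat` is only a stand-in for what a spreadsheet number format does to the displayed text.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical illustration: shows how a display format hides the stored value.
class FormattedVsRaw {

    // Formats a raw numeric value the way a "#,##0.00" number format would display it.
    // Locale.US is fixed so the grouping and decimal separators are predictable.
    static String format(double raw) {
        DecimalFormat df = new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(raw);
    }

    // Mirrors the "Number or string?" check in the handler's cell method.
    static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        double raw = 1234.5678;                      // the value actually stored in the cell
        String shown = format(raw);                  // the text a formatted-value callback would see
        System.out.println(shown);                   // prints 1,234.57
        System.out.println(parsesAsDouble(shown));   // prints false
    }
}
```

Two things are lost here: the digits beyond the second decimal place, and the ability to parse the text back into a number, so a check like the one in `cell` would treat this value as a string.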


Write your own SAX handler and pass that in? – Gagravarr

Answer


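The current implementation of the streaming interface does not provide this. To get the raw value, you need to copy the code of the underlying XSSFSheetXMLHandler and adapt it so that it no longer formats the cell contents.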
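As a rough idea of what such a handler has to do, here is a minimal sketch built only on the JDK's SAX API. It is not a copy of XSSFSheetXMLHandler: the class name `RawCellHandler` and its `parse` helper are hypothetical, and it assumes the usual sheet XML layout in which each `<c>` cell element carries its reference in the `r` attribute and its stored value in a `<v>` child.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the unformatted text of every <v> element, keyed by cell reference.
class RawCellHandler extends DefaultHandler {
    final Map<String, String> rawValues = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String cellRef;
    private boolean inValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("c".equals(qName)) {
            cellRef = attrs.getValue("r");   // e.g. "A1"; may be null if the attribute is absent
        } else if ("v".equals(qName)) {
            inValue = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one value in several chunks, so accumulate them.
        if (inValue) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            rawValues.put(cellRef, text.toString());   // stored as-is, no number formatting applied
        }
    }

    // Convenience wrapper for the sketch; parses sheet XML held in a String.
    static Map<String, String> parse(String sheetXml) {
        try {
            RawCellHandler handler = new RawCellHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
            return handler.rawValues;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\" s=\"1\"><v>1234.5678</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(parse(xml));   // prints {A1=1234.5678, B1=0}
    }
}
```

Note what the sketch leaves out: for cells with `t="s"` the `<v>` text is an index into the shared strings table rather than the string itself, and inline strings, formulas, and dates are not handled. A real adaptation would still need the lookups that XSSFSheetXMLHandler performs and would simply skip the formatting step.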


Thank you very much @centic – Arul
