How to efficiently export millions of rows to Excel?

Traditional way:

  1. Write data as a stream instead of loading all of it into memory at once. The SXSSFWorkbook class in the Apache POI library keeps only a sliding window of rows in memory and flushes older rows to a temporary file, which avoids out-of-memory errors on large exports.

  2. Write data in segments, for example 1000 rows at a time, instead of fetching and writing everything in one pass. This keeps memory usage bounded to roughly one segment.

  3. Write rows in ascending order and fill all cells of a row before moving on to the next one. A streaming writer such as SXSSFWorkbook flushes earlier rows to disk and cannot revisit them, so writing column by column does not work with it.

  4. Avoid creating objects unnecessarily. With millions of rows, per-row allocations add up and increase garbage-collection pressure. For example, create each CellStyle once and reuse it for every cell instead of creating a new style per cell. Object pools and thread pools follow the same idea of reusing instead of recreating.

  5. Use multiple threads. Once the data is segmented, different threads can fetch and convert segments in parallel, coordinated through Java's Executor framework. POI workbooks are not thread-safe, so the writes into a single workbook should stay on one thread.
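
As an illustration of tip 2, here is a minimal sketch of segmented export using only the JDK. The PagedExport class and its PageSource interface are hypothetical names invented for this example; in a real export the source would typically be a paged database query and the writer would append rows to the workbook.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: pulls rows page by page and hands each page to a writer. */
final class PagedExport {

    /** Hypothetical data source: returns up to limit rows starting at offset. */
    @FunctionalInterface
    interface PageSource<T> {
        List<T> fetch(long offset, int limit);
    }

    /**
     * Fetches pages of at most pageSize rows until the source is exhausted,
     * so only one page is held in memory at a time.
     * Returns the total number of rows passed to the writer.
     */
    static <T> long export(PageSource<T> source, int pageSize, Consumer<List<T>> writer) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long offset = 0;
        while (true) {
            List<T> page = source.fetch(offset, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to export
            }
            writer.accept(page);
            offset += page.size();
            if (page.size() < pageSize) {
                break; // a short page means this was the last one
            }
        }
        return offset;
    }
}
```

The page size is a trade-off: larger pages mean fewer round trips to the data source, smaller pages mean less memory per step.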

Code implementation:

The following example uses the SXSSFWorkbook class from Apache POI to write one million rows to an Excel file called “test.xlsx” without holding all of them in memory at once:

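import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingExport {
    public static void main(String[] args) throws IOException {
        // Create the workbook; at most 100 rows stay in memory, older rows are flushed to a temp file
        try (SXSSFWorkbook workbook = new SXSSFWorkbook(100);
             FileOutputStream outputStream = new FileOutputStream("test.xlsx")) {
            try {
                // Create the worksheet
                SXSSFSheet sheet = workbook.createSheet("Sheet1");

                // Write the header row
                Row header = sheet.createRow(0);
                header.createCell(0).setCellValue("Name");
                header.createCell(1).setCellValue("age");

                // Write the data rows
                for (int i = 0; i < 1000000; i++) {
                    Row row = sheet.createRow(i + 1);
                    row.createCell(0).setCellValue("Zhang San" + i);
                    row.createCell(1).setCellValue(i);
                }

                // Write the workbook to the file
                workbook.write(outputStream);
            } finally {
                // Delete the temporary files that hold the flushed rows
                workbook.dispose();
            }
        } // the stream and the workbook are closed here
    }
}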
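
The loop above writes 1,000,000 data rows plus one header row, which fits in a single sheet. An XLSX worksheet holds at most 1,048,576 rows, so a larger export has to be spread across several sheets. Below is a small sketch of the index arithmetic, assuming the same number of header rows on every sheet. SheetLayout and its method names are hypothetical and not part of POI.

```java
/** Hypothetical helper that maps a record number to a sheet and a row position. */
final class SheetLayout {

    /** Maximum number of rows in one XLSX worksheet. */
    static final int MAX_ROWS_PER_SHEET = 1_048_576;

    /** Number of sheets needed for recordCount records with headerRows header rows per sheet. */
    static int sheetsNeeded(long recordCount, int headerRows) {
        long perSheet = MAX_ROWS_PER_SHEET - headerRows;
        return (int) ((recordCount + perSheet - 1) / perSheet); // ceiling division
    }

    /** Zero-based index of the sheet that holds the record with the given zero-based index. */
    static int sheetIndex(long recordIndex, int headerRows) {
        return (int) (recordIndex / (MAX_ROWS_PER_SHEET - headerRows));
    }

    /** Zero-based row index of the record within its sheet, below the header rows. */
    static int rowIndex(long recordIndex, int headerRows) {
        return headerRows + (int) (recordIndex % (MAX_ROWS_PER_SHEET - headerRows));
    }
}
```

With one header row per sheet, the record with zero-based index 1,048,575 becomes row 1 of the second sheet.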

Other ways:

In Java, the commonly used ways to write Excel files are as follows:

1. Apache POI-based approach.

Apache POI is an open-source Java library that provides an API for reading and writing Excel files in both the XLS and XLSX formats. For millions of rows, use the SXSSFWorkbook class for stream-based writing to avoid out-of-memory errors. Here is sample code using Apache POI:

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public static void writeExcel(List<List<Object>> data, String filePath, String sheetName) {
    // try-with-resources closes the stream and the workbook even if writing fails
    try (SXSSFWorkbook workbook = new SXSSFWorkbook();
         FileOutputStream out = new FileOutputStream(filePath)) {
        try {
            Sheet sheet = workbook.createSheet(sheetName);

            int rownum = 0;
            for (List<Object> rowData : data) {
                Row row = sheet.createRow(rownum++);
                int cellnum = 0;
                for (Object obj : rowData) {
                    Cell cell = row.createCell(cellnum++);
                    if (obj instanceof String) {
                        cell.setCellValue((String) obj);
                    } else if (obj instanceof Number) {
                        // Covers Integer, Long, Double, and other numeric types
                        cell.setCellValue(((Number) obj).doubleValue());
                    } // Other types can be added as needed
                }
            }

            workbook.write(out);
        } finally {
            // Delete the temporary files that hold the flushed rows
            workbook.dispose();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

2. EasyExcel-based approach.

EasyExcel is a Java library built on top of Apache POI. It provides a simpler API and supports reading and writing the XLS, XLSX, and CSV formats. For large data volumes, EasyExcel reads and writes in a streaming fashion, which avoids out-of-memory errors, and a single ExcelWriter can be called repeatedly to append data in batches. Here is sample code using EasyExcel:

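import java.util.List;
import java.util.stream.Collectors;
import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.ExcelWriter;
import com.alibaba.excel.write.metadata.WriteSheet;

public static void writeExcel(List<List<Object>> data, String filePath, String sheetName) {
    ExcelWriter excelWriter = EasyExcel.write(filePath).build();
    try {
        WriteSheet writeSheet = EasyExcel.writerSheet(sheetName).build();
        // write() takes a collection of rows, so convert and write 1000 rows at a time
        for (int start = 0; start < data.size(); start += 1000) {
            List<List<String>> batch = data.subList(start, Math.min(start + 1000, data.size())).stream()
                    .map(row -> row.stream().map(String::valueOf).collect(Collectors.toList()))
                    .collect(Collectors.toList());
            excelWriter.write(batch, writeSheet);
        }
    } finally {
        excelWriter.finish(); // flush the data and close the output file
    }
}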

In short, for large exports, use a stream-based writer and combine it with segmented writes, object reuse, and thread pools where they fit. Both Apache POI and EasyExcel are widely used Java libraries, and you can choose between them according to your specific needs.

Summary:

Beyond stream-based reading and writing, a few other techniques can further improve the efficiency of the program:

Use memory-mapped files. A memory-mapped file maps the contents of a file into memory, which reduces the number of explicit read and write calls. In Java, the MappedByteBuffer class of the NIO library, obtained through FileChannel.map, provides this. Because an XLSX file is a compressed archive, the technique applies more naturally to flat intermediate files such as CSV than to the workbook itself.
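
A minimal sketch of writing through a memory-mapped file with the JDK's NIO classes. It writes plain CSV text rather than an Excel workbook, and the MappedCsvWriter class is a hypothetical name for this example. For simplicity it encodes all lines first so the mapping can be sized exactly; a real export of millions of rows would map and fill the file in fixed-size windows instead.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical helper: writes text lines to a file through a memory mapping. */
final class MappedCsvWriter {

    static void write(Path file, List<String> lines) throws IOException {
        // Encode first so the size of the mapped region is known up front
        byte[] bytes = (String.join("\n", lines) + "\n").getBytes(StandardCharsets.UTF_8);

        // READ_WRITE mappings need a channel opened for both reading and writing
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // Mapping a region beyond the current end of the file grows the file
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
            buffer.put(bytes); // writes go to the mapped memory, not through write() calls
            buffer.force();    // ask the OS to flush the mapped region to disk
        }
    }
}
```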

Use a caching mechanism. While writing data to an Excel file, buffer rows in memory instead of touching the disk for every row. Divide the data into blocks according to some rule and cache each block in memory. When the cache reaches a set size, write its contents out in one operation, which reduces the number of I/O operations.
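
The caching idea can be sketched as a small buffer that collects rows and passes them on in blocks. RowBuffer is a hypothetical class written for this example; the sink stands in for whatever actually writes a block, such as a batch write call of the Excel library in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical write cache: collects rows and hands them to a sink in blocks. */
final class RowBuffer<T> {
    private final int capacity;
    private final Consumer<List<T>> sink;
    private final List<T> rows;

    RowBuffer(int capacity, Consumer<List<T>> sink) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sink = sink;
        this.rows = new ArrayList<>(capacity);
    }

    /** Adds one row and flushes automatically once the cache is full. */
    void add(T row) {
        rows.add(row);
        if (rows.size() >= capacity) {
            flush();
        }
    }

    /** Passes the cached rows to the sink in one call and empties the cache. */
    void flush() {
        if (rows.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(rows)); // copy, so the sink may keep the list
        rows.clear();
    }
}
```

The caller has to invoke flush() once after the last row, otherwise a partially filled block is never written.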

Use multithreading or asynchronous processing. For large exports, the data can be divided into chunks and processed in parallel with a thread pool or asynchronous tasks. While doing so, pay attention to thread safety and data consistency.
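
One way to keep this thread-safe is to parallelize only the conversion of the data and leave the writing to a single thread. The sketch below uses the JDK's ExecutorService; ParallelExport, the converter, and the writer are hypothetical names for this example. It keeps all converted chunks referenced until they are written, so a real export would also limit how many chunks are in flight.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: converts chunks in parallel, writes them in order on one thread. */
final class ParallelExport {

    static <T, R> int export(List<T> rows, int chunkSize, int threads,
                             Function<T, R> converter, Consumer<List<R>> writer)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one conversion task per chunk
            List<Future<List<R>>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<T> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                pending.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>(chunk.size());
                    for (T row : chunk) {
                        out.add(converter.apply(row));
                    }
                    return out;
                }));
            }

            // Collect results in submission order, so rows keep their original order,
            // and call the writer only from this thread
            int written = 0;
            for (Future<List<R>> future : pending) {
                List<R> converted = future.get();
                writer.accept(converted);
                written += converted.size();
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```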

The above are some commonly used technologies and implementation methods to improve program efficiency. Which method to use needs to be evaluated and selected according to the actual situation.

If you also want to know how to import Excel data into a database, please refer to:

Is importing millions of Excel rows into the database too slow? A SAX-based event model import solves the efficiency problem
