Implementing chunked file upload and resumable (breakpoint) upload in Java

1. Simple multipart upload
If a file is large and the connection drops partway through the upload, restarting from scratch is very time-consuming, and you have no way of knowing which parts were already uploaded before the disconnect. We therefore split the large file into chunks first, which prevents the problems described above.

Front-end code:

<!-- html code -->
<!DOCTYPE html>
<html>
<head>
    <title>File upload example</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<form>
    <input type="file" id="fileInput">
    <button type="button" onclick="upload()" >Upload</button>
</form>
<script>
    function upload() {
        var fileInput = document.getElementById('fileInput');
        var fileName = fileInput.files[0].name;
        var files = fileInput.files;
        var chunkSize = 1024 * 10; // each chunk is 10KB
        var totalChunks = Math.ceil(files[0].size / chunkSize); // total number of chunks
        var currentChunk = 0; // index of the chunk currently being uploaded

        // Upload the file chunk by chunk
        function uploadChunk() {
            var xhr = new XMLHttpRequest();
            var formData = new FormData();

            // Add the current chunk index, total chunk count and file name to formData
            formData.append('currentChunk', currentChunk);
            formData.append('totalChunks', totalChunks);
            formData.append('fileName', fileName);

            // Compute the offset and length of the current chunk within the file
            var start = currentChunk * chunkSize;
            var end = Math.min(files[0].size, start + chunkSize);
            var chunk = files[0].slice(start, end);

            // Add the current chunk to formData
            formData.append('chunk', chunk);

            // Send the chunk to the back end
            xhr.open('POST', '/file/upload');

            xhr.onload = function() {
                // Move on to the next chunk
                currentChunk++;

                // If there are still chunks left, keep uploading
                if (currentChunk < totalChunks) {
                    uploadChunk();
                } else {
                    // All chunks have been uploaded; merge them into the file
                    mergeChunks(fileName);
                }
            };

            xhr.send(formData);
        }

        // Merge all chunks
        function mergeChunks(fileName) {
            var xhr = new XMLHttpRequest();
            xhr.open("POST", "/file/merge", true);
            xhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
            xhr.onreadystatechange = function() {
                if (xhr.readyState === 4) {
                    if (xhr.status === 200) {
                        console.log("File upload completed:", xhr.responseText);
                    } else {
                        console.error(xhr.responseText);
                    }
                }
            };
            xhr.send("fileName=" + fileName);
        }

        // Start the upload
        uploadChunk();
    }
</script>
</body>
</html>

ps: The page above is plain HTML + JS, and requests are sent with XMLHttpRequest (xhr). The address passed to xhr.open is your local interface address. Since testing does not require actually uploading huge files, each chunk is set to 10KB to simulate a large-file upload; a 1 MB file, for example, is split into Math.ceil(1048576 / 10240) = 103 chunks.

Backend code:

//java code
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.*;
import java.nio.channels.FileChannel;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

@RestController
@RequestMapping("/file")
public class FileController {
    @Autowired
    private ResourceLoader resourceLoader;

    @Value("${my.config.savePath}")
    private String uploadPath;

    private Map<String, List<File>> chunksMap = new ConcurrentHashMap<>();

    @PostMapping("/upload")
    public void upload(@RequestParam int currentChunk, @RequestParam int totalChunks,
                       @RequestParam MultipartFile chunk, @RequestParam String fileName) throws IOException {

        // Save the chunk to a temporary folder. Name it after the fileName parameter:
        // a sliced Blob sent via FormData usually carries no original filename.
        String chunkName = fileName + "." + currentChunk;
        File chunkFile = new File(uploadPath, chunkName);
        chunk.transferTo(chunkFile);

        // Record the upload status of the chunks. Chunks are appended in arrival
        // order, which matches file order because the front end uploads sequentially.
        chunksMap.computeIfAbsent(fileName, k -> Collections.synchronizedList(new ArrayList<>(totalChunks)))
                 .add(chunkFile);
    }

    @PostMapping("/merge")
    public String merge(@RequestParam String fileName) throws IOException {

        // Fetch all chunks and merge them into one file in chunk order
        List<File> chunkList = chunksMap.get(fileName);
        if (chunkList == null || chunkList.isEmpty()) {
            throw new RuntimeException("Chunk does not exist");
        }

        File outputFile = new File(uploadPath, fileName);
        try (FileChannel outChannel = new FileOutputStream(outputFile).getChannel()) {
            for (int i = 0; i < chunkList.size(); i++) {
                try (FileChannel inChannel = new FileInputStream(chunkList.get(i)).getChannel()) {
                    inChannel.transferTo(0, inChannel.size(), outChannel);
                }
                chunkList.get(i).delete(); // delete the chunk file
            }
        }

        chunksMap.remove(fileName); // remove the record
        // Build the access URL of the file. Because it is a local file the URL starts
        // with "file:"; on a server, change this to your own server prefix.
        Resource resource = resourceLoader.getResource("file:" + outputFile.getAbsolutePath());
        return resource.getURI().toString();
    }
}

ps: A map records which chunks have been uploaded. Here the chunks are stored in a local folder; after all chunks have arrived they are merged and the chunk files are deleted. ConcurrentHashMap is used instead of HashMap because it is safe under multi-threading. Note that the merge relies on the chunk list preserving upload order, which holds because the front end uploads chunks one at a time.
The above is only a bare-bones multipart upload, but with some extra modifications it can solve the problems discussed below.

2. Solving the problems

1. How to avoid heavy disk reads and writes
A drawback of the code above is that the chunk content is stored in a local folder, and the merge step also reads the files back from that folder to decide whether the upload is complete. Heavy read/write traffic on the disk is not only slow, it can also bring the server down. The code below therefore stores the chunk information in Redis to avoid excessive disk I/O. (You could also keep this state in MySQL or another middleware; since this kind of frequent, short-lived read/write load is a poor fit for MySQL, I used Redis.)

2. The target file is very large; what if the connection drops during the upload?
Store the chunk content in Redis. After a disconnect, the chunk data is still in Redis; when the user uploads again, check whether Redis already holds each chunk and skip it if so, as in the sketch below.
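A minimal sketch of that skip-if-present check, assuming a RedisTemplate<String, byte[]> bean and the same "file_upload:<fileId>" hash layout used in the complete code further down:

//java code (sketch; fileId, chunkIndex and chunk come from the upload request)
String key = "file_upload:" + fileId;
String field = String.valueOf(chunkIndex);
// If the chunk survived a previous attempt in Redis, don't store it again
if (!redisTemplate.opsForHash().hasKey(key, field)) {
    redisTemplate.opsForHash().put(key, field, chunk.getBytes());
}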

3. How to detect that the uploaded data differs from the original file
When the front end calls the upload interface, it first computes a checksum of the chunk, then sends the chunk and the checksum to the back end. The back end recomputes the checksum over the received chunk and compares the two: if they are equal, the data is consistent; if not, an error is returned and the front end re-uploads that chunk. JS checksum code:
// Calculate the SHA-256 checksum of the file
//javascript code
    function calculateHash(fileChunk) {
        return new Promise((resolve, reject) => {
            const blob = new Blob([fileChunk]);
            const reader = new FileReader();
            reader.readAsArrayBuffer(blob);
            reader.onload = () => {
                const arrayBuffer = reader.result;
                // window.crypto.subtle is only available in a secure context (HTTPS or localhost)
                const crypto = window.crypto || window.msCrypto;
                const digest = crypto.subtle.digest("SHA-256", arrayBuffer);
                digest.then(hash => {
                    const hashArray = Array.from(new Uint8Array(hash));
                    const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
                    resolve(hashHex);
                });
            };
            reader.onerror = () => {
                reject(new Error('Failed to calculate hash'));
            };
        });
    }
//java code
public static String calculateHash(byte[] fileChunk) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(fileChunk);
        byte[] hash = md.digest();
        ByteBuffer byteBuffer = ByteBuffer.wrap(hash);
        StringBuilder hexString = new StringBuilder();
        while (byteBuffer.hasRemaining()) {
            // "%02x" zero-pads each byte to two lowercase hex digits,
            // matching padStart(2, '0') on the front end
            hexString.append(String.format("%02x", byteBuffer.get()));
        }
        return hexString.toString();
    }

Be careful:

1. The checksum algorithm used by the front end and the back end must be identical, otherwise they will never produce the same result (a self-check against a known test vector is sketched below).
2. The front-end sample above uses the browser's built-in Web Crypto API (window.crypto.subtle). If you hash the file with a library such as CryptoJS instead, the relevant js must be introduced; you can import it with a script tag or download the js directly:
<script src="https://cdn.bootcss.com/crypto-js/3.1.9-1/crypto-js.min.js"></script>

If GitHub cannot be opened for the CryptoJS download, you may need to use npm to download it instead.
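To verify the two sides agree, you can check the Java method against the published SHA-256 test vector for the string "abc"; a minimal sketch, assuming the calculateHash method shown above:

//java code
public static void main(String[] args) throws Exception {
    // Published SHA-256 test vector for "abc" (FIPS 180-2)
    String expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
    String actual = calculateHash("abc".getBytes(java.nio.charset.StandardCharsets.UTF_8));
    System.out.println(expected.equals(actual)); // prints true if the hex formatting is correct
}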

4. If the upload is interrupted, how do we determine which chunks were not uploaded?

Check Redis for each chunk index: any index that does not exist is added to a list, and the list is finally returned to the front end. Note that the loop must run over the expected total number of chunks rather than over the size of the map; otherwise a missing trailing chunk would go undetected.

//java code
// totalChunks is the expected chunk count (e.g., sent by the front end)
boolean allChunksUploaded = true;
List<Integer> missingChunkIndexes = new ArrayList<>();
for (int i = 0; i < totalChunks; i++) {
    if (!hashMap.containsKey(String.valueOf(i))) {
        allChunksUploaded = false;
        missingChunkIndexes.add(i);
    }
}
if (!allChunksUploaded) {
    return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(missingChunkIndexes);
}

3. Complete code
1. Introduce dependencies

<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.1.4.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Lettuce is a Redis client. If you use RedisTemplate directly you do not need to declare lettuce-core yourself, since spring-boot-starter-data-redis already pulls it in.
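If you go the RedisTemplate route, a minimal bean sketch (matching the commented-out redisTemplate calls in the controller below; the serializer choices are assumptions):

//java code
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

@Bean
public RedisTemplate<String, byte[]> redisTemplate(RedisConnectionFactory factory) {
    RedisTemplate<String, byte[]> template = new RedisTemplate<>();
    template.setConnectionFactory(factory);
    // Plain-string keys and hash keys; chunk bytes stored raw
    template.setKeySerializer(new StringRedisSerializer());
    template.setHashKeySerializer(new StringRedisSerializer());
    template.setHashValueSerializer(RedisSerializer.byteArray());
    return template;
}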

2. Front-end code

The page is essentially the one from section 1 (page title: File Upload Demo), with a few additions: before each chunk is sent, its SHA-256 checksum is computed with the calculateHash function shown earlier and posted as chunkChecksum together with chunk, chunkIndex, chunkSize and fileId to /file2/upload; the fileId returned by the first response is reused on every later request, and /file2/merge is finally called with fileId, fileName and totalChunks once all chunks are uploaded. If you hash with CryptoJS instead of the Web Crypto API, import it:

<script src="https://cdn.bootcss.com/crypto-js/3.1.9-1/crypto-js.min.js"></script>

3. Backend interface code

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;
import org.springframework.data.redis.connection.RedisConnection;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.*;

@RestController
@RequestMapping("/file2")
public class File2Controller {

    private static final String FILE_UPLOAD_PREFIX = "file_upload:";

    @Autowired
    private ResourceLoader resourceLoader;

    @Value("${my.config.savePath}")
    private String uploadPath;

    @Autowired
    private ThreadLocal<RedisConnection> redisConnectionThreadLocal;

// @Autowired
// private RedisTemplate redisTemplate;

    @PostMapping("/upload")
    public ResponseEntity<?> uploadFile(@RequestParam("chunk") MultipartFile chunk,
                                        @RequestParam("chunkIndex") Integer chunkIndex,
                                        @RequestParam("chunkSize") Integer chunkSize,
                                        @RequestParam("chunkChecksum") String chunkChecksum,
                                        @RequestParam("fileId") String fileId) throws Exception {
        if (fileId == null || fileId.trim().isEmpty()) {
            fileId = UUID.randomUUID().toString();
        }
        String key = FILE_UPLOAD_PREFIX + fileId;
        byte[] chunkBytes = chunk.getBytes();
        String actualChecksum = calculateHash(chunkBytes);
        if (!chunkChecksum.equals(actualChecksum)) {
            return ResponseEntity.status(HttpStatus.BAD_REQUEST).body("Chunk checksum does not match");
        }
// if (!redisTemplate.opsForHash().hasKey(key, String.valueOf(chunkIndex))) {
//     redisTemplate.opsForHash().put(key, String.valueOf(chunkIndex), chunkBytes);
// }
        RedisConnection connection = redisConnectionThreadLocal.get();

        // Only store the chunk if it is not already in Redis (resume support)
        Boolean exists = connection.hExists(key.getBytes(), String.valueOf(chunkIndex).getBytes());
        if (!Boolean.TRUE.equals(exists)) {
            connection.hSet(key.getBytes(), String.valueOf(chunkIndex).getBytes(), chunkBytes);
        }

        return ResponseEntity.ok(fileId);
    }

    public static String calculateHash(byte[] fileChunk) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(fileChunk);
        byte[] hash = md.digest();
        ByteBuffer byteBuffer = ByteBuffer.wrap(hash);
        StringBuilder hexString = new StringBuilder();
        while (byteBuffer.hasRemaining()) {
            // zero-padded lowercase hex, matching the front end
            hexString.append(String.format("%02x", byteBuffer.get()));
        }
        return hexString.toString();
    }

    @PostMapping("/merge")
    public ResponseEntity<?> mergeFile(@RequestParam("fileId") String fileId,
                                       @RequestParam("fileName") String fileName,
                                       @RequestParam("totalChunks") Integer totalChunks) throws IOException {
        // totalChunks is the expected chunk count sent by the front end; without it,
        // a missing trailing chunk could not be detected.
        String key = FILE_UPLOAD_PREFIX + fileId;
        RedisConnection connection = redisConnectionThreadLocal.get();
        try {
            Map<byte[], byte[]> chunkMap = connection.hGetAll(key.getBytes());
// Map chunkMap = redisTemplate.opsForHash().entries(key);
            if (chunkMap == null || chunkMap.isEmpty()) {
                return ResponseEntity.status(HttpStatus.NOT_FOUND).body("File not found");
            }

            Map<String, byte[]> hashMap = new HashMap<>();
            for (Map.Entry<byte[], byte[]> entry : chunkMap.entrySet()) {
                hashMap.put(new String(entry.getKey()), entry.getValue());
            }
            // Check whether all chunks have been uploaded
            boolean allChunksUploaded = true;
            List<Integer> missingChunkIndexes = new ArrayList<>();
            for (int i = 0; i < totalChunks; i++) {
                if (!hashMap.containsKey(String.valueOf(i))) {
                    allChunksUploaded = false;
                    missingChunkIndexes.add(i);
                }
            }
            if (!allChunksUploaded) {
                return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(missingChunkIndexes);
            }

            File outputFile = new File(uploadPath, fileName);
            boolean merged = mergeChunks(hashMap, outputFile);
            Resource resource = resourceLoader.getResource("file:" + outputFile.getAbsolutePath());

            if (merged) {
                connection.del(key.getBytes());
// redisTemplate.delete(key);
                return ResponseEntity.ok().body(resource.getURI().toString());
            } else {
                return ResponseEntity.status(555).build(); // custom status for a failed merge
            }
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(e.getMessage());
        }
    }

    private boolean mergeChunks(Map<String, byte[]> chunkMap, File destFile) {
        try (FileOutputStream outputStream = new FileOutputStream(destFile)) {
            // Write the chunks in index order
            for (int i = 0; i < chunkMap.size(); i++) {
                byte[] chunkBytes = chunkMap.get(String.valueOf(i));
                outputStream.write(chunkBytes);
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}

4. redis configuration

@Configuration
public class RedisConfig {
    @Value("${spring.redis.host}")
    private String host;

    @Value("${spring.redis.port}")
    private int port;

    @Value("${spring.redis.password}")
    private String password;

    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        RedisStandaloneConfiguration config = new RedisStandaloneConfiguration();
        config.setHostName(host);
        config.setPort(port);
        config.setPassword(RedisPassword.of(password));
        return new LettuceConnectionFactory(config);
    }

    @Bean
    public ThreadLocal<RedisConnection> redisConnectionThreadLocal(RedisConnectionFactory redisConnectionFactory) {
        return ThreadLocal.withInitial(() -> redisConnectionFactory.getConnection());
    }
}

redisConnectionThreadLocal gives each worker thread its own reusable Redis connection, avoiding the cost of opening a new connection on every request. (Note that these connections are never explicitly closed here, so treat this as a simplification rather than production-grade connection pooling.)
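The configuration keys referenced above must exist in your application configuration; a minimal application.properties sketch (all values are placeholders):

# application.properties (placeholder values)
spring.redis.host=127.0.0.1
spring.redis.port=6379
spring.redis.password=yourpassword
# directory where chunks and merged files are written
my.config.savePath=/data/upload/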

Summary
The above is the complete code for this feature. When using it, remember to change uploadPath so the code can actually find the directory. As a further improvement, you can compute a checksum over the entire file and store the checksum, file name, file size and file type in MySQL; before the next large file is uploaded, look its checksum up first, and if a record already exists, skip the upload to avoid wasting space.
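A minimal sketch of that deduplication lookup, assuming a hypothetical file_info table and Spring's JdbcTemplate (table and column names are illustrative):

//java code
// Hypothetical schema:
// CREATE TABLE file_info (checksum VARCHAR(64) PRIMARY KEY,
//     file_name VARCHAR(255), file_size BIGINT, file_type VARCHAR(100));
@Autowired
private JdbcTemplate jdbcTemplate;

/** Returns true if a file with this whole-file checksum was uploaded before. */
public boolean alreadyUploaded(String checksum) {
    Integer count = jdbcTemplate.queryForObject(
            "SELECT COUNT(*) FROM file_info WHERE checksum = ?",
            Integer.class, checksum);
    return count != null && count > 0;
}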

Reference article: http://blog.ncmem.com/wordpress/2023/10/12/java implements file uploading in parts and resumes uploading at breakpoints/