10G large files: instant transfer, resumable upload, and multipart upload

Uploading very large files is a common requirement. When a file is small, it can simply be converted into a byte stream and sent to the server in one request. When the file is large, however, uploading it this way is a bad idea; few users will tolerate having to restart an upload from scratch because it was interrupted halfway through.

So is there a better upload experience? The answer is yes: several upload techniques are introduced below.

1. What is instant transfer?
In plain terms, when you upload a file, the server first runs an MD5 check. If a file with the same MD5 already exists on the server, you are simply handed back a link to it: nothing is actually transferred, and what you later download is the same file already stored on the server. To defeat instant transfer, you only need to change the file's MD5, which means modifying the file contents themselves (renaming is not enough). For example, adding a few characters to a text file changes its MD5, so it will no longer be transferred instantly.

2. The core logic of instant transfer implemented in this article
a. Use redis to store the file upload status: the key is the MD5 of the uploaded file, and the value is a flag indicating whether the upload has completed.

b. When the flag is true, the upload has already finished, so uploading the same file again takes the instant-transfer path. When the flag is false, the upload is still incomplete; in that case we also record the path of the chunk-progress record file, where the key is the file's MD5 plus a fixed prefix and the value is that record file's path. A minimal sketch of the status check follows.
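As an illustration only (this method is not part of the article's code base), the instant-transfer check might look like the following, assuming the RedisUtil wrapper used later in this article also exposes an hget counterpart to its hset:

/**
 * Hypothetical instant-transfer check. RedisUtil, FileConstant and
 * SpringContextHolder come from the article's code below; hget is an
 * assumed counterpart of the hset call used there.
 */
public boolean isInstantUpload(String md5) {
  RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
  // key: the file's MD5; value: "true" once the whole file has been uploaded
  Object status = redisUtil.hget(FileConstant.FILE_UPLOAD_STATUS, md5);
  // true means the file already exists server-side, so no bytes need to be sent
  return "true".equals(status);
}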

Multipart upload
1. What is multipart upload?
Multipart upload means splitting the file to be uploaded into multiple data blocks (called parts or chunks) of a certain size and uploading them separately. Once all parts have been uploaded, the server merges them back into the original file.
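In this article the actual chunking happens on the front end (with webuploader, shown later). Purely as an illustration of fixed-size splitting, a Java sketch might look like this (FileSplitter is hypothetical and loads every part into memory, which only suits small demos):

import java.io.IOException;
import java.io.RandomAccessFile;

/** Illustrative fixed-size splitter; the last part may be smaller than chunkSize. */
public class FileSplitter {

  public static byte[][] split(String path, int chunkSize) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
      long length = raf.length();
      int chunks = (int) ((length + chunkSize - 1) / chunkSize); // ceiling division
      byte[][] parts = new byte[chunks][];
      for (int i = 0; i < chunks; i++) {
        int size = (int) Math.min(chunkSize, length - (long) i * chunkSize);
        parts[i] = new byte[size];
        raf.seek((long) i * chunkSize);
        raf.readFully(parts[i]); // read exactly this chunk's bytes
      }
      return parts;
    }
  }
}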

2. Scenarios for multipart upload
1. Large file upload

2. Poor network environments, where failed transfers carry a risk of retransmission.

Resumable upload
1. What is resumable upload?
Resumable transfer means deliberately dividing a download or upload task (a file or an archive) into several parts, each transferred separately, often one thread per part. If a network failure occurs, the transfer can continue from the parts already completed rather than starting over from the beginning. This article focuses on the upload side of the technique: resumable upload.

2. Application scenarios
Resumable upload can be regarded as a derivative of multipart upload, so it can be used in any scenario where multipart upload applies.

3. The core logic of resumable upload
During a multipart upload, if the upload is interrupted by abnormal factors such as a system crash or a network failure, the client needs to record the upload progress so that, when the upload is attempted again later, it can continue from where the last attempt was interrupted.

To avoid the situation where the client's locally recorded progress is lost and the upload has to restart from scratch, the server should also provide an interface that lets the client query which chunks have already been uploaded, so the client can resume from the next missing chunk. A minimal sketch of such a helper follows.
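A minimal sketch of such a query helper, based on the .conf progress file described in the steps below (uploadedChunks is hypothetical; FileUtils is the commons-io class the article's code already uses):

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.FileUtils;

/** Hypothetical helper: list the chunk numbers already written, per the .conf file. */
public class UploadProgressQuery {

  public static List<Integer> uploadedChunks(File confFile) throws IOException {
    List<Integer> done = new ArrayList<>();
    byte[] progress = FileUtils.readFileToByteArray(confFile);
    for (int i = 0; i < progress.length; i++) {
      if (progress[i] == Byte.MAX_VALUE) { // 127 marks a completed chunk
        done.add(i);
      }
    }
    return done;
  }
}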

4. Implementation steps
a. Option 1: conventional steps

Divide the file to be uploaded into equal-size data blocks according to a splitting rule;

Initialize a multipart upload task and return the unique identifier of this multipart upload;

Send each data block according to a chosen strategy (serially or in parallel);

After transmission completes, the server checks whether the uploaded data is complete; if so, it merges the blocks into the original file (see the sketch after this list).
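A rough sketch of this four-step flow (MultipartClient is hypothetical, purely to make the steps concrete; it is not part of this article's code):

/** Hypothetical client-side API for Option 1. */
interface MultipartClient {

  String initUpload(String fileName);                       // step 2: returns the upload's unique id

  void uploadPart(String uploadId, int index, byte[] data); // step 3: send one chunk

  void completeUpload(String uploadId);                     // step 4: ask the server to merge
}

class SerialUploader {

  static void upload(MultipartClient client, String fileName, byte[][] parts) {
    String uploadId = client.initUpload(fileName);
    for (int i = 0; i < parts.length; i++) {
      client.uploadPart(uploadId, i, parts[i]); // serial strategy; parts could also go in parallel
    }
    client.completeUpload(uploadId);
  }
}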

b. Option 2: the steps implemented in this article

The front end (client) splits the file into chunks of a fixed size, and each request to the back end (server) must carry the chunk sequence number and chunk size.
The server creates a conf file to record chunk progress. The conf file's length equals the total number of chunks; each time a chunk is uploaded, a 127 (Byte.MAX_VALUE) is written at that chunk's position, so positions not yet uploaded keep the default value 0 while uploaded positions hold 127. (This is the core step behind both resumable upload and instant transfer.)
The server computes the write offset from the chunk sequence number in the request data and the chunk size (which is fixed and identical for every chunk), and writes the chunk data into the file at that offset.
5. Multipart/resumable upload code implementation
a. The front end uses Baidu's webuploader plug-in for chunking.

b. The back end writes files in two ways: one uses RandomAccessFile,

c. the other uses MappedByteBuffer.

a. RandomAccessFile implementation

@UploadMode(mode = UploadModeEnum.RANDOM_ACCESS)
@Slf4j
public class RandomAccessUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {
    RandomAccessFile accessTmpFile = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      accessTmpFile = new RandomAccessFile(tmpFile, "rw");
      // The chunk size must be consistent with the value set on the front end
      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      long offset = chunkSize * param.getChunk();
      // Seek to this chunk's offset in the temporary file
      accessTmpFile.seek(offset);
      // Write the chunk data
      accessTmpFile.write(param.getFile().getBytes());
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;
    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessTmpFile);
    }
    return false;
  }

}

b. MappedByteBuffer implementation

@UploadMode(mode = UploadModeEnum.MAPPED_BYTEBUFFER)
@Slf4j
public class MappedByteBufferUploadStrategy extends SliceUploadTemplate {

  @Autowired
  private FilePathUtil filePathUtil;

  @Value("${upload.chunkSize}")
  private long defaultChunkSize;

  @Override
  public boolean upload(FileUploadRequestDTO param) {

    RandomAccessFile tempRaf = null;
    FileChannel fileChannel = null;
    MappedByteBuffer mappedByteBuffer = null;
    try {
      String uploadDirPath = filePathUtil.getPath(param);
      File tmpFile = super.createTmpFile(param);
      tempRaf = new RandomAccessFile(tmpFile, "rw");
      fileChannel = tempRaf.getChannel();

      long chunkSize = Objects.isNull(param.getChunkSize()) ? defaultChunkSize * 1024 * 1024
          : param.getChunkSize();
      // Map this chunk's region of the file and write the chunk data into it
      long offset = chunkSize * param.getChunk();
      byte[] fileData = param.getFile().getBytes();
      mappedByteBuffer = fileChannel
          .map(FileChannel.MapMode.READ_WRITE, offset, fileData.length);
      mappedByteBuffer.put(fileData);
      boolean isOk = super.checkAndSetUploadProgress(param, uploadDirPath);
      return isOk;

    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.freedMappedByteBuffer(mappedByteBuffer);
      FileUtil.close(fileChannel);
      FileUtil.close(tempRaf);
    }

    return false;
  }

}
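A note on the finally block above: on the JVM, the memory mapping behind a MappedByteBuffer is normally released only when the buffer object is garbage-collected, so the FileUtil.freedMappedByteBuffer helper presumably unmaps it eagerly; otherwise the temporary file can stay locked (on Windows, a mapped file cannot be deleted or renamed until the buffer is released).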

c. The core file-operation template class

@Slf4j
public abstract class SliceUploadTemplate implements SliceUploadStrategy {

  public abstract boolean upload(FileUploadRequestDTO param);

  protected File createTmpFile(FileUploadRequestDTO param) {

    FilePathUtil filePathUtil = SpringContextHolder.getBean(FilePathUtil.class);
    param.setPath(FileUtil.withoutHeadAndTailDiagonal(param.getPath()));
    String fileName = param.getFile().getOriginalFilename();
    String uploadDirPath = filePathUtil.getPath(param);
    String tempFileName = fileName + "_tmp";
    File tmpDir = new File(uploadDirPath);
    File tmpFile = new File(uploadDirPath, tempFileName);
    if (!tmpDir.exists()) {
      tmpDir.mkdirs();
    }
    return tmpFile;
  }
  
  @Override
  public FileUploadDTO sliceUpload(FileUploadRequestDTO param) {

    boolean isOk = this.upload(param);
    if (isOk) {
      File tmpFile = this.createTmpFile(param);
      FileUploadDTO fileUploadDTO = this.saveAndFileUploadDTO(param.getFile().getOriginalFilename(), tmpFile);
      return fileUploadDTO;
    }
    String md5 = FileMD5Util.getFileMD5(param.getFile());

    Map<Integer, String> map = new HashMap<>();
    map.put(param.getChunk(), md5);
    return FileUploadDTO.builder().chunkMd5Info(map).build();
  }
  
  /**
   * Check and update the file upload progress
   */
  public boolean checkAndSetUploadProgress(FileUploadRequestDTO param, String uploadDirPath) {

    String fileName = param.getFile().getOriginalFilename();
    File confFile = new File(uploadDirPath, fileName + ".conf");
    byte isComplete = 0;
    RandomAccessFile accessConfFile = null;
    try {
      accessConfFile = new RandomAccessFile(confFile, "rw");
      // Mark this chunk as complete
      System.out.println("set part " + param.getChunk() + " complete");
      // The conf file's length is the total number of chunks. Each uploaded chunk
      // writes a 127 (Byte.MAX_VALUE) at its position; positions not yet uploaded
      // keep the default 0.
      accessConfFile.setLength(param.getChunks());
      accessConfFile.seek(param.getChunk());
      accessConfFile.write(Byte.MAX_VALUE);

      // completeList checks whether everything is done: all bytes are 127
      // only if every chunk uploaded successfully
      byte[] completeList = FileUtils.readFileToByteArray(confFile);
      isComplete = Byte.MAX_VALUE;
      for (int i = 0; i < completeList.length && isComplete == Byte.MAX_VALUE; i++) {
        // AND operation: if any part is incomplete, isComplete is no longer Byte.MAX_VALUE
        isComplete = (byte) (isComplete & completeList[i]);
        System.out.println("check part " + i + " complete?:" + completeList[i]);
      }

    } catch (IOException e) {
      log.error(e.getMessage(), e);
    } finally {
      FileUtil.close(accessConfFile);
    }
    boolean isOk = setUploadProgress2Redis(param, uploadDirPath, fileName, confFile, isComplete);
    return isOk;
  }
  
  /**
   * Save upload progress information into redis
   */
  private boolean setUploadProgress2Redis(FileUploadRequestDTO param, String uploadDirPath,
      String fileName, File confFile, byte isComplete) {

    RedisUtil redisUtil = SpringContextHolder.getBean(RedisUtil.class);
    if (isComplete == Byte.MAX_VALUE) {
      redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "true");
      redisUtil.del(FileConstant.FILE_MD5_KEY + param.getMd5());
      confFile.delete();
      return true;
    } else {
      if (!redisUtil.hHasKey(FileConstant.FILE_UPLOAD_STATUS, param.getMd5())) {
        redisUtil.hset(FileConstant.FILE_UPLOAD_STATUS, param.getMd5(), "false");
        redisUtil.set(FileConstant.FILE_MD5_KEY + param.getMd5(),
            uploadDirPath + FileConstant.FILE_SEPARATORCHAR + fileName + ".conf");
      }

      return false;
    }
  }
  /**
   * Save file operation
   */
  public FileUploadDTO saveAndFileUploadDTO(String fileName, File tmpFile) {

    FileUploadDTO fileUploadDTO = null;

    try {
      fileUploadDTO = renameFile(tmpFile, fileName);
      if (fileUploadDTO.isUploadComplete()) {
        System.out
            .println("upload complete !!" + fileUploadDTO.isUploadComplete() + " name=" + fileName);
        //TODO save file information to the database
      }
    } catch (Exception e) {
      log.error(e.getMessage(), e);
    }
    return fileUploadDTO;
  }
  /**
   * Rename a file
   *
   * @param toBeRenamed   the file to rename
   * @param toFileNewName the new name
   */
  private FileUploadDTO renameFile(File toBeRenamed, String toFileNewName) {
    // Check that the file to be renamed exists and is a regular file
    FileUploadDTO fileUploadDTO = new FileUploadDTO();
    if (!toBeRenamed.exists() || toBeRenamed.isDirectory()) {
      log.info("File does not exist: {}", toBeRenamed.getName());
      fileUploadDTO.setUploadComplete(false);
      return fileUploadDTO;
    }
    String ext = FileUtil.getExtension(toFileNewName);
    String p = toBeRenamed.getParent();
    String filePath = p + FileConstant.FILE_SEPARATORCHAR + toFileNewName;
    File newFile = new File(filePath);
    // Rename the temporary file to its final name
    boolean uploadFlag = toBeRenamed.renameTo(newFile);

    fileUploadDTO.setMtime(DateUtil.getCurrentTimeStamp());
    fileUploadDTO.setUploadComplete(uploadFlag);
    fileUploadDTO.setPath(filePath);
    fileUploadDTO.setSize(newFile.length());
    fileUploadDTO.setFileExt(ext);
    fileUploadDTO.setFileId(toFileNewName);

    return fileUploadDTO;
  }
}

Summary
Implementing multipart upload requires the front end and back end to cooperate: in particular, the chunk size used by the front end and the back end must be identical, otherwise uploads will fail. Also, full file management normally requires setting up a dedicated file server, such as FastDFS or HDFS.

With this sample code, on a machine with 4 cores and 8 GB of memory, uploading a 24 GB file takes a little over 30 minutes. Most of that time is spent computing the MD5 value on the front end; the back end's write speed is comparatively fast. If the project team feels that building a self-hosted file server is too time-consuming, and the project only needs upload and download, then Alibaba's OSS is recommended; its introduction can be viewed on the official website:

https://help.aliyun.com/product/31815.html
Alibaba's OSS is essentially an object storage service, not a file server, so if you need to delete or modify large numbers of files, OSS may not be a good choice.
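Since the front-end MD5 computation dominates the upload time mentioned above, it is worth noting that an MD5 can be computed in a streaming fashion, so the file never has to fit in memory at once. A minimal Java sketch using the JDK's MessageDigest (illustrative, not from this article's code base):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Md5Util {

  /** Stream the file through MessageDigest and return the hex MD5. */
  public static String md5Of(String path) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(Paths.get(path))) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n); // digest one buffer at a time
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
    }
    return hex.toString();
  }
}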

Reference article: http://blog.ncmem.com/wordpress/2023/10/28/10g-Large files, second transfer, breakpoint resume, multi-part upload/