Reading Chinese text lines directly from compressed packages [rar, zip, 7z, gz, tar, z…] without decompressing them, in various languages [Python, Java, Go, Pascal, C++].

Article directory

  • (Zero) Preface
  • (1) [ZIP] format
    • (1.1) Python
    • (1.2)Java
    • (1.3)Golang
    • (1.4)Pascal
      • (1.4.1) Lazarus (Free Pascal)
      • (1.4.2)Delphi
    • (1.5)C++
  • (2) [GZIP] format
    • (2.1) Python
    • (2.2)Java
    • (2.3)Golang
    • (2.4)Pascal
      • (2.4.1) Lazarus (Free Pascal)
      • (2.4.2)Delphi
    • (2.5) C++
  • (3) [TAR] format
    • (3.1) Python
    • (3.2)Java
    • (3.3)Golang
    • (3.4) Pascal
      • (3.4.1) Lazarus (Free Pascal)
      • (3.4.2)Delphi
  • (4) [RAR] format
    • (4.1)Python
    • (4.2)Java
    • (4.3)Golang
  • (5) [7Zip] format
    • (5.1) Python
    • (5.2)Java
    • (5.3)Golang
  • (6) [Unix Z] format
    • (6.1) Python
    • (6.2)Java
    • (6.3) Golang

(Zero) Preface

Normally, when we run into a compressed package, we unpack it first and then operate on the files.
For example, decompress A.zip into a.txt, then open a.txt with a program and read it as usual.
Most examples found on the Internet also unpack files into files.

We did the same for many years, until we ran into the following:

  1. In the 4G/VoLTE era, various manufacturers store and transmit data in extremely verbose formats.
    This results in a very high compression ratio, such as 100:1. That is, a 1GB compressed package expands to 100GB of data once unzipped.
    This unnecessary disk overhead can be avoided entirely by reading the compressed package directly.

  2. After servers are virtualized, some disk reads and writes (especially writes) are extremely slow.
    For example, if 200MB of data is decompressed into 4GB and then processed, a single data node takes 4 hours, but only about 2 minutes without decompression.
    The time gap is really huge.

So, as a last resort, we had to read and write compressed files directly.
External commands cannot be called, and the code must work across platforms. The implementation differs from language to language.
If you can read, you can usually write. In addition, reading text is more "advanced" than reading binary blocks, so reading text is used as the example here.

Some of the examples fully implement "reading text lines directly"; the others must be adapted by yourself.
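To illustrate the remark that reading and writing usually go together, here is a minimal Python sketch that writes text lines straight into a .gz file and reads them back, without any intermediate plain file. The helper names and the GBK encoding are assumptions made for illustration only.

```python
import gzip
import os
import tempfile


def write_text_lines_gz(path, lines, encoding="gbk"):
    """Write an iterable of strings straight into a .gz file, one per line."""
    with gzip.open(path, "wt", encoding=encoding, newline="\n") as out:
        for line in lines:
            out.write(line + "\n")


def read_text_lines_gz(path, encoding="gbk"):
    """Yield decoded lines from a .gz file without unpacking it to disk."""
    with gzip.open(path, "rt", encoding=encoding) as src:
        for line in src:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Round trip through a temporary file so the sketch can be run as-is.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.txt.gz")
        write_text_lines_gz(target, ["第一行", "第二行"])
        for text in read_text_lines_gz(target):
            print(text)
```

Only the compressed file ever exists on disk, which is the point made in the preface.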

(1) [ZIP] format

It is the most common format under Windows.
A single compressed package can contain multiple files.
Although its compression ratio is not as good as RAR or 7Z and its reading speed is not as good as GZIP, its compatibility under Windows is the best.
Common software: WinZip.

(1.1)Python

I don’t know python at all, but it’s very useful.
Support for zip is built-in.
Reference: Data compression and archiving in Python.

import zipfile
...
if not zipfile.is_zipfile(SourceFile):
    raise Exception(SourceFile + " is not in ZIP format!")
with zipfile.ZipFile(SourceFile) as zipMy:
    for oneItem in zipMy.infolist():
        with zipMy.open(oneItem) as filein:
            lines = filein.readlines()
            # do whatever with lines.
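Since the goal is Chinese text lines, one option is to wrap each entry in io.TextIOWrapper so that lines are decoded while they are read, instead of loading everything with readlines(). A minimal sketch, assuming the text inside the package is GBK-encoded; the helper name iter_zip_text_lines is made up for illustration.

```python
import io
import zipfile


def iter_zip_text_lines(source, encoding="gbk"):
    """Yield (entry name, decoded line) pairs without extracting to disk.

    `source` may be a path or a seekable file object; `encoding` is an
    assumption about the text stored inside the package.
    """
    with zipfile.ZipFile(source) as zip_my:
        for item in zip_my.infolist():
            if item.is_dir():
                continue
            with zip_my.open(item) as raw:
                # TextIOWrapper decodes lazily, so large entries are streamed.
                with io.TextIOWrapper(raw, encoding=encoding, newline=None) as text:
                    for line in text:
                        yield item.filename, line.rstrip("\n")


if __name__ == "__main__":
    # Build a small in-memory package so the sketch can be run as-is.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("a.txt", "第一行\n第二行\n".encode("gbk"))
    buf.seek(0)
    for name, line in iter_zip_text_lines(buf):
        print(name, line)
```

Passing newline=None lets the wrapper treat both \n and \r\n as line endings, which helps with files produced on Windows.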

(1.2)Java

Java appears to work with streams.
If a file has been zipped several times over, Java should also be able to read the innermost data in one pass (this is the theory; I have not tried it).
Use org.apache.commons.compress.

Add dependencies in pom.xml:

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>

In the .java code file:

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
...

try (ZipFile zipFile = new ZipFile(SourceFile, "GBK")) {
    Enumeration<?> enums = zipFile.getEntries();
    while (enums.hasMoreElements()) {
        ZipArchiveEntry entry = (ZipArchiveEntry) enums.nextElement();
        if (!entry.isDirectory()) {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(zipFile.getInputStream(entry), Charset.forName("GBK")))) {
                String aLine;
                while ((aLine = br.readLine()) != null) {
                    // do whatever with every line.
                }
            }
        }
    }
}
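The nested case mentioned above (a zip inside a zip) can be tried out quickly. The sketch below is in Python rather than Java, so it says nothing about how the Java streams behave. It is also not fully streaming: it assumes each inner package is small enough to buffer in memory, because zipfile needs a seekable source. The function name is hypothetical.

```python
import io
import zipfile


def iter_innermost_lines(source, encoding="gbk"):
    """Yield decoded lines from every non-zip entry, descending into nested zips."""
    with zipfile.ZipFile(source) as outer:
        for item in outer.infolist():
            if item.is_dir():
                continue
            with outer.open(item) as raw:
                if item.filename.lower().endswith(".zip"):
                    # Buffer the inner package in memory; nothing touches the disk.
                    inner = io.BytesIO(raw.read())
                    yield from iter_innermost_lines(inner, encoding)
                else:
                    for line in io.TextIOWrapper(raw, encoding=encoding):
                        yield line.rstrip("\r\n")


if __name__ == "__main__":
    # Build outer.zip -> inner.zip -> in.txt entirely in memory.
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as z:
        z.writestr("in.txt", "内层\n".encode("gbk"))
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as z:
        z.writestr("inner.zip", inner_buf.getvalue())
    outer_buf.seek(0)
    for text in iter_innermost_lines(outer_buf):
        print(text)
```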

(1.3)Golang

As above, Go's Reader can also stream.
The code itself is only a few lines, but error handling takes up most of it.
There is also the deferred Close, which is clearly not as concise as Python's with or Java's try-with-resources. Alas, this is also a major feature of Go.

import (
	...
	"archive/zip"
	...
)
...

zipReader, err := zip.OpenReader(SourceFile)
if err != nil {
	panic(err)
}
defer func(zipReader *zip.ReadCloser) {
	err := zipReader.Close()
	if err != nil {
		panic(err)
	}
}(zipReader)
for _, f := range zipReader.File {
	if !f.FileInfo().IsDir() {
		inFile, err := f.Open()
		if err != nil {
			panic(err)
		}
		OneReader := bufio.NewReader(inFile)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				MyEof = true
				break
			}
			if err != nil {
				panic(err)
			}
			// do whatever with every line.
		}
	}
}

(1.4)Pascal

(1.4.1)Lazarus (Free Pascal)

The official Lazarus (FPC) wiki page on paszlib has an approach that comes close:

  1. Do not set TUnZipper's OutputPath.
  2. Create and read the streams in event handlers.
  3. We can supply our own stream to receive the content and then read it line by line (roundabout, but it counts as an indirect implementation).

The official FPC code is as follows (only slightly changed):

uses
  Zipper;

...

procedure TForm1.Button1Click(Sender: TObject);
begin
  ExtractFileFromZip(FileNameEdit1.FileName,Edit1.Text);
end;

procedure TForm1.DoCreateOutZipStream(Sender: TObject; var AStream: TStream;
  AItem: TFullZipFileEntry);
begin
  AStream:=TMemorystream.Create;
end;

procedure TForm1.DoDoneOutZipStream(Sender: TObject; var AStream: TStream;
  AItem: TFullZipFileEntry);
begin
  AStream.Position:=0;
  Memo1.lines.LoadFromStream(Astream);
  Astream.Free;
end;

procedure TForm1.ExtractFileFromZip(ZipName, FileName: string);
var
  ZipFile: TUnZipper;
  sl:TStringList;
begin
  sl:=TStringList.Create;
  sl.Add(FileName);
  ZipFile := TUnZipper.Create;
  try
    ZipFile.FileName := ZipName;
    ZipFile.OnCreateStream := @DoCreateOutZipStream;
    ZipFile.OnDoneStream:=@DoDoneOutZipStream;
    ZipFile.UnZipFiles(sl);
  finally
    ZipFile.Free;
    sl.Free;
  end;
end;

(1.4.2)Delphi

Newer versions of Delphi can do it like this:

uses
  System.Zip,
  ...
var
  line: String;
  aLH: TZipHeader;
  ...
begin
  LZip := TZipFile.Create;
  LZip.Open(SourceFile, zmRead);
  LZip.Encoding := TEncoding.GetEncoding(936);
  for i := 0 to LZip.FileCount - 1 do
  begin
    LOutput := TMemoryStream.Create();
    LZip.Read(i, LOutput, aLH);
    var asr := TStreamReader.Create(LOutput);
    while not asr.EndOfStream do
    begin
      line := asr.ReadLine;
      // do whatever with every line.
    end;
    FreeAndNil(asr);
    FreeAndNil(LOutput);
  end;
  FreeAndNil(LZip);
end;

(1.5)C++

Use zlib. The following example uses zlib 1.3 (August 18, 2023).

Reading from an actual zip file on disk appears to work.
PS: Although both are provided by zlib, gz can read lines (see the gz section below), while zip can only read data blocks.
To read strings line by line, you need to find the position of each \n and assemble the string yourself.

unzFile zfile = unzOpen64(SourceFile);
if (zfile == nullptr)
{
    return false;
}
if (UNZ_OK != unzGoToFirstFile(zfile))
{
    return false;
}

char fileName[512] = { 0 };
unz_file_info64 fileInfo;
do
{
    if (UNZ_OK != unzGetCurrentFileInfo64(zfile, &fileInfo, fileName, sizeof(fileName), nullptr, 0, nullptr, 0))
    {
        return false;
    }
    if (fileInfo.external_fa == FILE_ATTRIBUTE_DIRECTORY) // Folder
    {
        // If you need to process it, create the directory yourself.
    }
    else // Ordinary file
    {
        if (UNZ_OK != unzOpenCurrentFile(zfile))
        {
            return false;
        }
        int size = 0;
        // unzReadCurrentFile returns the number of bytes read, 0 at end of file, < 0 on error.
        while ((size = unzReadCurrentFile(zfile, Buffer, bufferSize)) > 0)
        {
            // do whatever with the first `size` bytes of Buffer
        }
        unzCloseCurrentFile(zfile);
    }
} while (unzGoToNextFile(zfile) == UNZ_OK);
unzClose(zfile);

(2) [GZIP] format

It is the most common compression format under Linux/Unix.
A single compressed package can only contain a single file.

  • You can save the original file name information before compression into meta data.
  • You can also not save it (remove the .gz suffix to get the original file name).

(2.1)Python

Support for gz is built-in; no additional package is required.
It works exactly like opening an ordinary file.
It seems that the original file name stored in the metadata cannot be read.

import gzip
...
with gzip.open(SourceFile,'r') as gzMy:
    lines=gzMy.readlines()
    # do what ever with lines.
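The gzip module does not seem to expose the stored name, but the header can be parsed by hand. The sketch below follows the header layout of RFC 1952 and returns the optional FNAME field of the first member. The function name is hypothetical and only the first member is inspected.

```python
import gzip
import io
import struct


def read_gzip_original_name(fileobj):
    """Return the FNAME field of the first gzip member, or None if absent.

    `fileobj` is a binary stream positioned at the start of the gzip data.
    """
    header = fileobj.read(10)
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = header[3]
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then payload
        (xlen,) = struct.unpack("<H", fileobj.read(2))
        fileobj.read(xlen)
    if not flags & 0x08:  # FNAME not set: no name was stored
        return None
    name = bytearray()
    while True:
        ch = fileobj.read(1)
        if not ch or ch == b"\x00":  # the name is zero-terminated
            break
        name += ch
    # RFC 1952 specifies ISO 8859-1 (Latin-1) for this field.
    return name.decode("latin-1")


if __name__ == "__main__":
    buf = io.BytesIO()
    with gzip.GzipFile(filename="a.txt", mode="wb", fileobj=buf) as gz:
        gz.write(b"hello\n")
    buf.seek(0)
    print(read_gzip_original_name(buf))
```

When the function returns None, stripping the .gz suffix from the package name is the usual fallback, as noted above.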

(2.2)Java

You can use GzipCompressorInputStream.getMetaData().getFilename() to read the original file name (it will be empty if not saved).

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
...

try (BufferedReader br = new BufferedReader(new InputStreamReader(new GzipCompressorInputStream(Files.newInputStream(Paths.get(SourceFile))), Charset.forName("GBK")))) {
    String aLine;
    while ((aLine = br.readLine()) != null) {
        // do whatever with every line.
    }
}

(2.3)Golang

The original file name can be read from *gzip.Reader.Name (it will be empty if not saved).

import (
	...
	"compress/gzip"
	"os"
	...
)
...

fi, err := os.Open(SourceFile)
if err != nil {
	panic(err)
}
defer fi.Close()
// gzip.NewReader needs an io.Reader, not a file name.
fz, err := gzip.NewReader(fi)
if err != nil {
	panic(err)
}
defer func(fz *gzip.Reader) {
	err := fz.Close()
	if err != nil {
		panic(err)
	}
}(fz)
OneReader := bufio.NewReader(fz)
for {
	line, _, err := OneReader.ReadLine()
	if err == io.EOF {
		MyEof = true
		break
	}
	if err != nil {
		panic(err)
	}
	// do whatever with every line.
}

(2.4)Pascal

(2.4.1)Lazarus (Free Pascal)

The official example likewise does not seem to offer a way to read a line at a time.
It should be usable after modification, though. (The original post showed the code as a picture; please refer to the official documentation for the code.)

(2.4.2)Delphi

Newer versions of Delphi can do it like this:

uses
  System.ZLib,
  ...
var
  line: String;
  ...
begin
  LInput := TFileStream.Create(SourceFile, fmOpenRead);
  LUnGZ := TZDecompressionStream.Create(LInput, 15 + 16);
  var asr := TStreamReader.Create(LUnGZ);
  while not asr.EndOfStream do
  begin
    line := asr.ReadLine;
    // do whatever with every line.
  end;
  FreeAndNil(asr);
  FreeAndNil(LUnGZ);
  FreeAndNil(LInput);
end;

(2.5)C++

Use zlib. The following example uses zlib 1.3 (August 18, 2023).

Reading from an actual gz file on disk appears to work.
PS: To be more rigorous, you need to check whether the returned content is bufferSize-1 characters long with no trailing newline (the line was longer than the buffer), and handle that case yourself.

gzFile gzf = gzopen(SourceFile, "r");
while (gzgets(gzf, Buffer, bufferSize) != NULL)
{
    // do whatever with Buffer as char*
}
gzclose(gzf);

But if the data comes from a stream, such as the InputStream below, the code gets a bit long.

The following code does not provide a convenient readline method.
You need to copy the decoded outBuffer into your own cache, find the position of each \n, and assemble the string yourself.
If you also need to seek freely (fseek), it is better to write everything into a memory stream first and operate on that.

For reference (usable only after modification):

z_stream zlibStream;
memset(&zlibStream, 0, sizeof(zlibStream));
zlibStream.next_in = nullptr;
zlibStream.avail_in = 0;
zlibStream.next_out = nullptr;
zlibStream.avail_out = 0;
zlibStream.zalloc = Z_NULL;
zlibStream.zfree = Z_NULL;
zlibStream.opaque = Z_NULL;

if (inflateInit2(&zlibStream, 16 + MAX_WBITS) != Z_OK) {
    // show error
    if (error_list_.end() == error_list_.find(path))
    {
        error_list_[path] = 0x00;
    }
    error_list_[path] |= 0x04;
    continue;
}

char* inBuffer = new char[bufferSize];
char* outBuffer = new char[bufferSize];

int zlibResult;
do {
    InputStream.read(inBuffer, bufferSize);
    zlibStream.next_in = reinterpret_cast<Bytef*>(inBuffer);
    zlibStream.avail_in = (uInt)InputStream.gcount();
    do {
        zlibStream.next_out = reinterpret_cast<Bytef*>(outBuffer);
        zlibStream.avail_out = bufferSize;

        zlibResult = inflate(&zlibStream, Z_NO_FLUSH);
        if (zlibResult == Z_STREAM_ERROR) {
            // show error
            inflateEnd(&zlibStream);
            if (error_list_.end() == error_list_.find(path))
            {
                error_list_[path] = 0x00;
            }
            error_list_[path] |= 0x04;
            continue;
        }
        // Do something with decompressed data, write to some file?
        // OutputStream.write(outBuffer, bufferSize - zlibStream.avail_out);
    } while (zlibStream.avail_out == 0);
} while (InputStream.good() || zlibResult == Z_OK);

delete[] inBuffer;
delete[] outBuffer;

inflateEnd(&zlibStream);
OriginalFile.flush();
OriginalFile.close();
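The step described above (copy each decoded block into your own cache, find \n, assemble the line) can be sketched compactly. The sketch is in Python for brevity, using zlib.decompressobj with the same 16 + MAX_WBITS setting that the C++ code passes to inflateInit2. The function name is hypothetical, and only a single gzip member is handled.

```python
import gzip
import io
import zlib


def iter_gzip_lines(stream, buffer_size=64 * 1024):
    """Yield raw byte lines (without the trailing newline) from a gzip stream."""
    # 16 + MAX_WBITS selects gzip framing.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""  # decoded bytes after the last newline seen so far
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        pending += inflater.decompress(chunk)
        # Everything before the last newline is complete; keep the rest.
        *lines, pending = pending.split(b"\n")
        yield from lines
    pending += inflater.flush()
    lines = pending.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # the data ended with a newline
    yield from lines


if __name__ == "__main__":
    data = gzip.compress("一\n二\n三".encode("gbk"))
    for raw_line in iter_gzip_lines(io.BytesIO(data), buffer_size=8):
        print(raw_line.decode("gbk"))
```

Splitting on the byte \n before decoding is safe for GBK, because GBK trail bytes never take the value 0x0A.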

(3) [TAR] format

It is the most common packaging format under Linux/Unix.
A single compressed package can contain multiple files.
Packaging is usually used together with compression, such as: .tar.gz, .tgz, .tar.xz.

(3.1)Python

Support for tar is built-in; no additional package is required.
The following example is for .tar.gz and .tgz. Other formats are similar.
Python does not need gz and tar to be chained by hand; you only specify the mode when opening.
The name of each file in the package is available from TarInfo.name.

import tarfile
...
with tarfile.open(SourceFile, 'r:gz') as tarMy:
    for oneItem in tarMy.getmembers():
        if not oneItem.isfile():
            continue  # extractfile() returns None for directories
        with tarMy.extractfile(oneItem) as filein:
            lines = filein.readlines()
            # do whatever with lines.
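For large packages it can help to iterate over the archive member by member and decode line by line, rather than calling getmembers() and readlines(), which scan the whole archive and load whole files. A minimal sketch, assuming GBK-encoded text; the helper name is made up for illustration.

```python
import io
import tarfile


def iter_tar_text_lines(fileobj, mode="r:gz", encoding="gbk"):
    """Yield (member name, decoded line) pairs from a tar package.

    `fileobj` is a binary file object holding the package; `mode` is passed
    to tarfile.open unchanged.
    """
    with tarfile.open(fileobj=fileobj, mode=mode) as tar_my:
        for member in tar_my:  # members are read one at a time
            if not member.isfile():
                continue  # extractfile() returns None for directories
            with tar_my.extractfile(member) as raw:
                for raw_line in raw:
                    yield member.name, raw_line.decode(encoding).rstrip("\r\n")


if __name__ == "__main__":
    # Build a small .tar.gz in memory so the sketch can be run as-is.
    payload = "甲\n乙\n".encode("gbk")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as t:
        info = tarfile.TarInfo("a.txt")
        info.size = len(payload)
        t.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    for name, line in iter_tar_text_lines(buf):
        print(name, line)
```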

(3.2)Java

As above, but the example here is for a plain .tar file.
The name of each file in the package can be obtained from org.apache.tools.tar.TarEntry.getName().

<!-- https://mvnrepository.com/artifact/org.apache.ant/ant -->
<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.10.14</version>
</dependency>

import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;
...

try (TarInputStream in = new TarInputStream(Files.newInputStream(new File(SourceFile).toPath()), "UTF8")) {
    TarEntry entry;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.isFile()) {
            // Do not close this reader: closing it would also close the tar stream.
            BufferedReader br = new BufferedReader(new InputStreamReader(in, Charset.forName("GBK")));
            String aLine;
            while ((aLine = br.readLine()) != null) {
                // do whatever with every line.
            }
        }
    }
}

(3.3)Golang

As above, the example is for a plain .tar file.
If it is tar.gz, just chain it with gz, for example: OneTAR := tar.NewReader(GZReader).
The name of each file in the package can be obtained from *tar.Header.FileInfo().Name().

import (
	...
	"archive/tar"
	"os"
	...
)
...

fi, err := os.Open(SourceFile)
if err != nil {
	panic(err)
}
defer fi.Close()
// tar.NewReader needs an io.Reader, not a file name.
OneTAR := tar.NewReader(fi)
for {
	h, err := OneTAR.Next()
	if err == io.EOF {
		break
	} else if err != nil {
		panic(err)
	}
	if !h.FileInfo().IsDir() {
		OneReader := bufio.NewReader(OneTAR)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				MyEof = true
				break
			}
			if err != nil {
				panic(err)
			}
			// do whatever with every line.
		}
	}
}

(3.4)Pascal

(3.4.1)Lazarus (Free Pascal)

Please refer to the official documentation.

(3.4.2)Delphi

Newer versions of Delphi can do it like this.
LibTar (Tar Library) is required; there may be other ways to implement this, but I am not sure yet.
As above, the example is for a plain .tar file.
If it is tar.gz, just chain it with gz.

uses
  LibTar,
  ...
var
  line: String;
  DirRec: TTarDirRec;
  ...
begin
  LInput := TFileStream.Create(SourceFile, fmOpenRead);
  TarStream := TTarArchive.Create(LInput);
  TarStream.Reset;
  while TarStream.FindNext(DirRec) do
  begin
    LOutput := TMemoryStream.Create();
    TarStream.ReadFile(LOutput);
    var asr := TStreamReader.Create(LOutput);
    while not asr.EndOfStream do
    begin
      line := asr.ReadLine;
      // do whatever with every line.
    end;
    FreeAndNil(asr);
    FreeAndNil(LOutput);
  end;
  FreeAndNil(TarStream);
  FreeAndNil(LInput);
end;

(4) [RAR] format

It is the mainstream compression format under Windows.
A single compressed package can contain multiple files.
The compression ratio is relatively high, and the latest version is the RAR5 format.
Common software: WinRAR.

(4.1)Python

Requires pip install rarfile.
The usage is basically the same as zip. My actual test did not succeed: it needs to call unrar.exe or something similar.
This item is left open for now (the usage really is almost the same as zipfile).

(4.2)Java

The RAR5 format cannot be processed with com.github.junrar.
Use net.sf.sevenzipjbinding to process RAR5, although used directly it can only read data blocks.

  • This method can read not only RAR but also many other formats, including 7Z, ZIP, TAR, GZ, ARJ, CAB, WIM, etc…
    If you just need to decompress into files, this single method can handle all common compression formats.

  • The file-name handling below has nothing to do with RAR itself. It is there mainly because no file name can be obtained for the gz format when the metadata is missing.

  • Change the dependency in the pom to sevenzipjbinding-all-platforms to support more platforms (MAC, ARM…).

<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding</artifactId>
    <version>16.02-2.01</version>
</dependency>

<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding-windows-amd64 -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding-windows-amd64</artifactId>
    <version>16.02-2.01</version>
</dependency>

<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding-linux-amd64 -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding-linux-amd64</artifactId>
    <version>16.02-2.01</version>
</dependency>

Why pass null in SevenZip.openInArchive(null, ...)? So that the format is detected automatically.
As an aside, a complaint about anonymous functions: if you are not familiar with them, it is hard to tell what they are doing.
The lambda below is actually implementing ISequentialOutStream->write(byte[] data).

import actp.tnu.api.CrossLog;
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.simple.ISimpleInArchive;
import net.sf.sevenzipjbinding.simple.ISimpleInArchiveItem;

import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
...

public static void Uncompress(File inputFile, String targetFileDir) throws Exception {
    File newdir = new File(targetFileDir);
    if (!newdir.exists() && !newdir.mkdirs()) {
        throw new Exception("Create Dir failed! : " + targetFileDir);
    }
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(inputFile, "r");
         IInArchive inArchive = SevenZip.openInArchive(null, new RandomAccessFileInStream(randomAccessFile))) {
        ISimpleInArchive simpleInArchive = inArchive.getSimpleInterface();
        for (final ISimpleInArchiveItem item : simpleInArchive.getArchiveItems()) {
            if (!item.isFolder()) {
                ExtractOperationResult result = item.extractSlow(data -> {
                    try {
                        String fileName = GetNameAndPathOK(inputFile, targetFileDir, item);
                        try (FileOutputStream fos = new FileOutputStream(targetFileDir + File.separator + fileName, true)) {
                            fos.write(data);
                            // change something here if you want to read lines
                        }
                    } catch (Exception e) {
                        throw new SevenZipException(e.getMessage());
                    }
                    return data.length;
                });

                if (result != ExtractOperationResult.OK) {
                    // error
                }
            }
        }
    }
}

private static String GetNameAndPathOK(File inputFile, String targetFileDir, ISimpleInArchiveItem item) throws Exception {
    String fileName = item.getPath();
    if (fileName == null || fileName.isEmpty()) {
        fileName = inputFile.getName().substring(0, inputFile.getName().lastIndexOf("."));
    }
    if (fileName.indexOf(File.separator) > 0) {
        String path = targetFileDir + File.separator + fileName.substring(0, fileName.lastIndexOf(File.separator));
        File newdir1 = new File(path);
        if (!newdir1.exists() && !newdir1.mkdirs()) {
            throw new Exception("Create Dir failed! : " + path);
        }
    }
    return fileName;
}

(4.3)Golang

Use github.com/nwaples/rardecode.
There are other ways to process RAR, such as archiver.NewRar().Unarchive(Source,Dest) which decompresses from file to file, but cannot directly read the contents of the compressed package.

import (
	...
	"github.com/nwaples/rardecode"
	...
)
...

RARReader, err := rardecode.OpenReader(SourceFile, "")
if err != nil {
	panic(err)
}
defer func(RARReader *rardecode.ReadCloser) {
	err := RARReader.Close()
	if err != nil {
		panic(err)
	}
}(RARReader)

for {
	f, err := RARReader.Next()
	if err != nil {
		break
	}

	if !f.IsDir {
		OneReader := bufio.NewReader(RARReader)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				MyEof = true
				break
			}
			if err != nil {
				panic(err)
			}
			// do whatever with every line.
		}
	}
}

(5) [7Zip] format

It is a newer compression format under Windows, with the extension .7z.
A single compressed package can contain multiple files.
The compression ratio is higher and processing is faster.
Commonly used software: 7Zip. But I prefer the one based on it: NanaZip.
Nana = なな = seven.

(5.1)Python

Requires installation package: pip install py7zr.

import py7zr
...
if not py7zr.is_7zfile(SourceFile):
    raise Exception(SourceFile + " is not in 7Z format!")
with py7zr.SevenZipFile(SourceFile) as sevenMy:
    for oneItem in sevenMy.files:
        filein = sevenMy.read([oneItem.filename])
        lines = filein[oneItem.filename].readlines()
        # do whatever with lines.
        sevenMy.reset()  # rewind the archive before the next read() call

(5.2)Java

Use org.apache.commons.compress.

 <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
 <dependency>
     <groupId>org.apache.commons</groupId>
     <artifactId>commons-compress</artifactId>
     <version>1.24.0</version>
 </dependency>

import java.io.File;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
...

// SevenZFile takes a File, so wrap the file name.
try (SevenZFile z7In = new SevenZFile(new File(SourceFile))) {
    SevenZArchiveEntry entry;
    while ((entry = z7In.getNextEntry()) != null) {
        if (!entry.isDirectory()) {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(z7In.getInputStream(entry), Charset.forName("GBK")))) {
                String aLine;
                while ((aLine = br.readLine()) != null) {
                    // do whatever with every line.
                }
            }
        }
    }
}

(5.3)Golang

Use github.com/bodgit/sevenzip.

import (
	...
	"github.com/bodgit/sevenzip"
	...
)
...

sevenReader, err := sevenzip.OpenReader(SourceFile)
if err != nil {
	panic(err)
}
defer func(sevenReader *sevenzip.ReadCloser) {
	err := sevenReader.Close()
	if err != nil {
		panic(err)
	}
}(sevenReader)
for _, f := range sevenReader.File {
	if !f.FileInfo().IsDir() {
		inFile, err := f.Open()
		if err != nil {
			panic(err)
		}
		OneReader := bufio.NewReader(inFile)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				MyEof = true
				break
			}
			if err != nil {
				panic(err)
			}
			// do whatever with every line.
		}
	}
}

(6) [Unix Z] format

It is an ancient compression format under Linux/Unix, with the extension .Z.
A single compressed package can only contain a single file.

  • Unlike gzip, the Unix Z format does not seem to be able to save the original file name (remove the .Z suffix to get the original file name).

(6.1)Python

Requires pip install unlzw3.

from io import BytesIO
from pathlib import Path

import unlzw3
...
unCompress = BytesIO(unlzw3.unlzw(Path(fileNameFull).read_bytes()))
lines = unCompress.readlines()
# do whatever with lines.

(6.2)Java

Use org.apache.commons.compress.

 <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
 <dependency>
     <groupId>org.apache.commons</groupId>
     <artifactId>commons-compress</artifactId>
     <version>1.24.0</version>
 </dependency>

import org.apache.commons.compress.compressors.z.ZCompressorInputStream;
...

try (BufferedReader br = new BufferedReader(new InputStreamReader(new ZCompressorInputStream(Files.newInputStream(Paths.get(SourceFile))), Charset.forName("GBK")))) {
    String aLine;
    while ((aLine = br.readLine()) != null) {
        // do whatever with every line.
    }
}

(6.3)Golang

Use github.com/hotei/dcompress.

import (
	...
	"github.com/hotei/dcompress"
	...
)
...
fi, err := os.Open(filepath.Join(SourcePath, SourceFile))
if err != nil {
	panic(err)
}
defer fi.Close()
dcompress.Verbose = true
// Pass the opened file to the decompressor.
fz, err := dcompress.NewReader(fi)
if err != nil {
	panic(err)
}
OneReader := bufio.NewReader(fz)
for {
	line, _, err := OneReader.ReadLine()
	if err == io.EOF {
		MyEof = true
		break
	}
	if err != nil {
		panic(err)
	}
	// do whatever with every line.
}