Table of contents
- (Zero) Preface
- (1) [ZIP] format
  - (1.1) Python
  - (1.2) Java
  - (1.3) Golang
  - (1.4) Pascal
    - (1.4.1) Lazarus (Free Pascal)
    - (1.4.2) Delphi
  - (1.5) C++
- (2) [GZIP] format
  - (2.1) Python
  - (2.2) Java
  - (2.3) Golang
  - (2.4) Pascal
    - (2.4.1) Lazarus (Free Pascal)
    - (2.4.2) Delphi
  - (2.5) C++
- (3) [TAR] format
  - (3.1) Python
  - (3.2) Java
  - (3.3) Golang
  - (3.4) Pascal
    - (3.4.1) Lazarus (Free Pascal)
    - (3.4.2) Delphi
- (4) [RAR] format
  - (4.1) Python
  - (4.2) Java
  - (4.3) Golang
- (5) [7Zip] format
  - (5.1) Python
  - (5.2) Java
  - (5.3) Golang
- (6) [Unix Z] format
  - (6.1) Python
  - (6.2) Java
  - (6.3) Golang
(Zero) Preface
Normally, when we get a compressed package, we unpack it first and then work with the extracted files.
For example, decompress `A.zip` into `a.txt`, then open `a.txt` in a program and read it normally.
Most examples found online also decompress archives into files on disk.
We did the same for many years, until we ran into the following problems:
- Starting roughly with the 4G/VoLTE era, equipment vendors have been storing and transmitting data in extremely verbose formats.
  This produces very high compression ratios, such as 100:1: a 1 GB compressed package expands to 100 GB of data once unzipped.
  Reading the compressed package directly avoids all of this unnecessary disk overhead.
- On virtualized servers, some disk I/O (writes in particular) is extremely slow.
  For example, decompressing 200 MB of data into 4 GB and then processing it took about 4 hours on a single data node, but only about 2 minutes without decompressing.
  The difference is enormous.
So, with no other option, we had to read and write compressed files directly.
External commands could not be called, the solution had to be cross-platform, and the way to do it differs from language to language.
If you can read a format, you can usually write it too. Reading text lines is also a step above reading raw binary blocks, so reading text is used as the example throughout.
Some of the examples below read text lines directly; the others only read data blocks and must be adapted yourself.
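Reading text on top of a binary decompression stream mostly comes down to wrapping that stream in an incremental text decoder. A minimal Python sketch of the idea, assuming GBK-encoded content (change `encoding` for your data); `iter_text_lines` is a hypothetical helper, not part of any library:

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_text_lines(binary_stream: BinaryIO, encoding: str = "gbk") -> Iterator[str]:
    """Yield decoded text lines, one at a time, from a readable binary stream.

    TextIOWrapper decodes incrementally, so only one buffer's worth of
    decompressed data is held in memory and nothing is written to disk.
    """
    text = io.TextIOWrapper(binary_stream, encoding=encoding)
    try:
        for line in text:
            yield line.rstrip("\n")  # universal newlines already turned \r\n into \n
    finally:
        text.detach()  # hand the stream back to the caller without closing it


# Usage: compress a small GBK text in memory, then read it back line by line.
payload = gzip.compress("第一行\nsecond line\r\n".encode("gbk"))
with gzip.GzipFile(fileobj=io.BytesIO(payload)) as gz:
    lines = list(iter_text_lines(gz))
```

Any readable binary file object works as input, for example the ones returned by `gzip.open()` or `zipfile.ZipFile.open()`, so only the opening step changes from format to format.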
(1) [ZIP] format
It is the most common format under Windows.
A single compressed package can contain multiple files.
Its compression ratio is lower than RAR and 7Z, and it reads more slowly than GZIP, but it has the best compatibility on Windows.
Common software: WinZip.
(1.1) Python
I don't really know Python, but it is very handy.
Support for zip is built in.
Reference: the "Data Compression and Archiving" chapter of the Python documentation.
```python
import zipfile
...
if not zipfile.is_zipfile(SourceFile):
    raise Exception(SourceFile + " is not in ZIP format!")
with zipfile.ZipFile(SourceFile) as zipMy:
    for oneItem in zipMy.infolist():
        with zipMy.open(oneItem) as filein:
            lines = filein.readlines()  # a list of bytes objects; decode as needed
            # do whatever with lines.
```
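The snippet above loads each entry's lines into memory at once and leaves names and bytes undecoded. A streaming variant that skips directory entries, decodes the text, and re-decodes legacy GBK entry names (the Python counterpart of the `"GBK"` argument in the Java example) might look like this; `entry_name` and `iter_zip_lines` are hypothetical helpers, and GBK as the legacy name encoding is an assumption:

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Tuple, Union

UTF8_NAME_FLAG = 0x800  # general-purpose flag bit 11: the entry name is UTF-8


def entry_name(info: zipfile.ZipInfo, legacy_encoding: str = "gbk") -> str:
    """Return the entry name, re-decoding legacy (non-UTF-8) names.

    When bit 11 is not set, zipfile decodes the raw name bytes as cp437;
    encoding back to cp437 recovers those bytes so they can be decoded as GBK.
    """
    if info.flag_bits & UTF8_NAME_FLAG:
        return info.filename
    return info.filename.encode("cp437").decode(legacy_encoding)


def iter_zip_lines(source: Union[str, BinaryIO], encoding: str = "gbk") -> Iterator[Tuple[str, str]]:
    """Yield (entry name, line) pairs for every file entry, without extracting to disk."""
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():  # directory entries have names ending in '/'
                continue
            name = entry_name(info)
            with zf.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding=encoding)
                for line in text:
                    yield name, line.rstrip("\n")
```

On Python 3.11 and later, passing `metadata_encoding="gbk"` to `zipfile.ZipFile` should handle the name decoding without the cp437 round trip.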
(1.2) Java
Java's approach is stream-based.
So if a file has been zipped several times over, Java should in theory be able to read the innermost data in one pass (I have not tried this).
Use `org.apache.commons.compress`.
Add the dependency to `pom.xml`:
```xml
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>
```
In the `.java` source file:
```java
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Enumeration;
...
// The second argument is the encoding of the entry names; "GBK" suits archives made on Chinese Windows.
try (ZipFile zipFile = new ZipFile(SourceFile, "GBK")) {
    Enumeration<ZipArchiveEntry> enums = zipFile.getEntries();
    while (enums.hasMoreElements()) {
        ZipArchiveEntry entry = enums.nextElement();
        if (!entry.isDirectory()) {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(
                    zipFile.getInputStream(entry), Charset.forName("GBK")))) {
                String aLine;
                while ((aLine = br.readLine()) != null) {
                    // do whatever with every line.
                }
            }
        }
    }
}
```
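The nested-archive idea mentioned above (reading the innermost data of a repeatedly zipped file in one pass) is easy to try in Python, because a member opened with `ZipFile.open()` is seekable on Python 3.7+ and can itself be opened as a `ZipFile`. A minimal sketch, not the Java approach; `read_nested_lines` is a hypothetical helper and its performance on large archives is untested:

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def read_nested_lines(outer: Union[str, BinaryIO], chain: List[str], encoding: str = "utf-8") -> List[str]:
    """Follow a chain of zip members, e.g. ["inner.zip", "data.txt"], and return
    the text lines of the last member. Nothing is extracted to disk."""
    zf = zipfile.ZipFile(outer)
    opened = [zf]
    try:
        for name in chain[:-1]:
            member = zf.open(name)        # seekable on Python 3.7+
            opened.append(member)
            zf = zipfile.ZipFile(member)  # treat the member itself as an archive
            opened.append(zf)
        with zf.open(chain[-1]) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding)
            return [line.rstrip("\n") for line in text]
    finally:
        for obj in reversed(opened):      # close innermost first
            obj.close()
```

Seeking backward inside a compressed member makes `zipfile` restart decompression of that member, so this trades CPU time for disk space.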
(1.3) Golang
As with Java, Go's readers can be chained as streams.
The logic itself is only a few lines, but `error` handling takes up much of the code.
Closing with `defer` is also clearly less concise than Python's `with` or Java's try-with-resources. Alas, that is part of Go's character.
```go
import (
	"archive/zip"
	"bufio"
	"io"
)
...
zipReader, err := zip.OpenReader(SourceFile)
if err != nil {
	panic(err)
}
defer func(zipReader *zip.ReadCloser) {
	if err := zipReader.Close(); err != nil {
		panic(err)
	}
}(zipReader)
for _, f := range zipReader.File {
	if f.FileInfo().IsDir() {
		continue
	}
	inFile, err := f.Open()
	if err != nil {
		panic(err)
	}
	OneReader := bufio.NewReader(inFile)
	for {
		// isPrefix (ignored here) is true when a line is longer than the reader's buffer
		line, _, err := OneReader.ReadLine()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		_ = line // do whatever with every line.
	}
	inFile.Close() // close each entry before moving on to the next
}
```
(1.4) Pascal
(1.4.1) Lazarus (Free Pascal)
The official Lazarus (FPC) wiki page on `paszlib` describes an approach that comes close:
- Do not set TUnZipper's OutputPath.
- Create and read the streams in the event handlers.
- Supply our own stream to receive the content, then read it line by line (roundabout, but it counts as an indirect solution).
The official FPC code is as follows (adapt it slightly):
```pascal
uses Zipper;
...
procedure TForm1.Button1Click(Sender: TObject);
begin
  ExtractFileFromZip(FileNameEdit1.FileName, Edit1.Text);
end;

procedure TForm1.DoCreateOutZipStream(Sender: TObject; var AStream: TStream; AItem: TFullZipFileEntry);
begin
  // Hand the unzipper a memory stream instead of a file on disk
  AStream := TMemoryStream.Create;
end;

procedure TForm1.DoDoneOutZipStream(Sender: TObject; var AStream: TStream; AItem: TFullZipFileEntry);
begin
  // The entry is now decompressed into AStream: rewind it and read it as text
  AStream.Position := 0;
  Memo1.Lines.LoadFromStream(AStream);
  AStream.Free;
end;

procedure TForm1.ExtractFileFromZip(ZipName, FileName: string);
var
  ZipFile: TUnZipper;
  sl: TStringList;
begin
  sl := TStringList.Create;
  sl.Add(FileName);
  ZipFile := TUnZipper.Create;
  try
    ZipFile.FileName := ZipName;
    ZipFile.OnCreateStream := @DoCreateOutZipStream;
    ZipFile.OnDoneStream := @DoDoneOutZipStream;
    ZipFile.UnZipFiles(sl);
  finally
    ZipFile.Free;
    sl.Free;
  end;
end;
```
(1.4.2) Delphi
Recent versions of Delphi can do it like this:
```pascal
uses System.Zip, ...
...
var
  line: String;
  aLH: TZipHeader;
  LOutput: TStream;
  ...
begin
  LZip := TZipFile.Create;
  // Set the entry-name encoding (936 = GBK) before opening the archive
  LZip.Encoding := TEncoding.GetEncoding(936);
  LZip.Open(SourceFile, zmRead);
  for i := 0 to LZip.FileCount - 1 do
  begin
    // Read creates the decompression stream itself; the caller must free it
    LZip.Read(i, LOutput, aLH);
    var asr := TStreamReader.Create(LOutput);
    while not asr.EndOfStream do
    begin
      line := asr.ReadLine;
      // do whatever with every line.
    end;
    FreeAndNil(asr);
    FreeAndNil(LOutput);
  end;
  FreeAndNil(LZip);
end;
```
(1.5) C++
Use zlib. The following example uses zlib 1.3 (August 18, 2023); the `unz*` functions come from the minizip code in zlib's `contrib/minizip` directory.
Reading directly from a zip file on disk works like this.
PS: Both come with zlib, but the gz functions can read lines (see the GZIP section below), while the zip functions can only read data blocks.
To read text line by line, you need to find each `\n` yourself and assemble the strings.
```cpp
#include <cstring>
#include "unzip.h"  // minizip, from zlib's contrib/minizip
...
unzFile zfile = unzOpen64(SourceFile);
if (zfile == nullptr) {
    return false;
}
if (UNZ_OK != unzGoToFirstFile(zfile)) {
    unzClose(zfile);
    return false;
}
char fileName[512] = { 0 };
unz_file_info64 fileInfo;
do {
    if (UNZ_OK != unzGetCurrentFileInfo64(zfile, &fileInfo, fileName, sizeof(fileName), nullptr, 0, nullptr, 0)) {
        unzClose(zfile);
        return false;
    }
    size_t nameLen = strlen(fileName);
    if (nameLen > 0 && fileName[nameLen - 1] == '/') { // Folder: zip stores directory names with a trailing '/'
        // If you need to process it, create the directory yourself.
    } else { // Ordinary file
        if (UNZ_OK != unzOpenCurrentFile(zfile)) {
            unzClose(zfile);
            return false;
        }
        int size = 0;
        // unzReadCurrentFile returns the number of bytes read, 0 at end of entry, < 0 on error
        while ((size = unzReadCurrentFile(zfile, Buffer, bufferSize)) > 0) {
            // do whatever with the first `size` bytes of Buffer
        }
        unzCloseCurrentFile(zfile);
    }
} while (unzGoToNextFile(zfile) == UNZ_OK);
unzClose(zfile);
```
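The line assembly described above (find each `\n`, keep the unfinished tail for the next block) works the same in any language. A minimal Python sketch of the algorithm, where `chunks` stands for the successive buffers filled by `unzReadCurrentFile`; `assemble_lines` is a hypothetical name:

```python
from typing import Iterable, Iterator


def assemble_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Turn arbitrary data blocks into complete lines (without the trailing newline)."""
    pending = b""  # incomplete line carried over from the previous block
    for chunk in chunks:
        pending += chunk
        *complete, pending = pending.split(b"\n")  # the last piece has no '\n' yet
        for line in complete:
            yield line.rstrip(b"\r")  # tolerate Windows line endings
    if pending:  # final line without a trailing newline
        yield pending
```

Splitting on the byte `0x0A` is safe for both UTF-8 and GBK data, because neither encoding uses that byte inside a multi-byte character, so decoding can wait until a line is complete.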
(2) [GZIP] format
It is the most common compression format under Linux/Unix.
A single compressed package can only contain a single file.
- The original file name can be stored in the header metadata.
- Or it can be left out, in which case removing the `.gz` suffix gives the original file name.
(2.1) Python
Support for gz is built in; no extra package is needed.
It works just like opening an ordinary file.
The original file name stored in the metadata does not seem to be readable this way.
```python
import gzip
...
with gzip.open(SourceFile, 'r') as gzMy:  # binary mode; use 'rt' with encoding=... to get str lines
    lines = gzMy.readlines()
    # do whatever with lines.
```
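Python's `gzip` module reads past the stored name without exposing it, but the header is simple enough to read by hand. A minimal sketch following RFC 1952 (10-byte fixed header, optional FEXTRA field, then a zero-terminated FNAME); `gzip_original_name` is a hypothetical helper, and the fallback of stripping `.gz` follows the convention described in this section:

```python
import os
from typing import BinaryIO

FEXTRA = 0x04  # FLG bit: an extra field follows the fixed header
FNAME = 0x08   # FLG bit: a zero-terminated original file name follows


def gzip_original_name(fp: BinaryIO, fallback_path: str = "") -> str:
    """Return the file name stored in a gzip header (RFC 1952).

    If the header carries no name, fall back to fallback_path without its .gz suffix.
    Reads only the header; the stream position ends up just after the name.
    """
    header = fp.read(10)  # ID1 ID2 CM FLG MTIME(4) XFL OS
    if len(header) < 10 or header[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = header[3]
    if flags & FEXTRA:
        xlen = int.from_bytes(fp.read(2), "little")
        fp.read(xlen)  # skip the extra field
    if flags & FNAME:
        name = bytearray()
        while True:
            b = fp.read(1)
            if not b or b == b"\x00":
                break
            name += b
        return name.decode("latin-1")  # RFC 1952 specifies ISO 8859-1 names
    base = os.path.basename(fallback_path)
    return base[:-3] if base.endswith(".gz") else base
```

The function consumes the header bytes, so rewind or reopen the file before decompressing its content.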
(2.2) Java
You can use `GzipCompressorInputStream.getMetaData().getFilename()` to read the original file name (it is empty if none was saved).
```xml
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>
```
```java
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
...
// The charset belongs to the InputStreamReader, which decodes the decompressed bytes
try (BufferedReader br = new BufferedReader(new InputStreamReader(
        new GzipCompressorInputStream(Files.newInputStream(Paths.get(SourceFile))),
        Charset.forName("GBK")))) {
    String aLine;
    while ((aLine = br.readLine()) != null) {
        // do whatever with every line.
    }
}
```
(2.3) Golang
The original file name can be read from the `Name` field of `*gzip.Reader` (it is empty if none was saved).
```go
import (
	"bufio"
	"compress/gzip"
	"io"
	"os"
)
...
fi, err := os.Open(SourceFile) // gzip.NewReader takes an io.Reader, not a file name
if err != nil {
	panic(err)
}
defer fi.Close()
fz, err := gzip.NewReader(fi)
if err != nil {
	panic(err)
}
defer func(fz *gzip.Reader) {
	if err := fz.Close(); err != nil {
		panic(err)
	}
}(fz)
// fz.Name holds the original file name, if one was stored
OneReader := bufio.NewReader(fz)
for {
	line, _, err := OneReader.ReadLine()
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}
	_ = line // do whatever with every line.
}
```
(2.4) Pascal
(2.4.1) Lazarus (Free Pascal)
The official example does not seem to offer a way to read lines either.
It should still be usable with some modification. (The code was shown as a picture here; please refer to the official documentation for it.)
(2.4.2) Delphi
Recent versions of Delphi can do it like this:
```pascal
uses System.ZLib, ...
...
var
  line: String;
  ...
begin
  LInput := TFileStream.Create(SourceFile, fmOpenRead);
  // WindowBits = 15 + 16 makes zlib expect a gzip header and trailer
  LUnGZ := TZDecompressionStream.Create(LInput, 15 + 16);
  var asr := TStreamReader.Create(LUnGZ);
  while not asr.EndOfStream do
  begin
    line := asr.ReadLine;
    // do whatever with every line.
  end;
  FreeAndNil(asr);
  FreeAndNil(LUnGZ);
  FreeAndNil(LInput);
end;
```
(2.5) C++
Use zlib. The following example uses zlib 1.3 (August 18, 2023).
Reading directly from a .gz file on disk is straightforward.
PS: Strictly speaking, `gzgets` stops after at most `bufferSize - 1` characters, so if it returns a full buffer that does not end in a newline, the line was longer than the buffer and you have to join the pieces yourself.
```cpp
#include <zlib.h>
...
gzFile gzf = gzopen(SourceFile, "rb");
if (gzf == nullptr) {
    return false;
}
// gzgets stops after a '\n' or bufferSize - 1 bytes; it returns NULL at end of file or on error
while (gzgets(gzf, Buffer, bufferSize) != NULL) {
    // do whatever with Buffer as char*
}
gzclose(gzf);
```
But if the data comes from a stream, such as the `InputStream` below, the code gets longer.
There is no convenient `readline` here.
You need to copy the decoded `outBuffer` into your own buffer, find each `\n`, and assemble the strings yourself.
If you also need to move around freely with `fseek`, it is better to decompress everything into a memory stream first and work on that.
For reference only (it needs modification before use):
```cpp
z_stream zlibStream;
memset(&zlibStream, 0, sizeof(zlibStream)); // no input yet; zalloc/zfree/opaque = Z_NULL
// 16 + MAX_WBITS: expect a gzip header and trailer around the deflate data
if (inflateInit2(&zlibStream, 16 + MAX_WBITS) != Z_OK) {
    // show error
    if (error_list_.end() == error_list_.find(path)) {
        error_list_[path] = 0x00;
    }
    error_list_[path] |= 0x04;
    continue; // next file in the enclosing loop
}
char* inBuffer = new char[bufferSize];
char* outBuffer = new char[bufferSize];
int zlibResult = Z_OK;
bool failed = false;
do {
    InputStream.read(inBuffer, bufferSize);
    zlibStream.next_in = reinterpret_cast<Bytef*>(inBuffer);
    zlibStream.avail_in = (uInt)InputStream.gcount();
    if (zlibStream.avail_in == 0) {
        break; // input exhausted (a truncated file ends here without Z_STREAM_END)
    }
    do {
        zlibStream.next_out = reinterpret_cast<Bytef*>(outBuffer);
        zlibStream.avail_out = (uInt)bufferSize;
        zlibResult = inflate(&zlibStream, Z_NO_FLUSH);
        if (zlibResult == Z_STREAM_ERROR || zlibResult == Z_NEED_DICT ||
            zlibResult == Z_DATA_ERROR || zlibResult == Z_MEM_ERROR) {
            failed = true; // stop both loops; cleanup and error reporting follow
            break;
        }
        // Decompressed bytes are outBuffer[0 .. bufferSize - zlibStream.avail_out).
        // Do something with them, e.g. append to your own line buffer, or write to some file:
        //OutputStream.write(outBuffer, bufferSize - zlibStream.avail_out);
    } while (zlibStream.avail_out == 0);
} while (!failed && zlibResult != Z_STREAM_END);
delete[] inBuffer;
delete[] outBuffer;
inflateEnd(&zlibStream);
if (failed) {
    // show error
    if (error_list_.end() == error_list_.find(path)) {
        error_list_[path] = 0x00;
    }
    error_list_[path] |= 0x04;
}
OriginalFile.flush();
OriginalFile.close();
```
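For comparison, Python's `zlib` module wraps the same inflate machinery, and the same `16 + MAX_WBITS` window-bits value selects gzip framing. A minimal sketch of the block-by-block loop; `inflate_gzip_stream` is a hypothetical name, and it handles a single gzip member only:

```python
import zlib
from typing import BinaryIO, Iterator


def inflate_gzip_stream(src: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Decompress a gzip stream block by block, like the inflate() loop above.

    wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer,
    the same value passed to inflateInit2 in the C++ code.
    """
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while not d.eof:
        data = src.read(chunk_size)
        if not data:
            raise EOFError("gzip stream ended before the end-of-stream marker")
        # max_length caps each output block, like a fixed-size outBuffer;
        # input that did not fit is kept in d.unconsumed_tail
        out = d.decompress(data, chunk_size)
        while out:
            yield out
            if d.eof:
                break
            out = d.decompress(d.unconsumed_tail, chunk_size)
```

Each yielded block can then go through the same line assembly as the C++ version; anything after the end of the gzip member is left in `d.unused_data`.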
(3) [TAR] format
It is the most common archiving (packaging) format on Linux/Unix.
A single archive can contain multiple files.
Archiving is usually combined with compression, as in `.tar.gz`, `.tgz`, and `.tar.xz`.
(3.1) Python
Support for tar is built in; no extra package is needed.
The following example is for `.tar.gz` and `.tgz`; other formats are similar.
In Python you do not have to chain a gz reader and a tar reader yourself: just pass the right mode when opening.
The name of each file in the package is available from `TarInfo.name`.
```python
import tarfile
...
with tarfile.open(SourceFile, 'r:gz') as tarMy:
    for oneItem in tarMy.getmembers():
        if not oneItem.isfile():  # extractfile() returns None for directories
            continue
        with tarMy.extractfile(oneItem) as filein:
            lines = filein.readlines()  # a list of bytes objects
            # do whatever with lines.
```
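If the archive arrives as a forward-only stream (a pipe or socket, for instance), `tarfile` also has a stream mode. A minimal sketch, assuming UTF-8 text inside the members; `iter_tar_lines` is a hypothetical helper. The `'r|*'` mode detects gz/bz2/xz compression itself and never seeks backward, so members must be read in archive order:

```python
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_tar_lines(stream: BinaryIO, encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Yield (member name, line) pairs from a possibly compressed tar stream."""
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:  # in stream mode, read each member before moving on
            if not member.isfile():
                continue
            member_file = tar.extractfile(member)
            for raw_line in member_file:  # bytes, split on b"\n"
                yield member.name, raw_line.rstrip(b"\r\n").decode(encoding)
```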
(3.2) Java
As above, the example is for a plain `.tar` file.
The name of each file in the package is available from `org.apache.tools.tar.TarEntry.getName()`.
```xml
<!-- https://mvnrepository.com/artifact/org.apache.ant/ant -->
<dependency>
    <groupId>org.apache.ant</groupId>
    <artifactId>ant</artifactId>
    <version>1.10.14</version>
</dependency>
```
```java
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
...
// "UTF8" is the encoding of the entry names inside the archive
try (TarInputStream in = new TarInputStream(Files.newInputStream(new File(SourceFile).toPath()), "UTF8")) {
    TarEntry entry;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.isFile()) {
            // Do not close this reader: that would also close the TarInputStream and end the loop.
            // `in` signals end-of-stream at the end of the current entry.
            BufferedReader br = new BufferedReader(new InputStreamReader(in, Charset.forName("GBK")));
            String aLine;
            while ((aLine = br.readLine()) != null) {
                // do whatever with every line.
            }
        }
    }
}
```
(3.3) Golang
As above, the example is for a plain `.tar` file.
For `.tar.gz`, just chain it with the gzip reader, e.g. `OneTAR := tar.NewReader(GZReader)`.
The name of each file in the package is available from `h.FileInfo().Name()`, where `h` is the `*tar.Header`.
```go
import (
	"archive/tar"
	"bufio"
	"io"
	"os"
)
...
fi, err := os.Open(SourceFile) // for .tar.gz, wrap fi with gzip.NewReader first
if err != nil {
	panic(err)
}
defer fi.Close()
OneTAR := tar.NewReader(fi)
for {
	h, err := OneTAR.Next()
	if err == io.EOF {
		break
	} else if err != nil {
		panic(err)
	}
	if !h.FileInfo().IsDir() {
		// OneTAR reports io.EOF at the end of the current entry
		OneReader := bufio.NewReader(OneTAR)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				break
			}
			if err != nil {
				panic(err)
			}
			_ = line // do whatever with every line.
		}
	}
}
```
(3.4) Pascal
(3.4.1) Lazarus (Free Pascal)
Please refer to the official documentation.
(3.4.2) Delphi
Recent versions of Delphi can do it like this.
It requires LibTar (a third-party tar library); there may be other ways, but I am not sure yet.
As above, the example is for a plain `.tar` file.
For `.tar.gz`, just chain it with the gz stream.
```pascal
uses LibTar, ...
...
var
  line: String;
  DirRec: TTarDirRec;
  ...
begin
  LInput := TFileStream.Create(SourceFile, fmOpenRead);
  TarStream := TTarArchive.Create(LInput);
  TarStream.Reset;
  while TarStream.FindNext(DirRec) do
  begin
    LOutput := TMemoryStream.Create();
    TarStream.ReadFile(LOutput);
    LOutput.Position := 0; // writing left the position at the end; rewind before reading
    var asr := TStreamReader.Create(LOutput);
    while not asr.EndOfStream do
    begin
      line := asr.ReadLine;
      // do whatever with every line.
    end;
    FreeAndNil(asr);
    FreeAndNil(LOutput);
  end;
  FreeAndNil(TarStream);
  FreeAndNil(LInput);
end;
```
(4) [RAR] format
It is the mainstream compression format under Windows.
A single compressed package can contain multiple files.
Its compression ratio is relatively high; the latest version is the RAR5 format.
Common software: WinRAR.
(4.1) Python
Requires `pip install rarfile`.
The usage is basically the same as `zipfile`, but it did not work in my test: `rarfile` relies on an external tool such as `unrar` to do the actual decompression.
I am leaving this item open for now (the code really is almost the same as with `zipfile`).
(4.2) Java
`com.github.junrar` cannot handle the RAR5 format.
Use `net.sf.sevenzipjbinding` to process RAR5, but used directly it can only read data blocks.
- Besides RAR, this approach can also decompress many other formats, including 7Z, ZIP, TAR, GZ, ARJ, CAB, WIM, etc. If you only need to decompress to files, this one method handles all the common formats.
- The file-name handling in the code below has nothing to do with RAR itself; it is there because a gz file may carry no file name in its metadata.
- Change the dependency in the pom to `sevenzipjbinding-all-platforms` to support more platforms (macOS, ARM, ...).
```xml
<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding</artifactId>
    <version>16.02-2.01</version>
</dependency>
<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding-windows-amd64 -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding-windows-amd64</artifactId>
    <version>16.02-2.01</version>
</dependency>
<!-- https://mvnrepository.com/artifact/net.sf.sevenzipjbinding/sevenzipjbinding-linux-amd64 -->
<dependency>
    <groupId>net.sf.sevenzipjbinding</groupId>
    <artifactId>sevenzipjbinding-linux-amd64</artifactId>
    <version>16.02-2.01</version>
</dependency>
```
Why pass `null` in `SevenZip.openInArchive(null, ...)`? It makes the library detect the archive format automatically.
A side complaint about anonymous functions: if you are not used to them, it is hard to see what they are doing.
Here the lambda is actually implementing `ISequentialOutStream.write(byte[] data)`.
```java
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.simple.ISimpleInArchive;
import net.sf.sevenzipjbinding.simple.ISimpleInArchiveItem;
import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
...
public static void Uncompress(File inputFile, String targetFileDir) throws Exception {
    File newdir = new File(targetFileDir);
    if (!newdir.exists() && !newdir.mkdirs()) {
        throw new Exception("Create Dir failed! : " + targetFileDir);
    }
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(inputFile, "r");
         // null format: let 7-Zip detect the archive type
         IInArchive inArchive = SevenZip.openInArchive(null, new RandomAccessFileInStream(randomAccessFile))) {
        ISimpleInArchive simpleInArchive = inArchive.getSimpleInterface();
        for (final ISimpleInArchiveItem item : simpleInArchive.getArchiveItems()) {
            if (!item.isFolder()) {
                // The lambda implements ISequentialOutStream.write(byte[] data),
                // called once per decompressed block.
                ExtractOperationResult result = item.extractSlow(data -> {
                    try {
                        String fileName = GetNameAndPathOK(inputFile, targetFileDir, item);
                        // append = true: each block is added to the end of the output file
                        try (FileOutputStream fos = new FileOutputStream(targetFileDir + File.separator + fileName, true)) {
                            fos.write(data); // change something here if you want to read lines
                        }
                    } catch (Exception e) {
                        throw new SevenZipException(e.getMessage());
                    }
                    return data.length;
                });
                if (result != ExtractOperationResult.OK) {
                    // error
                }
            }
        }
    }
}

private static String GetNameAndPathOK(File inputFile, String targetFileDir, ISimpleInArchiveItem item) throws Exception {
    String fileName = item.getPath();
    if (fileName == null || fileName.isEmpty()) {
        // e.g. a .gz without a stored name: use the archive name minus its extension
        fileName = inputFile.getName().substring(0, inputFile.getName().lastIndexOf("."));
    }
    if (fileName.indexOf(File.separator) > 0) {
        String path = targetFileDir + File.separator + fileName.substring(0, fileName.lastIndexOf(File.separator));
        File newdir1 = new File(path);
        if (!newdir1.exists() && !newdir1.mkdirs()) {
            throw new Exception("Create Dir failed! : " + path);
        }
    }
    return fileName;
}
```
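Passing `null` leaves format detection to the library. As a rough idea of what such detection looks at: most formats in this article start with a fixed signature (magic number). A minimal Python sketch based on the published signatures; `detect_format` is a hypothetical helper and not how 7-Zip-JBinding works internally:

```python
from typing import Optional

# Leading signatures (at offset 0) of the formats covered in this article.
SIGNATURES = [
    (b"Rar!\x1a\x07\x01\x00", "rar5"),
    (b"Rar!\x1a\x07\x00", "rar4"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),      # empty zip archive
    (b"\x1f\x8b", "gzip"),
    (b"\x1f\x9d", "unix-z"),     # compress(1), .Z files
]


def detect_format(header: bytes) -> Optional[str]:
    """Guess the container format from the first 512 (or more) bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # POSIX/GNU tar has no leading magic: "ustar" sits at offset 257 of the first header block
    if header[257:262] == b"ustar":
        return "tar"
    return None
```

A `.tar.gz` is reported as `gzip`, since only the outer layer is visible before decompressing.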
(4.3) Golang
Use `github.com/nwaples/rardecode`.
There are other ways to process RAR, such as `archiver.NewRar().Unarchive(Source, Dest)`, which decompresses from file to file but cannot read the contents of the compressed package directly.
```go
import (
	"bufio"
	"io"

	"github.com/nwaples/rardecode"
)
...
RARReader, err := rardecode.OpenReader(SourceFile, "") // "" = no password
if err != nil {
	panic(err)
}
defer func(RARReader *rardecode.ReadCloser) {
	if err := RARReader.Close(); err != nil {
		panic(err)
	}
}(RARReader)
for {
	f, err := RARReader.Next()
	if err == io.EOF {
		break // no more entries
	}
	if err != nil {
		panic(err)
	}
	if !f.IsDir {
		OneReader := bufio.NewReader(RARReader)
		for {
			line, _, err := OneReader.ReadLine()
			if err == io.EOF {
				break
			}
			if err != nil {
				panic(err)
			}
			_ = line // do whatever with every line.
		}
	}
}
```
(5) [7Zip] format
It is a newer compression format on Windows, with the extension `.7z`.
A single compressed package can contain multiple files.
Its compression ratio is higher, and it processes faster.
Common software: 7-Zip, though I prefer NanaZip, which is based on it.
Nana = なな = seven.
(5.1) Python
Requires the package: `pip install py7zr`.
```python
import py7zr
...
if not py7zr.is_7zfile(SourceFile):
    raise Exception(SourceFile + " is not in 7Z format!")
with py7zr.SevenZipFile(SourceFile, 'r') as sevenMy:
    # readall() decompresses every file into memory and returns {name: BytesIO}.
    # (Calling read() once per file instead requires sevenMy.reset() between calls.)
    for name, filein in sevenMy.readall().items():
        lines = filein.readlines()
        # do whatever with lines.
```
(5.2) Java
Use `org.apache.commons.compress`.
```xml
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>
```
```java
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
...
try (SevenZFile z7In = new SevenZFile(new File(SourceFile))) {
    SevenZArchiveEntry entry;
    while ((entry = z7In.getNextEntry()) != null) {
        if (!entry.isDirectory()) {
            try (BufferedReader br = new BufferedReader(new InputStreamReader(
                    z7In.getInputStream(entry), Charset.forName("GBK")))) {
                String aLine;
                while ((aLine = br.readLine()) != null) {
                    // do whatever with every line.
                }
            }
        }
    }
}
```
(5.3) Golang
Use `github.com/bodgit/sevenzip`.
```go
import (
	"bufio"
	"io"

	"github.com/bodgit/sevenzip"
)
...
sevenReader, err := sevenzip.OpenReader(SourceFile)
if err != nil {
	panic(err)
}
defer func(sevenReader *sevenzip.ReadCloser) {
	if err := sevenReader.Close(); err != nil {
		panic(err)
	}
}(sevenReader)
for _, f := range sevenReader.File {
	if f.FileInfo().IsDir() {
		continue
	}
	inFile, err := f.Open()
	if err != nil {
		panic(err)
	}
	OneReader := bufio.NewReader(inFile)
	for {
		line, _, err := OneReader.ReadLine()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		_ = line // do whatever with every line.
	}
	inFile.Close() // close each entry before moving on to the next
}
```
(6) [Unix Z] format
It is an ancient compression format on Linux/Unix, with the extension `.Z`.
A single compressed package can only contain a single file.
- Unlike gzip, the Unix Z format cannot store the original file name; removing the `.Z` suffix gives the original name.
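The reason is visible in the header itself: after the two magic bytes there is only a single flag byte, and the LZW codes follow immediately. A minimal Python sketch that parses it; `parse_z_header` and `ZHeader` are hypothetical names, and the layout is the one written by the classic `compress` utility:

```python
from typing import NamedTuple


class ZHeader(NamedTuple):
    max_bits: int     # maximum LZW code width (9..16)
    block_mode: bool  # whether the CLEAR code (256) may reset the dictionary


def parse_z_header(data: bytes) -> ZHeader:
    """Parse the 3-byte header of a Unix compress (.Z) file.

    Layout: 0x1F 0x9D, then one flag byte (low 5 bits = max bits,
    0x80 = block mode). There is no field for an original file name.
    """
    if len(data) < 3 or data[:2] != b"\x1f\x9d":
        raise ValueError("not a .Z (compress) stream")
    flags = data[2]
    return ZHeader(max_bits=flags & 0x1F, block_mode=bool(flags & 0x80))
```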
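(6.1) Python
Requires `pip install unlzw3`.
```python
import unlzw3
from io import BytesIO
from pathlib import Path
...
# unlzw3.unlzw() decompresses the whole .Z content in memory and returns bytes
unCompress = BytesIO(unlzw3.unlzw(Path(fileNameFull).read_bytes()))
lines = unCompress.readlines()
# do whatever with lines.
```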
(6.2) Java
Use `org.apache.commons.compress`.
```xml
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.24.0</version>
</dependency>
```
```java
import org.apache.commons.compress.compressors.z.ZCompressorInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
...
// The charset belongs to the InputStreamReader, which decodes the decompressed bytes
try (BufferedReader br = new BufferedReader(new InputStreamReader(
        new ZCompressorInputStream(Files.newInputStream(Paths.get(SourceFile))),
        Charset.forName("GBK")))) {
    String aLine;
    while ((aLine = br.readLine()) != null) {
        // do whatever with every line.
    }
}
```
(6.3) Golang
Use `github.com/hotei/dcompress`.
```go
import (
	"bufio"
	"io"
	"os"
	"path/filepath"

	"github.com/hotei/dcompress"
)
...
fi, err := os.Open(filepath.Join(SourcePath, SourceFile))
if err != nil {
	panic(err)
}
defer fi.Close()
dcompress.Verbose = true
fz, err := dcompress.NewReader(fi) // decompress straight from the opened file
if err != nil {
	panic(err)
}
OneReader := bufio.NewReader(fz)
for {
	line, _, err := OneReader.ReadLine()
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}
	_ = line // do whatever with every line.
}
```