Python Support for GZip Files (gzip)

Introduction:

In this tutorial, we look at Python's support for gzip files through the gzip module. gzip is a program from the GNU project that is used to compress and decompress files. Python's gzip module provides an interface to this file format, and it relies on the zlib module to do the actual compression. The module provides the GzipFile class along with the convenience functions open(), compress(), and decompress(). The easiest way to compress and decompress data is to use the functions described below.

1. open() function:

The open() function opens a gzip-compressed file in binary or text mode and returns a file object. The filename argument can be an actual file name (a str or bytes object) or an existing file object to read from or write to. The default mode is "rb" (read binary); other modes such as "wb", "ab", "xb", "rt", "wt", "at", and "xt" are also accepted.

This function also accepts a compresslevel argument, an integer from 0 to 9 (the default is 9). When the file is opened in text mode, the GzipFile object is wrapped in an io.TextIOWrapper object, which handles encoding and decoding.
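The behaviour described above can be sketched as follows. This is a minimal illustration, not code from the original tutorial: the file name "example.txt.gz", the sample text, and the compression level 5 are assumptions chosen for the example.

```python
import gzip
import io

# "example.txt.gz" is a hypothetical file name used only for this sketch.
path = "example.txt.gz"

# Write in text mode with an explicit compression level (0 to 9).
with gzip.open(path, "wt", compresslevel=5, encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Read back in text mode; the returned object is an io.TextIOWrapper.
with gzip.open(path, "rt", encoding="utf-8") as f:
    is_text_wrapper = isinstance(f, io.TextIOWrapper)
    lines = f.read().splitlines()

# With no mode argument the file is opened as "rb" and read() returns bytes.
with gzip.open(path) as f:
    raw = f.read()

print(is_text_wrapper)
print(lines)
print(type(raw))
```

Text mode is convenient when the content is text, because the encoding is handled for you; binary mode returns the uncompressed bytes unchanged.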

2. compress() function:

The compress() function compresses the data passed to it and returns the compressed data as a bytes object. The default compression level is 9.

Syntax:

The syntax of the compress() function is given below -

gzip.compress(data, compresslevel=9, *, mtime=None)

Parameters:

The parameters of the compress() function are given below -

  • data: The bytes-like data to compress. This argument is required.
  • compresslevel: An integer from 0 to 9. The default value is 9. Here, 1 is the fastest and produces the least compression, and 9 is the slowest and produces the most compression. If 0 is given, no compression is performed.
  • mtime: An optional numeric timestamp that is written to the "last modified" field of the gzip header during compression. If it is None, the current time is used.
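The effect of these parameters can be sketched as follows. The sample data is an assumption made for illustration, and the mtime argument of gzip.compress() requires Python 3.8 or later.

```python
import gzip

# Highly repetitive sample data (an assumption for this sketch).
data = b"abc" * 1000

fast = gzip.compress(data, compresslevel=1)    # fastest, least compression
best = gzip.compress(data, compresslevel=9)    # slowest, most compression
stored = gzip.compress(data, compresslevel=0)  # no compression at all

# A fixed mtime makes the output reproducible, because the timestamp in
# the gzip header no longer depends on when the code runs.
first = gzip.compress(data, mtime=0)
second = gzip.compress(data, mtime=0)

print("original:", len(data))
print("level 1:", len(fast))
print("level 9:", len(best))
print("level 0:", len(stored))
print("reproducible:", first == second)
```

With level 0 the result is slightly larger than the input, since the gzip header and trailer are still added around the uncompressed data.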

3. decompress() function:

The decompress() function decompresses a bytes object that contains gzip data and returns the uncompressed data as a bytes object. It can also decompress multi-member gzip data, that is, data in which multiple gzip streams are concatenated together.

Syntax:

The syntax of the decompress() function is given below -

gzip.decompress(data)

Parameters:

The parameters of the decompress() function are given below -

  • data: The gzip-compressed bytes to decompress. This argument is required.
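The multi-member behaviour mentioned above can be sketched like this. The two byte strings are assumptions chosen for the example.

```python
import gzip

# Two independent gzip members, compressed separately.
part1 = gzip.compress(b"Hello, ")
part2 = gzip.compress(b"world!")

# Concatenating the members gives valid multi-member gzip data.
combined = part1 + part2

# decompress() reads every member and joins the results.
result = gzip.decompress(combined)
print(result)
```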

What is the need for the gzip module in Python?

A large number of files are created every minute, so there is a constant need to store and move data efficiently. Data compression is the process of re-encoding data so that it takes up less space than the original. A compression algorithm looks for an effective way to reduce the data size, for example by using a dictionary to replace repeated strings with shorter references.

Data compression can often reduce text files to around half of their original size or less, depending on the content. Larger files are sent over the Internet in compressed formats such as ZIP, RAR, 7z, or MP3. Because compressed files are smaller, they take less time to transfer and use less storage.

Data compression has many benefits, such as reducing storage, data transfer time, and communication bandwidth, which ultimately saves money. Its main drawback is that compressing and decompressing large amounts of data takes CPU time and memory, which is why compression vendors put a lot of emphasis on optimizing speed and resource usage.

Program Code:

Here we give a program that uses the compress() function of the gzip module in Python. The code is given below -
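A minimal sketch of such a program is shown here. It assumes the input is the byte string that appears in the output further down, and it also calls zlib.compress() because the compressed bytes in that output are a zlib stream. The printed labels and the exact compressed bytes are not guaranteed to match the original listing.

```python
import gzip
import zlib

# Input text, taken from the first line of the output shown in this section.
text = b"Hello! It is original text. Now it will be compressed."
print(text)
print("Length of the original text:", len(text))

# zlib.compress() produces the raw zlib stream that gzip builds on.
zlib_bytes = zlib.compress(text)
print("zlib-compressed bytes:", zlib_bytes)

# gzip.compress() returns a complete gzip stream as a bytes object.
compressed = gzip.compress(text)
print("Length of the gzip-compressed text:", len(compressed))
```

For an input this short, the gzip result is longer than the original, because a gzip stream always carries a header and trailer of at least 18 bytes.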

Output:

Now we run the above code. We start with a raw byte string and calculate its length. Then we pass it to the gzip.compress() function. When we calculate the length of the compressed string, we see that it is longer than the original string. This is not because the data is encrypted; compression does not encrypt anything. A gzip stream carries a header and trailer of at least 18 bytes, and an input this short has too little repetition for the savings to outweigh that overhead. The raw compressed bytes in the output come from the zlib.compress() function, on which gzip is built. The output is given below -

b'Hello! It is original text. Now it will be compressed.'
This is a value that represents the length of the original text 54
Here the compressed string looks something like this: b'x\x9c\xf3H\xcd\xc9\xc9WT\xf0,Q\xc8,V\xc8/\xcaL\xcf\xccK\xccQ(I\xad(\xd1S\xf0\xcb/W\xc8,Q(\xcf\xcc\xc9QHJUH\xce\xcf-(J-.NM\xd1\x03\x00\x01`\x13\n'
The compressed text length is represented by this value:  77
The initial length of the string is 54 and the length after the Python gzip.compress() function is 77, because the gzip header and trailer outweigh the savings on such a short string. The data is compressed, not encrypted.

Program Code:

Here we give a program that uses the decompress() function of the gzip module in Python. The code is given below -
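A minimal sketch of such a round trip is shown here. It assumes the input is the byte string that appears in the output further down; the printed labels are illustrative and may differ from the original listing.

```python
import gzip

# Input text, taken from the first line of the output shown in this section.
text = b"Hello! It is original text. Now it will be decompressed."
print(text)
print("Length of the original text:", len(text))

# Compress, then decompress again.
compressed = gzip.compress(text)
print("Length of the compressed text:", len(compressed))

restored = gzip.decompress(compressed)
print("Length of the decompressed text:", len(restored))

# The round trip is lossless: the restored bytes equal the input.
print("Round trip OK:", restored == text)
```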

Output:

Now we run the above code. We start with a raw byte string and calculate its length. Then we use the Python gzip.compress() function to compress it, and we decompress the compressed string by using the Python gzip.decompress() function. When we calculate the length of the decompressed string, the result is the same as the original length, which shows that the round trip is lossless. As in the previous example, the compressed form is longer than the input only because the gzip header and trailer outweigh the savings on such a short string; no encryption is involved. The output is given below -

b'Hello! It is original text. Now it will be decompressed.'
This is a value that represents the length of the original text 56
The compressed text length is represented by this value:  79
The decompressed text length is represented by this value 56

Example:

Here, we give an example of creating a gzip file by writing data to it; the data is compressed as it is written.
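A minimal sketch of this step follows. The file name "text.txt.gz" comes from the tutorial text; the sample data is an assumption made for illustration.

```python
import gzip

# Sample data (an assumption for this sketch); it must be a bytes object
# because the file is opened in binary mode.
data = b"Python - Batteries included"

# Opening in "wb" mode creates the gzip file; the data is compressed
# as it is written.
with gzip.open("text.txt.gz", "wb") as f:
    f.write(data)
```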

This will create the file "text.txt.gz" in the current directory. This gzip file stores the compressed data under the name "text.txt", which you can check by using any decompression tool. The compressed file can also be read programmatically.
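Reading the file back can be sketched as follows. So that the sketch runs on its own, it first creates "text.txt.gz" with sample content if the file does not exist yet; that setup step and the sample content are assumptions, not part of the original example.

```python
import gzip
import os

# Setup for this sketch only: make sure the archive exists.
if not os.path.exists("text.txt.gz"):
    with gzip.open("text.txt.gz", "wb") as f:
        f.write(b"Python - Batteries included")

# "rb" mode decompresses the content on the fly; read() returns bytes.
with gzip.open("text.txt.gz", "rb") as f:
    content = f.read()

print(content)
```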

To compress an existing file into a gzip file, read its contents as bytes and write that bytes object to the gzip file. The example below assumes the file "z.txt" exists in the current directory.
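A minimal sketch of this approach follows. The output name "z.txt.gz" is an assumption, since the tutorial text does not name the archive. The sketch also creates a small "z.txt" when none exists, purely so that it can run on its own.

```python
import gzip
from pathlib import Path

source = Path("z.txt")

# Setup for this sketch only: create a small z.txt if none exists.
if not source.exists():
    source.write_text("sample text for the gzip example\n", encoding="utf-8")

# Read the existing file as bytes.
data = source.read_bytes()

# Write the bytes to the gzip file ("z.txt.gz" is a hypothetical name).
with gzip.open("z.txt.gz", "wb") as f:
    f.write(data)
```

Reading the whole file into memory is fine for small files; for large ones, copying in chunks (for example with shutil.copyfileobj()) avoids holding everything in memory at once.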

The uncompressed data can be extracted from the gzip file with the code below -
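A minimal sketch of the extraction follows. The archive name "z.txt.gz" is an assumption; the output name "z1.txt" comes from the tutorial text. The sketch creates a small archive when none exists so that it can run on its own.

```python
import gzip
from pathlib import Path

archive = Path("z.txt.gz")  # hypothetical archive name

# Setup for this sketch only: create a small archive if none exists.
if not archive.exists():
    archive.write_bytes(gzip.compress(b"sample text for the gzip example\n"))

# Read the uncompressed bytes from the gzip file.
with gzip.open(archive, "rb") as f:
    data = f.read()

# Write them to a regular, uncompressed file.
Path("z1.txt").write_bytes(data)
```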

The above code will create the file "z1.txt" in the current directory, with the same content as "z.txt". In addition to these basic functions, the gzip module includes the GzipFile class, which behaves like a regular file object and provides methods such as read() and write(). Its constructor takes filename, mode, and compresslevel parameters, which have the same meaning as above. When the mode parameter is "w" or "wb", the GzipFile object compresses whatever is written to it and saves the result in the gzip file. GzipFile works only in binary mode; for text mode ("wt"), use gzip.open() instead.
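Writing through a GzipFile object can be sketched as follows. The file name "newtext.txt.gz" comes from the tutorial text; splitting the text into two write() calls is an assumption made for illustration.

```python
import gzip

# Open the gzip file for writing in binary mode.
with gzip.GzipFile("newtext.txt.gz", mode="wb", compresslevel=9) as f:
    # Each write() takes bytes; the data is compressed as it is written.
    f.write(b"Python ")
    f.write(b"has batteries")
```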

This will create the newtext.txt.gz file. You can decompress the file with any gzip utility to see that it contains the file newtext.txt, which holds the text "Python" and "has batteries". To decompress a gzip file by using the GzipFile object, create it with "rb" as the value of the mode parameter and read the uncompressed data via the read() method.
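Reading through a GzipFile object can be sketched as follows. So that the sketch runs on its own, it creates "newtext.txt.gz" with sample content if the file does not exist yet; that setup step is an assumption, not part of the original example.

```python
import gzip
import os

# Setup for this sketch only: make sure the archive exists.
if not os.path.exists("newtext.txt.gz"):
    with gzip.GzipFile("newtext.txt.gz", mode="wb") as f:
        f.write(b"Python has batteries")

# Open in "rb" mode and read the uncompressed bytes.
with gzip.GzipFile("newtext.txt.gz", mode="rb") as f:
    content = f.read()

print(content)
```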

Conclusion:

In this tutorial, we learned about Python's support for gzip files through the gzip module. Data compression is the process of re-encoding data so that it takes up less space than the original. The Python gzip.compress() function compresses a bytes object to reduce its size and returns the result as a bytes object; its default compression level is 9. The Python gzip.decompress() function takes gzip-compressed bytes and returns the uncompressed bytes. It can also decompress multi-member gzip data, that is, multiple gzip streams concatenated together. The open() function and the GzipFile class read and write gzip files directly.