Message Compression in Kafka

As we have seen, the producer sends data to Kafka in text form, commonly as JSON. JSON has a drawback: the data is stored as strings, so the same field names and values are repeated across the records in a Kafka topic. This takes up a lot of disk space. To reduce disk usage, the data can be compressed before it is sent to Kafka.
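To see why repetitive JSON text compresses well, here is a minimal Python sketch that uses only the standard library. The record layout (`user_id`, `event`, `status`) is made up for illustration and does not come from any real topic.

```python
import gzip
import json

# Hypothetical records: the field names repeat in every message,
# which is typical of JSON payloads.
records = [
    {"user_id": i, "event": "page_view", "status": "ok"}
    for i in range(1000)
]

# Serialize to JSON text and then to bytes, as a producer would before sending.
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {len(raw) / len(compressed):.1f}x")
```

The exact numbers depend on the data, but payloads with repeated keys like this one usually shrink considerably.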

Need for Message Compression

The following reasons describe the need to reduce the message size:

  1. Smaller messages take less time to send to Kafka, which reduces latency.
  2. It reduces bandwidth usage, which lets users send more messages to the broker overall.
  3. It can lower costs when the data is stored in Kafka on cloud platforms, because cloud services charge for the amount of data stored.
  4. Message compression does not need any change in the configuration of the broker or consumer.
  5. The reduced disk load leads to faster read and write operations.

Producer Batch/Record Batch

A producer writes messages to Kafka one by one, but Kafka handles this smartly. Instead of sending each message immediately, the producer collects the messages being produced into a batch until the batch is full, and then sends the whole batch to Kafka. Such a batch is known as a producer batch. The default batch size is 16 KB, and it can be set to a larger value. The larger the batch, the better the compression, throughput, and efficiency of producer requests.
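The batching idea can be sketched in a few lines of Python. This is only a simplified illustration, not Kafka's actual implementation: the `build_batches` helper is hypothetical, and it ignores per-record overhead, partitions, and time-based flushing.

```python
from typing import List

BATCH_SIZE = 16 * 1024  # 16 KB, the default batch size mentioned above


def build_batches(messages: List[bytes], batch_size: int = BATCH_SIZE) -> List[List[bytes]]:
    """Group messages into batches whose total payload stays within batch_size."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for msg in messages:
        # Close the current batch if this message would overflow it.
        if current and current_bytes + len(msg) > batch_size:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(msg)
        current_bytes += len(msg)
    if current:
        batches.append(current)
    return batches


messages = [b"x" * 1000 for _ in range(40)]  # 40 messages of 1,000 bytes each
batches = build_batches(messages)
print([len(b) for b in batches])  # [16, 16, 8]
```

Sixteen messages of 1,000 bytes fit into a 16 KB batch, so the 40 messages end up in three batches instead of 40 separate sends.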

[Figure: Kafka Message Compression]

Note: A message should not be larger than the batch size; otherwise, the message will not be batched. Also, a batch is allocated per partition, so do not set the batch size to a very high number.

The bigger the producer batch, the more effective message compression becomes.
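The effect of batch size on compression can be checked with a small experiment. The sketch below compares compressing messages one at a time with compressing them together as a single batch. The message fields are hypothetical, and gzip from the Python standard library stands in for whatever codec the producer is configured with.

```python
import gzip
import json

# Hypothetical messages with a shared structure.
messages = [
    json.dumps({"sensor": "s-17", "reading": i % 10, "unit": "celsius"}).encode("utf-8")
    for i in range(500)
]

raw_total = sum(len(m) for m in messages)

# Compress every message on its own: each output carries gzip header
# overhead and cannot exploit repetition across messages.
individual_total = sum(len(gzip.compress(m)) for m in messages)

# Compress all messages together as one batch.
batched_total = len(gzip.compress(b"\n".join(messages)))

print(f"raw:                   {raw_total} bytes")
print(f"compressed one by one: {individual_total} bytes")
print(f"compressed as a batch: {batched_total} bytes")
```

Compressing the batch as a whole lets the codec find patterns that repeat across messages, which is why larger batches tend to compress better.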

Message Compression Format

Message compression is always done on the producer side, so no configuration changes are required on the consumer or broker side.

[Figure: Kafka Message Compression]

In the figure, a producer batch of 200 MB is created. After compression, it is reduced to 101 MB.
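As a quick worked calculation, the figure's numbers can be turned into a space saving and a compression ratio. The helper name `space_saving_percent` is just an illustrative choice.

```python
def space_saving_percent(original_size: float, compressed_size: float) -> float:
    """Return the percentage of space saved by compression."""
    return (1 - compressed_size / original_size) * 100


original_mb = 200    # batch size before compression, from the figure
compressed_mb = 101  # batch size after compression, from the figure

saving = space_saving_percent(original_mb, compressed_mb)
ratio = original_mb / compressed_mb

print(f"space saved: {saving:.1f}%")       # space saved: 49.5%
print(f"compression ratio: {ratio:.2f}x")  # compression ratio: 1.98x
```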

To compress the data, the 'compression.type' producer property is used. It lets users choose the type of compression: 'gzip', 'snappy', 'lz4', or 'none' (the default). Of these, 'gzip' has the highest compression ratio.
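As a sketch of how this setting might look in practice, the snippet below builds a producer configuration as a plain Python dictionary using Kafka's standard producer property names. The `make_producer_config` helper and the broker address are placeholders invented for this example. A client such as confluent-kafka for Python accepts a dictionary like this, but that choice of client is an assumption, since the text does not name one.

```python
# Compression types listed in the text above.
SUPPORTED_CODECS = {"none", "gzip", "snappy", "lz4"}


def make_producer_config(compression_type: str = "none") -> dict:
    """Build a producer configuration with the chosen compression type."""
    if compression_type not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported compression.type: {compression_type!r}")
    return {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "compression.type": compression_type,
        "batch.size": 16384,  # 16 KB batch size, in bytes
    }


producer_config = make_producer_config("gzip")
print(producer_config["compression.type"])  # gzip

# With the confluent-kafka client (an assumption), the dictionary would be
# passed to the producer constructor: confluent_kafka.Producer(producer_config)
```

Only the producer configuration changes here, which matches the point that consumers and brokers need no extra settings.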

Disadvantages of Message Compression

Message compression has the following disadvantages:

  1. Producers spend some CPU cycles on compression.
  2. Consumers spend some CPU cycles on decompression.
  3. Together, these lead to increased CPU usage.
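The CPU cost on both sides can be observed with a rough timing sketch. The payload is hypothetical and gzip from the Python standard library is used as a stand-in codec, so the timings only indicate that the work is not free. They say nothing about a real Kafka deployment.

```python
import gzip
import time

# Hypothetical payload: roughly 1 MB of repetitive JSON-like text.
payload = b'{"user_id": 42, "event": "page_view", "status": "ok"}\n' * 20000

# Producer side: time spent compressing.
start = time.perf_counter()
compressed = gzip.compress(payload)
compress_seconds = time.perf_counter() - start

# Consumer side: time spent decompressing.
start = time.perf_counter()
restored = gzip.decompress(compressed)
decompress_seconds = time.perf_counter() - start

print(f"compress:   {compress_seconds * 1000:.2f} ms")
print(f"decompress: {decompress_seconds * 1000:.2f} ms")
print(f"size: {len(payload)} -> {len(compressed)} bytes")
```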

Despite this CPU overhead, message compression is a good option for reducing the disk load.

