Java Compression Performance

Dmitry Komanov
7 min read · Jan 10, 2023


Cover image: an old manometer gauge, by indoposter from Pixabay

Data compression is a very broad topic, and there are lots of materials about it on the Internet, including amazing benchmarks for all kinds of compression algorithms. Benchmarks for Java exist (1 2 3), but they seem a little outdated (not using JMH, for example), as they were written a long time ago.

Compression effectiveness (both speed and resulting size) depends on the actual data. Therefore, it made sense to me to redo the benchmarks to determine which algorithm (if any) fits our data best, and to learn a bit more about data compression in Java along the way.

What data do we benchmark

We are going to benchmark two sets of data: our internal data structure in different sizes, and a set of artificially generated random data with different compression ratios and sizes.

The internal data structure is a kind of trie with URLs, UUIDs and some other binary information (encoded as protobuf messages). We can’t share the data itself, as it may contain user data.

The random dataset is just random characters drawn from different alphabets. Here is the function that generates the data:

private static java.util.Random rnd = ...;

private static byte[] generateImpl(
    byte[] alphabet,
    int length)
{
    byte[] result = new byte[length];
    for (int i = 0; i < result.length; i++) {
        result[i] = alphabet[rnd.nextInt(alphabet.length)];
    }
    return result;
}

For different alphabets I calculated the compression ratio of the deflate algorithm and named each set according to its ratio:

low (1.3) -> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
medium (2.1) -> "0123456789"
high (3.4) -> "0123"
extra high (6.2) -> "01"

Disclaimer: as you may see, the random dataset isn’t entirely random; for the purposes of the benchmark it’s just a non-static dataset with a more or less stable compression ratio.
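
For reference, the ratio for each alphabet can be estimated with plain java.util.zip.Deflater. The helper below is a minimal sketch (RatioCheck is a hypothetical name, not the benchmark code), built on top of the generateImpl function above:

import java.util.zip.Deflater;

// Minimal sketch: compress a sample with deflate and return original/compressed.
public final class RatioCheck {
    static double deflateRatio(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int compressedLength = 0;
        while (!deflater.finished()) {
            compressedLength += deflater.deflate(chunk); // count bytes, discard output
        }
        deflater.end();
        return (double) data.length / compressedLength;
    }
}

For example, deflateRatio(generateImpl("0123".getBytes(), 100_000)) should land near the 3.4 ratio listed above for the "high" set.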

Compression Algorithms

I picked a few popular compression algorithms:

  • gzip via Apache Commons Compress
  • deflate also via commons-compress
  • MySQL compression: basically, it uses the deflate algorithm and prefixes the compressed data with 4 bytes of the original data length (which helps to allocate the buffer for decompression; see the sketch after this list). Via java.util.zip.Deflater.
  • zstd: Facebook’s compression algorithm that’s supposedly faster and has a better compression ratio than deflate.
  • snappy: Google’s compression algorithm that was designed to be very fast with reasonable compression ratios. Via snappy-java.
  • brotli: another compression algorithm from Google that aims for a better compression ratio with performance comparable to deflate. For brotli we will use 3 compression levels: 0, 6 and 11 (the default, which is also the maximum). Via Brotli4j.
  • lz4: a compression algorithm that aims for the best decoding performance. For lz4 we will use 3 compression levels: fast, 9 (default), 17 (maximum). Via lz4-java.
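
To make the MySQL-style scheme concrete, here is a minimal sketch of the encoding side: a 4-byte length prefix followed by deflate output. The big-endian byte order of the prefix is an assumption of this sketch, not taken from MySQL’s source.

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Sketch of "MySQL compression": original length (4 bytes) + deflate output.
public final class LengthPrefixedDeflate {
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 4-byte length prefix (big-endian here; the byte order is an assumption)
        out.write(ByteBuffer.allocate(4).putInt(data.length).array(), 0, 4);
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }
}

The decoding side, and why the prefix matters, is shown later in the “deflate vs MySQL Compress” section.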

Compression Ratios

For the Real dataset, brotli had the best compression ratio, and snappy and lz4 fast had the worst:

Compression Ratios for Real Data

For the Stub dataset (random), the best is brotli and the worst is consistently lz4 fast.

A spreadsheet with all compression ratios is available here.

Benchmarks

Benchmarks were performed using JMH on 3 JDKs: openjdk-8, openjdk-11 and openjdk-17. Unless mentioned otherwise, the charts are for openjdk-17.
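
The actual benchmark code is in the repository linked at the end; roughly, each decode benchmark is shaped like the following sketch (deflate is used here only as an example, and the class and field names are made up for illustration):

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import org.openjdk.jmh.annotations.*;

// A rough sketch of a JMH decode benchmark; not the article's exact code.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class DeflateDecodeBenchmark {
    @Param({"102400"})
    public int length;

    private byte[] compressed;
    private byte[] output;

    @Setup
    public void setup() {
        // generate "medium" ratio data and compress it once, outside the measurement
        byte[] alphabet = "0123456789".getBytes();
        byte[] data = new byte[length];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = alphabet[rnd.nextInt(alphabet.length)];
        }
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[length + 1024];
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(buffer, n, buffer.length - n);
        }
        deflater.end();
        compressed = Arrays.copyOf(buffer, n);
        output = new byte[length];
    }

    @Benchmark
    public byte[] decode() throws DataFormatException {
        // only the decompression itself is measured
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        inflater.inflate(output);
        inflater.end();
        return output;
    }
}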

Decoding on Real Data

First, let’s see the performance of decoding for the Real Data (data length is between 600KB and 4MB):

A line chart for performance of decoding for real data
Decoding Performance in microseconds for Real Data (600KB-4MB)

Another view for this chart would be throughput in bytes per second:

A line chart: Decoding Throughput for Real Data (600KB-4MB)
Decoding Throughput in bytes for Real Data (600KB-4MB)

From these charts it’s clear that lz4 (all 3 levels) significantly outperforms all the other algorithms. Next after the lz4 family come zstd and snappy.

Encoding of Real Data

Next, let’s check encoding performance for the Real Data (data length is between 600KB and 4MB):

A line chart: Encoding Performance in microseconds for Real Data (600KB-4MB)
Encoding Performance in microseconds for Real Data (600KB-4MB)

Wow! Brotli with level 11 took 13 seconds to encode 4MB of data… At first I didn’t believe it and thought that I must be using the library incorrectly. But then I ran the original native binary:

$ time brotli ~/1/bins/total_4072805_with_34010_items.bin

real 0m12.945s
user 0m12.878s
sys 0m0.064s

Indeed, compression level 11 is extremely slow. Let’s exclude it to see the performance of the other algorithms properly:

A line chart: Encoding Performance in microseconds without brotli_11 for Real Data (600KB-4MB)
Encoding Performance in microseconds without brotli_11 for Real Data (600KB-4MB)

And the corresponding chart for the throughput (bytes per second):

A line chart: Encoding Throughput without brotli_11 for Real Data (600KB-4MB)
Encoding Throughput without brotli_11 for Real Data (600KB-4MB)

The 4 leaders here are: lz4 fast (500+MBps), brotli_0 (450+MBps), snappy (400+MBps) and zstd (250+MBps). The worst is lz4_high17 with 13–30MBps.

Comparing Performance for different JDKs

Comparing performance across different JDKs, we can see that it is more or less the same, as most libraries use JNI under the hood:

A bar chart: Decoding Performance in different JDKs for 4MB input
Decoding Performance in different JDKs for 4MB input

And the same for encoding:

A bar chart: Encoding Performance in different JDKs for 4MB input
Encoding Performance in different JDKs for 4MB input

Stub (random) Data

Even though compression ratios in the Real dataset differ, they are mostly 2x and higher, and the data is not entirely random. It’s also interesting to check how the algorithms behave across a wider range of compression ratios and on more random data. So, for the Stub data, let’s check performance for different compression ratios.

lz4 fast vs snappy

Looking at the throughput graphs, it seems that the lz4 fast compressor is a very decent alternative to snappy when compression ratio is not very important. Decoding is faster:

A line chart: Decoding Throughput: lz4 fast vs snappy
Decoding Throughput: lz4 fast vs snappy

As well as encoding:

A line chart: Encoding Throughput: lz4 fast vs snappy
Encoding Throughput: lz4 fast vs snappy

However, this benchmark doesn’t take into account the IO limits of the system: network/disk throughput is not infinite, so a compressor is not going to run at its peak performance. I tried to emulate an IO limit, but it doesn’t seem to reflect reality. For that, I’m going to run another benchmark soon…
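
For reference, the snappy-java API used here is about as simple as it gets; a minimal round trip (a sketch, not the benchmark code) looks like this:

import java.io.IOException;
import org.xerial.snappy.Snappy;

// A minimal snappy-java round trip.
public final class SnappyRoundTrip {
    public static void main(String[] args) throws IOException {
        byte[] data = "some data to compress".getBytes();
        byte[] compressed = Snappy.compress(data);   // block compression
        byte[] restored = Snappy.uncompress(compressed);
    }
}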

gzip vs deflate

Gzip and deflate differ by 12 bytes of header/footer; apart from that, gzip uses deflate to compress the data. But there is a performance difference between them!

A line chart: Decoding Throughput: gzip vs deflate
Decoding Throughput: gzip vs deflate

This is very strange, because under the hood both deflate and gzip use the Inflater/Deflater classes from the JDK.
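
For context, here is a minimal JDK-level illustration (not the benchmark code) of how closely the two formats are related: both are produced by Deflater, and gzip only adds its own header and CRC trailer:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

// Both formats come from Deflater; gzip just wraps the stream differently.
public final class GzipVsDeflate {
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(); // zlib wrapping: 2-byte header + 4-byte checksum
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] gzip(byte[] data) throws IOException {
        // GZIPOutputStream uses a Deflater internally and adds a 10-byte
        // header plus an 8-byte CRC32/size trailer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.toByteArray();
    }
}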

deflate vs MySQL Compress

As I mentioned before, MySQL Compress is deflate plus the size of the uncompressed input. This size (an extra 4 bytes) allows allocating the buffer precisely on decoding, which may save a few allocations and memory copies. And that’s something we can actually see on the graph:

Decoding Throughput: deflate vs deflate+size

This optimization indeed helps a lot.
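
Here is a minimal sketch of the decoding side of that scheme, matching the encoding sketch shown earlier (again, the 4-byte big-endian prefix is an assumption of the sketch):

import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch: the length prefix lets us allocate the output buffer exactly once.
public final class LengthPrefixedInflate {
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        int originalLength = ByteBuffer.wrap(compressed, 0, 4).getInt();
        byte[] result = new byte[originalLength]; // exact size, no growing or copying
        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 4, compressed.length - 4);
        inflater.inflate(result);
        inflater.end();
        return result;
    }
}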

brotli vs gzip

Brotli is supposed to be a more efficient replacement for deflate/gzip. And it is :) From a performance standpoint, brotli with compression level 6 is generally faster than gzip/deflate. After a certain size (in this benchmark, somewhere between 20K and 30K) brotli performs significantly better:

A line chart: Decoding Throughput: brotli (6) vs gzip
Decoding Throughput: brotli (6) vs gzip

The same goes for encoding, but the threshold is even smaller (around 3K-4K):

A line chart: Encoding Throughput: brotli (6) vs gzip
Encoding Throughput: brotli (6) vs gzip

Other interesting details

Brotli

There are 3 Brotli libraries for Java: Brotli4j (used in the main benchmark), jvmbrotli, and the dec library. The dec library allows only decompression (it’s what commons-compress uses for brotli decompression). I decided to benchmark them, and the results are very sad:

Action  (algorithm)  (ratio)  (length)     Score  Units
decode  brotli_6     medium   102400     813.029  us/op
decode  dec          medium   102400    2220.280  us/op
decode  jvmbrotli    medium   102400    2644.517  us/op
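
For completeness, Brotli4j usage looks roughly like this. This is a sketch based on my reading of the Brotli4j README; exact class names and signatures may differ between versions:

import com.aayushatharva.brotli4j.Brotli4jLoader;
import com.aayushatharva.brotli4j.decoder.Decoder;
import com.aayushatharva.brotli4j.decoder.DirectDecompress;
import com.aayushatharva.brotli4j.encoder.Encoder;

// A rough Brotli4j round trip; treat the API details as an approximation.
public final class BrotliRoundTrip {
    public static void main(String[] args) throws Exception {
        Brotli4jLoader.ensureAvailability(); // loads the native library

        byte[] data = "some data to compress".getBytes();
        // quality 6 is the "middle" level used in this benchmark
        byte[] compressed = Encoder.compress(data, new Encoder.Parameters().setQuality(6));

        DirectDecompress result = Decoder.decompress(compressed);
        byte[] restored = result.getDecompressedData();
    }
}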

lz4

lz4 is also supported in commons-compress. A short benchmark showed that it’s more than an order of magnitude slower than org.lz4:lz4-java, which is used in the main benchmark.

Action  (algorithm)  (length)     Score  Units
decode  lz4_block    62830      726.496  us/op
decode  lz4_framed   62830      891.120  us/op
decode  lz4_fast     62830       30.119  us/op
decode  lz4_high17   62830       22.561  us/op
decode  lz4_high9    62830       22.075  us/op
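
For reference, the lz4-java API used in the main benchmark looks roughly like this (a sketch; the actual benchmark wraps it slightly differently):

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

// A minimal lz4-java round trip; decompression needs the original length,
// so in practice it is stored alongside the compressed block
// (similar to the length prefix used for "MySQL compression" above).
public final class Lz4RoundTrip {
    public static void main(String[] args) {
        LZ4Factory factory = LZ4Factory.fastestInstance(); // picks JNI, unsafe or safe Java impl
        byte[] data = "some data to compress".getBytes();

        LZ4Compressor compressor = factory.fastCompressor(); // "lz4 fast"; highCompressor(17) for level 17
        byte[] compressed = compressor.compress(data);

        LZ4FastDecompressor decompressor = factory.fastDecompressor();
        byte[] restored = decompressor.decompress(compressed, data.length);
    }
}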

Conclusion

Even though the results of the benchmark look promising, I am not fully convinced that I will see the same level of performance in a real-life application: as I briefly mentioned, the IO limits of network/disk are hard to emulate, so it’s hard to predict how things will behave.

Anyway,

  • lz4 looks really promising, and I’m looking forward to making a more comprehensive benchmark with it!
  • Brotli surprised me: by default it uses the maximum compression level, which is extremely slow. What surprised me even more is that decoding of highly compressed data is also slower, which makes a compression level somewhere in the middle a much better choice.
  • Zstd stands somewhere in the middle in terms of performance and compression ratio.

Another interesting point for me is that the artificial (random) data shows more or less the same trends across compression algorithms, but real data is decompressed faster at the same compression ratio (~69 usec with lz4 level 17 for real data vs ~88 usec for random data). So it’s important to benchmark on real data samples if possible.

Play with charts here. Source code is on GitHub.
