Snappy (previously known as Zippy) is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

Snappy is widely used in Google projects like Bigtable, MapReduce and in compressing data for Google's internal RPC systems. It can be used in open-source projects like MariaDB ColumnStore, Cassandra, Couchbase, Hadoop, LevelDB, MongoDB, RocksDB, Lucene, Spark, Parquet, InfluxDB, and Ceph. Firefox uses Snappy to compress data in localStorage. Decompression is tested to detect any errors in the compressed stream. Snappy does not use inline assembler (except some optimizations) and is portable.

Stream format

Snappy encoding is not bit-oriented, but byte-oriented (only whole bytes are emitted or consumed from a stream). The format uses no entropy encoder, like Huffman coding or arithmetic coding.

The first bytes of the stream are the length of uncompressed data, stored as a little-endian varint,

  • 00 – Literal – uncompressed data; upper 6 bits are used to store length (len-1) of data. Lengths larger than 60 are stored in a 1-4 byte integer indicated by a 6 bit length of 60 (1 byte) to 63 (4 bytes).
  • 01 – Copy with length stored as 3 bits and offset stored as 11 bits; one byte after tag byte is used for part of offset;
  • 10 – Copy with length stored as 6 bits of tag byte and offset stored as two-byte integer after the tag byte;
  • 11 – Copy with length stored as 6 bits of tag byte and offset stored as four-byte little-endian integer after the tag byte;

The copy refers to the dictionary (just-decompressed data). The offset is the shift from the current position back to the already decompressed stream. The length is the number of bytes to copy from the dictionary. The size of the dictionary was limited by the 1.0 Snappy compressor to 32,768 bytes, and updated to 65,536 in version 1.1.

The complete official description of the snappy format can be found in the google GitHub repository.

Example of a compressed stream

The text

may be compressed to this, shown as hex data with explanations:

000000 51 f0 42 57 69 6b 69 70 65 64 69 61 20 69 73 20 >Q.BWikipedia is <

000010 61 20 66 72 65 65 2c 20 77 65 62 2d 62 61 73 65 >a free, web-base<

000020 64 2c 20 63 6f 6c 6c 61 62 6f 72 61 74 69 76 65 >d, collaborative<

000030 2c 20 6d 75 6c 74 69 6c 69 6e 67 75 61 6c 20 65 >, multilingual e<

The stream starts with the length of the uncompressed data as a varint, so the first byte, with the high bit clear, corresponds to a length of 51<sub>16</sub>=81 bytes. C#, Common Lisp, Crystal (programming language), Erlang, Go, Haskell, Lua, Java, Nim, Node.js, Perl, PHP, Python, R, Ruby, Rust, Smalltalk, and OpenCL.

A command-line interface program is also available.

See also

  • Zstandard

References

  • Snappy mailing list