Choosing Serialization Formats in Go

Daniel Sollis & Danielle Maxwell | Monday, Dec 5, 2022

Serialization format is an often-overlooked engineering step that could make your application as slow as a turtle or as speedy as a cheetah!

Serialization probably isn’t something that most developers spend too much brain power on when designing an application, but there’s more to sending bits over a wire than you may think and it deserves some thought. With most serialization formats claiming to be fast and lightweight solutions for transmitting data, choosing the optimal one can be difficult. In this post we’ll go over a few important serialization formats along with their pros and cons to help you determine the best choice for your situation.

JSON

JavaScript Object Notation is a text-based format based on JavaScript’s object syntax. Essentially, JSON is a string. One of the reasons many prefer to use it is that it’s human readable.

{"name":"Grinch", "age":42, "loc":"Mount Crumpit", "pet":true}
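As a minimal sketch of how the document above could be produced and parsed in Go, the following uses the standard library's encoding/json package. The Character struct and the two helper functions are hypothetical names chosen for illustration; only the JSON keys come from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Character mirrors the example document. The type and field names are
// hypothetical; the struct tags map them to the JSON keys in the example.
type Character struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
	Loc  string `json:"loc"`
	Pet  bool   `json:"pet"`
}

// encodeCharacter serializes a Character to compact JSON text.
func encodeCharacter(c Character) (string, error) {
	data, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// decodeCharacter parses JSON text back into a Character.
func decodeCharacter(s string) (Character, error) {
	var c Character
	err := json.Unmarshal([]byte(s), &c)
	return c, err
}

func main() {
	s, err := encodeCharacter(Character{Name: "Grinch", Age: 42, Loc: "Mount Crumpit", Pet: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Because the output is plain text, it can be inspected or edited by hand, which is much of JSON's appeal.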

BSON

BSON stands for “Binary JSON” and is the format used by MongoDB. BSON is designed to be space efficient, but because BSON encodes extra information like object length in documents, it might be less efficient than JSON in some cases. BSON does this in order to enhance its traversability. BSON is also designed to be transparently converted to/from JSON.

{"bah": "humbug"} →
\x15\x00\x00\x00            // total document size (21 bytes)
\x02                        // 0x02 = type String
bah\x00                     // field name
\x07\x00\x00\x00humbug\x00  // field value (length counts the trailing \x00)
\x00                        // 0x00 = type EOO ('end of object')
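To make the layout concrete, here is a hand-rolled sketch that builds a one-field BSON document with only the Go standard library, computing both length fields from the data instead of hard-coding them. The encodeStringDoc and le32 helpers are hypothetical and handle a single string field only; this is not how the MongoDB driver is implemented.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// le32 returns n as a 4-byte little-endian integer, the form BSON uses
// for its length fields.
func le32(n int) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, uint32(n))
	return b
}

// encodeStringDoc hand-encodes a BSON document holding one string field.
// It illustrates the byte layout only and is not a general BSON encoder.
func encodeStringDoc(key, value string) []byte {
	// Element: type byte, key as a NUL-terminated string, then the value.
	elem := []byte{0x02} // 0x02 = type String
	elem = append(elem, key...)
	elem = append(elem, 0x00)
	// A string value is prefixed with its length, which counts the
	// trailing NUL byte as well as the characters.
	elem = append(elem, le32(len(value)+1)...)
	elem = append(elem, value...)
	elem = append(elem, 0x00)

	// Document: total size (including the size field itself and the final
	// end-of-object byte), the elements, then 0x00.
	doc := le32(4 + len(elem) + 1)
	doc = append(doc, elem...)
	doc = append(doc, 0x00)
	return doc
}

func main() {
	doc := encodeStringDoc("bah", "humbug")
	fmt.Printf("%d bytes: % x\n", len(doc), doc)
}
```

The up-front size prefix is what lets a reader skip over a document or string without scanning it, which is the traversability benefit mentioned above.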

BSON has some special types like “ObjectId”, “Min key”, and “Max key” which are not compatible with JSON. This incompatibility could lead to some type information getting lost when converting objects from BSON to JSON.

MessagePack

MessagePack is a binary format used with SignalR, an open source ASP.NET competitor to gRPC. Unlike protobufs, it does not require a schema: as in JSON, every value carries its own type information. Ironically, MessagePack is more compatible with JSON than BSON is, so:

{"name":"Grinch", "age":42, "loc":"Mount Crumpit", "pet":true}

becomes…

DF 00 00 00 04 A4 6E 61 6D 65 A6 47 72 69 6E 63 68 A3 61 67
65 2A A3 6C 6F 63 AD 4D 6F 75 6E 74 20 43 72 75 6D 70 69 74
A3 70 65 74 C3

It also supports static typing and type checking, which is good news if you are working in a strongly typed language.
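The hex dump above can be reproduced with a few lines of hand-written Go, which shows how little framing MessagePack adds around each value. This is a sketch under narrow assumptions: the helper names are hypothetical, it handles only short strings, small non-negative integers, and booleans, and it deliberately emits the 5-byte map32 header (DF 00 00 00 04) to match the dump, even though a four-entry map also fits the 1-byte fixmap header 0x84. A real project would use a MessagePack library instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendMap32 writes a map32 header: 0xDF followed by a big-endian uint32
// entry count.
func appendMap32(b []byte, n uint32) []byte {
	var count [4]byte
	binary.BigEndian.PutUint32(count[:], n)
	return append(append(b, 0xDF), count[:]...)
}

// appendFixStr writes a string of at most 31 bytes: one byte holding
// 0xA0|len, then the raw bytes.
func appendFixStr(b []byte, s string) []byte {
	if len(s) > 31 {
		panic("sketch only handles strings of up to 31 bytes")
	}
	return append(append(b, 0xA0|byte(len(s))), s...)
}

// appendFixInt writes an integer in the range 0..127 as a single byte.
func appendFixInt(b []byte, n int) []byte {
	if n < 0 || n > 127 {
		panic("sketch only handles integers in 0..127")
	}
	return append(b, byte(n))
}

// appendBool writes 0xC3 for true and 0xC2 for false.
func appendBool(b []byte, v bool) []byte {
	if v {
		return append(b, 0xC3)
	}
	return append(b, 0xC2)
}

// encodeGrinch encodes the example document key by key.
func encodeGrinch() []byte {
	b := appendMap32(nil, 4)
	b = appendFixStr(appendFixStr(b, "name"), "Grinch")
	b = appendFixInt(appendFixStr(b, "age"), 42)
	b = appendFixStr(appendFixStr(b, "loc"), "Mount Crumpit")
	b = appendBool(appendFixStr(b, "pet"), true)
	return b
}

func main() {
	fmt.Printf("% X\n", encodeGrinch())
}
```

Most of the savings over JSON come from dropping quotes, colons, and commas in favor of a one-byte type-and-length prefix per value.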

Protocol Buffers

Protocol Buffers, or protobufs, are a serialization format used with gRPC, the Google-designed Remote Procedure Call framework. They are specified with a language-neutral schema from which code can be generated for a variety of different languages. As mentioned in one of our previous blog posts, protobufs produce extremely compact messages by using a binary format.

syntax = "proto3";

package santa.v1;

import "google/protobuf/timestamp.proto";

message Sleigh {
    uint64 id = 1;
    bytes naughty_list = 2;
    bool lively_and_quick = 3;
    repeated string reindeer = 4;
    google.protobuf.Timestamp departure = 5;
}
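To see why the resulting messages are so compact, it helps to look at the wire format: field names never appear in the output, only the field numbers from the schema. The sketch below hand-encodes three of the Sleigh fields using the standard library's varint routines. The function names are hypothetical, and in practice you would use the code generated by protoc together with the protobuf Go module instead of writing bytes yourself.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	wireVarint = 0 // integers and booleans
	wireBytes  = 2 // strings, bytes, and embedded messages
)

// appendVarint writes v as a base-128 varint, the integer encoding used on
// the protobuf wire.
func appendVarint(b []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(b, buf[:n]...)
}

// appendTag writes a field key: (field number << 3) | wire type.
func appendTag(b []byte, field, wireType int) []byte {
	return appendVarint(b, uint64(field)<<3|uint64(wireType))
}

// encodeSleigh hand-encodes the id (1), lively_and_quick (3), and
// reindeer (4) fields of the Sleigh message. Illustration only.
func encodeSleigh(id uint64, livelyAndQuick bool, reindeer []string) []byte {
	var b []byte
	if id != 0 { // proto3 leaves out scalar fields that hold their zero value
		b = appendTag(b, 1, wireVarint)
		b = appendVarint(b, id)
	}
	if livelyAndQuick {
		b = appendTag(b, 3, wireVarint)
		b = append(b, 1)
	}
	for _, name := range reindeer {
		// Each repeated string gets its own tag and length prefix.
		b = appendTag(b, 4, wireBytes)
		b = appendVarint(b, uint64(len(name)))
		b = append(b, name...)
	}
	return b
}

func main() {
	msg := encodeSleigh(150, true, []string{"Dasher"})
	fmt.Printf("%d bytes: % x\n", len(msg), msg)
}
```

The trade-off is that the bytes are meaningless without the schema: a reader needs the .proto definition to know that field 3 is lively_and_quick.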

Serialization Bakeoff

Because everyone loves benchmarks, we decided to put each of the serialization formats mentioned above to the test using Golang.

We created three different data structs: small (5 items), medium (20 items), and large (45 items). We were sure to use a range of data types (strings, integers, booleans, and time). For the protocol buffers, we used strings to represent time in this experiment instead of using Timestamp. Finally, we compared JSON, BSON, and MessagePack both in their raw and gzipped form.
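The post does not include the test harness, so the following is only a sketch of how one such measurement could be taken for JSON, raw and gzipped, with the standard library. The smallStruct type and its fields, along with the measure and gzipBytes helpers, are hypothetical stand-ins for the structs that were actually tested.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"time"
)

// smallStruct is a hypothetical five-field struct mixing strings, integers,
// booleans, and time.
type smallStruct struct {
	Name    string    `json:"name"`
	Age     int       `json:"age"`
	Loc     string    `json:"loc"`
	Pet     bool      `json:"pet"`
	Created time.Time `json:"created"`
}

// result records the encoded size and how long the encoding took.
type result struct {
	Size    int
	Elapsed time.Duration
}

// measure times a single call to encode and reports the output size.
func measure(encode func() ([]byte, error)) (result, error) {
	start := time.Now()
	data, err := encode()
	if err != nil {
		return result{}, err
	}
	return result{Size: len(data), Elapsed: time.Since(start)}, nil
}

// gzipBytes compresses data with the default gzip settings.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	v := smallStruct{
		Name: "Grinch", Age: 42, Loc: "Mount Crumpit", Pet: true,
		Created: time.Date(2022, 12, 5, 0, 0, 0, 0, time.UTC),
	}
	raw, err := measure(func() ([]byte, error) { return json.Marshal(v) })
	if err != nil {
		panic(err)
	}
	zipped, err := measure(func() ([]byte, error) {
		data, err := json.Marshal(v)
		if err != nil {
			return nil, err
		}
		return gzipBytes(data)
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("JSON: %d B in %v\n", raw.Size, raw.Elapsed)
	fmt.Printf("gzipped JSON: %d B in %v\n", zipped.Size, zipped.Elapsed)
}
```

A single timed call like this is sensitive to warm-up effects, so the first measurement in a run can look much slower than later ones. Benchmarks written with Go's testing package (run with go test -bench) average over many iterations and would likely give steadier numbers.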

Now, let’s take a look at the results!

JSON Results

Struct               Size      Time to Execute
Small JSON           116 B     582.542µs
Medium JSON          563 B     54µs
Large JSON           1.258 kB  74.458µs
Small gzipped JSON   129 B     200.75µs
Medium gzipped JSON  292 B     218.916µs
Large gzipped JSON   500 B     230.041µs

BSON Results

Struct               Size   Time to Execute
Small BSON           88 B   100.459µs
Medium BSON          398 B  21.792µs
Large BSON           912 B  46.791µs
Small gzipped BSON   103 B  584.917µs
Medium gzipped BSON  264 B  1.127125ms
Large gzipped BSON   477 B  325.875µs

Package used: Golang driver for MongoDB

MessagePack Results

Struct                  Size   Time to Execute
Small msgpack           70 B   88.875µs
Medium msgpack          356 B  11.25µs
Large msgpack           819 B  25µs
Small gzipped msgpack   90 B   1.896458ms
Medium gzipped msgpack  275 B  304.708µs
Large gzipped msgpack   501 B  107.042µs

Package used: MessagePack encoding for Golang

Protocol Buffer Results

Struct  Size   Time to Execute
Small   28 B   596µs
Medium  140 B  136.75µs
Large   279 B  130.542µs

Package used: Protobuf

Takeaways

One thing we noticed is that for really small data, it’s probably not worth it to compress with gzip, regardless of serialization format: in our tests, gzip made every small struct larger, not smaller. Another thing to note is that for the medium and large JSON, using gzip to compress the data helped significantly reduce the size. All of the binary serialization formats produced smaller uncompressed output than JSON.
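The gzip effect is easy to reproduce. Every gzip stream carries at least 18 bytes of header and trailer, so a tiny payload has little chance of coming out smaller, while a larger payload with repeated keys compresses well. The sketch below uses made-up payloads, not the structs from our tests, and the gzipSize and samplePayload helpers are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// gzipSize returns how many bytes data occupies after gzip compression
// with the default settings.
func gzipSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// samplePayload builds a JSON array that repeats the same small object n
// times, as a stand-in for a larger document with recurring keys.
func samplePayload(n int) []byte {
	return []byte("[" + strings.Repeat(`{"name":"Grinch","pet":true},`, n) + "{}]")
}

func main() {
	payloads := map[string][]byte{
		"tiny":       []byte(`{"pet":true}`),
		"repetitive": samplePayload(50),
	}
	for name, p := range payloads {
		n, err := gzipSize(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d B raw, %d B gzipped\n", name, len(p), n)
	}
}
```

In a real service, one option is to skip compression below some payload size threshold and only pay the CPU cost where it is likely to shrink the message.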

Although MessagePack is quick, it became apparent that as the amount of data increased, its size was on par with or worse than gzipped JSON and BSON. We did use TinyLib MessagePack Code Generator to test a generated version of MessagePack, but saw no significant difference in the results. However, it was nice to have the test files automatically created.

Because size matters as much as speed, protobufs are currently the best solution for our serialization needs. The size for protobufs remained consistent no matter how many times we ran our test. With the other serialization formats, the size would occasionally change by a minuscule amount. Of course, the downside of using protocol buffers is that the message has to be defined upfront, and if the API changes it can be cumbersome.

Both protocol buffers and the Go language were developed by Google, which likely played a big role in why protobufs performed the best during our tests. Test results will vary depending on the programming language used. In fact, if using a higher-level language, JSON may be the better choice.

We’d like to thank Alec Thomas for creating a GitHub repo that made it easier for us to look into different Go packages for this post. If you’re interested in learning more about other serialization formats that may be used with Go, we highly recommend checking out their repo.

Resources

BSON vs JSON on Stack Overflow

BSON vs MessagePack on Stack Overflow

MessagePack Wikipedia page

JSON MDN Article
