
When to ditch JSON for BSON, MessagePack, or Protobuf

JSON is still the correct default for HTTP APIs in 2026. The binary alternatives — BSON, MessagePack (msgpack), CBOR, Protobuf, FlatBuffers — only beat it once you have a concrete reason: precise typing, binary payloads, high throughput, or a schema you control on both sides. This post is a practical comparison: the switching points where each one starts to earn its complexity, and the parts of "JSON sucks here" that are usually a database problem in disguise.

If you're staying on JSON (you probably should), the formatting standards post and number precision post cover the gotchas that hit most teams before they reach the format-replacement stage.

The cast

JSON. RFC 8259. Text. Self-describing. Everything reads it. Slow only if you parse very large documents; expensive only if you transmit very many of them.

BSON. "Binary JSON." Designed for MongoDB. Adds binary, datetime, regex, and explicit int32/int64/decimal128 types. Often larger than the equivalent JSON (length prefixes and type bytes everywhere), so it isn't more compact in general. What it buys is cheap skipping over fields you don't want, which is why a database loves it.

MessagePack. Compact binary encoding of JSON-equivalent data + binary blobs and timestamps. About 20-40% smaller than equivalent JSON for typical payloads; significantly faster to parse. Schema-less, like JSON.

CBOR. RFC 8949. Spiritual sibling of MessagePack, IETF-blessed. Slightly more typed (tagged types). Used by COSE, WebAuthn, Matter, the IoT ecosystem in general.

Protobuf. Google's schema-first encoding. You define the schema (.proto), generate code, send compact wire bytes. Very compact and very fast; the wire carries only field numbers and wire types, so without the schema you can't tell what a payload means.

FlatBuffers / Cap'n Proto. Zero-copy schema-first formats. You access fields directly from the wire bytes without decoding into objects. Niche but glorious when you need it (game loops, mobile parsing on every frame).

Where JSON costs you

The honest list:

Binary data. JSON has no bytes type. You base64-encode, which is ~33% size overhead and a CPU pass on each side.
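Number precision. The grammar allows any number, but many parsers (JavaScript's included) read numbers as IEEE 754 doubles, so integers above 2⁵³-1 silently lose precision. There's also no decimal type and no explicit int32 vs int64. See the precision post.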
Dates. RFC 3339 strings work but aren't free — every consumer parses them. No way to express "this is a duration" or "this is a calendar date with no time".
Parse speed. Parsing JSON is bounded by the speed of the character-by-character state machine. For very high QPS systems (millions of messages per second per node), it's where you start hitting real CPU.
Size on the wire. JSON is verbose. Field names are repeated on every record. For a million-record bulk export, the 28-character field name "customer_acquisition_channel" is about 28 MB of overhead (30 MB counting its quotes) before you've sent any data.
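Three of these costs are easy to put numbers on with nothing but the standard library. This is a rough sketch, not a benchmark: the blob and the ID are made-up values, and the double-based parser is simulated with json.loads's parse_int hook, since Python's own parser keeps integers exact.

```python
import base64
import json

# 1. Binary data: base64 turns every 3 bytes into 4 ASCII characters.
blob = bytes(range(256)) * 12                  # 3072 bytes of arbitrary binary
encoded = base64.b64encode(blob)
print(len(blob), len(encoded))                 # 3072 4096

# 2. Number precision: simulate a consumer that reads every number as a double.
big_id = 2**53 + 1
text = json.dumps({"id": big_id})
seen_by_double_parser = int(json.loads(text, parse_int=float)["id"])
print(big_id, seen_by_double_parser)           # 9007199254740993 9007199254740992

# 3. Field-name overhead: one long key repeated on a million records.
field = "customer_acquisition_channel"
records = 1_000_000
name_bytes = len(field.encode("utf-8")) * records   # quotes and colon not counted
print(name_bytes)                              # 28000000
```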

If none of those is hurting you, stop reading and stay on JSON. Most of this article is for the people for whom one of those is hurting.

BSON: only if you're already MongoDB

BSON is not a general-purpose alternative to JSON. It exists because MongoDB needed a format whose fields the storage engine could skip over cheaply. The size on the wire is usually worse than JSON because of the length prefixes. The features that go beyond JSON (binary, datetime, decimal128, int64) are real and useful, but unless you're already in the MongoDB ecosystem the rest of the world will look at you funny when you send a BSON payload to a service that wasn't expecting one.
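To see where the extra bytes come from, here is a hand-rolled sketch of the BSON layout for a flat document. It handles only strings and int32 values and is an illustration, not a replacement for a real driver; bson_encode is a name made up for this post.

```python
import json
import struct

def bson_encode(doc: dict) -> bytes:
    """Encode a flat dict of str and int32 values as a BSON document (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"        # element name is a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type 0x02: int32 byte length (including the trailing NUL), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            body += b"\x10" + name + struct.pack("<i", value)   # type 0x10: int32
        else:
            raise TypeError(f"unsupported in this sketch: {type(value).__name__}")
    # int32 total length (counting itself and the terminator), elements, NUL terminator
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

doc = {"hello": "world"}
as_bson = bson_encode(doc)
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
print(len(as_json), len(as_bson))                   # 17 22
```

The length prefixes are what let a reader jump past a value without scanning it, and they are also why the document comes out larger here.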

Use it: when interoperating with MongoDB drivers, or when the data is destined for or coming out of a MongoDB collection. Otherwise, skip.

MessagePack: the "JSON but smaller and faster" choice

If you have a service-to-service API in a polyglot environment and JSON's parse speed or size is genuinely a bottleneck, MessagePack is the lowest-friction switch. Same data model as JSON (objects, arrays, strings, numbers, booleans, null) plus binary and timestamps. Libraries exist for every language. No schema, no codegen.

Typical wins on real payloads: 20-40% smaller, 2-5x faster to parse than JSON. Not a 100x speedup — JSON parsers in 2026 (simdjson and friends) are already very fast — but a real one in the regime where parse time matters.
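The size win comes from one-byte type markers and length-prefixed strings instead of quotes, colons, and commas. Below is a minimal hand-written encoder for a subset of the MessagePack format, just to make the byte layout visible. In a real service you'd use a maintained library; mp_pack is a hypothetical name, and containers above 65,535 entries are deliberately left out.

```python
import json
import struct

_UINTS = ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF), (0xCE, ">I", 0xFFFFFFFF), (0xCF, ">Q", 2**64 - 1))
_INTS = ((0xD0, ">b", -2**7), (0xD1, ">h", -2**15), (0xD2, ">i", -2**31), (0xD3, ">q", -2**63))

def _sized(n: int, fix_base: int, fix_max: int, code16: int) -> bytes:
    """Header for arrays and maps: fix form for small sizes, else a 16-bit count."""
    if n <= fix_max:
        return bytes([fix_base | n])
    if n <= 0xFFFF:
        return bytes([code16]) + struct.pack(">H", n)
    raise ValueError("container too large for this sketch")

def mp_pack(obj) -> bytes:
    if obj is None:
        return b"\xc0"
    if isinstance(obj, bool):                       # must precede int: bool is an int subclass
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F or -32 <= obj < 0:      # positive / negative fixint: one byte
            return struct.pack("b", obj)
        for code, fmt, limit in _UINTS:
            if 0 <= obj <= limit:
                return bytes([code]) + struct.pack(fmt, obj)
        for code, fmt, low in _INTS:
            if low <= obj < 0:
                return bytes([code]) + struct.pack(fmt, obj)
        raise OverflowError("integer outside the 64-bit range")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)     # float64
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw   # fixstr
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw
        if len(raw) <= 0xFFFF:
            return b"\xda" + struct.pack(">H", len(raw)) + raw
        raise ValueError("string too large for this sketch")
    if isinstance(obj, bytes):
        if len(obj) <= 0xFF:
            return b"\xc4" + bytes([len(obj)]) + obj        # bin8: raw bytes, no base64
        if len(obj) <= 0xFFFF:
            return b"\xc5" + struct.pack(">H", len(obj)) + obj
        raise ValueError("blob too large for this sketch")
    if isinstance(obj, (list, tuple)):
        return _sized(len(obj), 0x90, 15, 0xDC) + b"".join(mp_pack(v) for v in obj)
    if isinstance(obj, dict):
        items = b"".join(mp_pack(k) + mp_pack(v) for k, v in obj.items())
        return _sized(len(obj), 0x80, 15, 0xDE) + items
    raise TypeError(f"unsupported type: {type(obj).__name__}")

payload = {"compact": True, "schema": 0}
as_json = json.dumps(payload, separators=(",", ":")).encode("utf-8")
as_msgpack = mp_pack(payload)
print(len(as_json), len(as_msgpack))                # 27 18
```

On a payload this tiny the saving is about a third; the ratio on your own data depends on how much of it is keys, strings, and small integers.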

The friction: you can't curl | jq it. You lose the "open it in a text editor" debuggability of JSON. For a backend wire format that's a fair trade; for an external API it usually isn't.

Use it: high-volume internal RPC, queue messages where you control both sides, caches and stores where text JSON's overhead matters.

CBOR: when standards bodies are involved

CBOR is essentially MessagePack with a different bag of design choices and an RFC. Pick it over MessagePack if you're working with WebAuthn, COSE, IoT protocols, or anything else whose spec already references CBOR. Outside those domains the tooling is thinner; if you're choosing freely, MessagePack has more ecosystem.
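The "tagged types" part is the most visible difference in practice: a tag is a small number placed in front of a value to say what it means. A minimal sketch of that idea, hand-encoding an epoch timestamp with tag 1 (the helper names are invented for this post; a real project would use a CBOR library):

```python
import struct

def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: a 3-bit major type plus an unsigned argument."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    if value < 24:
        return bytes([(major << 5) | value])        # argument fits in the initial byte
    for info, fmt, limit in ((24, ">B", 0xFF), (25, ">H", 0xFFFF),
                             (26, ">I", 0xFFFFFFFF), (27, ">Q", 2**64 - 1)):
        if value <= limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, value)
    raise OverflowError("argument does not fit in 64 bits")

def cbor_epoch_time(seconds: int) -> bytes:
    # Major type 6 is a tag; tag number 1 means "epoch-based date/time".
    # Major type 0 is an unsigned integer, used here for the seconds value.
    return cbor_head(6, 1) + cbor_head(0, seconds)

print(cbor_epoch_time(1363896240).hex())            # c11a514b67b0
```

A decoder that doesn't know a tag can still parse the value underneath it, which is a large part of why protocol specs like this design.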

Protobuf: when you have schema control

Protobuf is the right answer when both ends of the wire are services you control, you can run codegen, and you have enough volume that the marginal CPU and bytes per message add up. The savings are dramatic: 4-7x smaller than JSON for typed numeric data, 5-20x faster to parse, plus type safety in every language with a code generator.

The costs are also real:

A .proto file in source control that has to evolve carefully (field numbers are forever; add a field with a new number, never reuse, never change a type).
Generated code in every consumer's build pipeline.
You can't pretty-print a payload or grep production logs without help.
Forward and backward compatibility require discipline that's easy to lose in a hurry.
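The "field numbers are forever" rule makes more sense once you see what actually goes on the wire. Here is a small sketch of Protobuf's varint field encoding, written by hand rather than with generated code, assuming a hypothetical schema with a single integer field numbered 1:

```python
import json

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)                 # more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | 0                   # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)

wire = encode_varint_field(1, 150)
as_json = json.dumps({"a": 150}, separators=(",", ":")).encode("utf-8")
print(wire.hex(), len(wire), len(as_json))          # 089601 3 9
```

The field's name never appears in those three bytes, only its number. Reuse that number for something else and every old reader will silently misinterpret the value.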

Use it: internal RPC at scale, especially via gRPC; mobile clients that benefit from small wire bytes; pipelines where exactly-typed schemas pay rent. Don't use it for external APIs unless you're committed to also publishing JSON encodings of every message.

FlatBuffers / Cap'n Proto: when the parse itself is the bottleneck

These are niche. The pitch: parsing is so cheap that "the parse" is reading a field from a wire buffer by offset, no object allocation. Game engines that read 10,000 messages per frame love them. Almost everyone else doesn't need them.
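A toy version of "read a field by offset" looks like this. The record layout is invented for illustration and is much simpler than the real FlatBuffers or Cap'n Proto formats, which add offset tables and alignment rules, but the access pattern is the same idea: no decode step, just a read at a known position.

```python
import struct

# Hypothetical fixed layout, little-endian, no padding:
#   offset 0:  uint32  entity_id
#   offset 4:  float32 x
#   offset 8:  float32 y
#   offset 12: uint16  health
RECORD = struct.Struct("<IffH")
X_OFFSET = 4
HEALTH_OFFSET = 12

buffer = RECORD.pack(42, 1.5, -2.25, 100)   # stands in for bytes received off the wire
view = memoryview(buffer)                   # a view over the bytes, not a copy

# Pull out only the fields this frame needs; nothing else is touched or allocated.
(x,) = struct.unpack_from("<f", view, X_OFFSET)
(health,) = struct.unpack_from("<H", view, HEALTH_OFFSET)
print(x, health)                            # 1.5 100
```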

If you're considering FlatBuffers, you already know why. If you're considering FlatBuffers for an HTTP API, reconsider.

The actual decision tree

Most teams should ask, in this order:

Is parse time or wire size measurably hurting our system? (Profile first. The answer is usually "no".)
If yes, is the format really the cause, or does a database query, a missing index, or a run of N+1 calls dominate the cost? (Often it's the latter: the format isn't the bug.)
If genuinely a wire-format problem: do we control both ends? If no: stay on JSON (with maybe gzip or brotli). If yes: can we afford a .proto file and codegen? If yes: Protobuf. If no: MessagePack.
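The same questions, written out as a function so the order of the checks is explicit. The function and parameter names are made up for this post; the branches mirror the list above and nothing more.

```python
def choose_wire_format(
    format_cost_is_measured: bool,
    cost_dominated_elsewhere: bool,
    control_both_ends: bool,
    can_afford_codegen: bool,
) -> str:
    """Walk the decision tree in order; each early return is a 'stay on JSON' exit."""
    if not format_cost_is_measured:
        return "JSON"                                   # profile first
    if cost_dominated_elsewhere:
        return "JSON (fix the query, index, or N+1 calls)"
    if not control_both_ends:
        return "JSON with gzip or brotli"
    return "Protobuf" if can_afford_codegen else "MessagePack"

print(choose_wire_format(True, False, True, False))     # MessagePack
```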

"Just gzip it"

Before adopting a new wire format, try compression. Brotli or gzip at moderate quality recovers most of JSON's size disadvantage versus MessagePack on typical payloads, because field names repeat and so compress beautifully. Compression doesn't fix parse speed, but on its own it solves "the wire is too big" in most cases, and every HTTP stack already supports it.

Concretely: a 5 KB JSON response is usually 1.2-1.8 KB gzipped, vs ~3-4 KB MessagePack. Bytes-on-the-wire is no longer JSON's primary problem in the gzip era.
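You can check this against your own responses in a few lines. The sketch below uses synthetic records, which repeat more than real data does and so compress better than a production payload would; treat the ratio it prints as an upper bound and rerun it on a captured response.

```python
import gzip
import json

# Synthetic payload: 200 records sharing the same keys (made-up field names and values).
records = [
    {
        "customer_id": i,
        "customer_acquisition_channel": "organic_search" if i % 3 else "paid_social",
        "is_active": i % 2 == 0,
    }
    for i in range(200)
]

raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
packed = gzip.compress(raw, compresslevel=6)            # a moderate compression level

print(len(raw), len(packed), round(len(packed) / len(raw), 2))
```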

The case for staying

JSON's quiet superpower: every system already understands it. Your monitoring, your logs, your queues, your editor, your colleague's homemade debugging script. Every binary format imposes a cost on debuggability and the human pipeline. That cost is real but invisible until you've left JSON and started missing it.

For 95% of the JSON traffic in the world, the right move is to stay on JSON, fix the actual bottlenecks (your database, your N+1s, your missing cache), and not collect a new wire format until you have a measurement that says it'll help.

If you're still on JSON, the practical day-to-day tools matter more than wire format theatre: the JSON formatter and beautifier for inspection, the JSON validator to catch broken payloads, and the JSON Schema validator to enforce the shape your service expects. If precision is your worry, see why your JSON IDs are wrong.

Migrating an existing API off JSON and want a sanity check on the plan? Drop us a line.