
JSON Lines, NDJSON, and how to handle streaming JSON

JSON Lines (.jsonl) and NDJSON (.ndjson) describe the same file format: one JSON value per line, separated by \n. It is the de facto standard for log files, ML training data, BigQuery exports, and any stream of records you want to process one at a time. This post covers the format precisely, when to pick it over a single big array, how to parse it without loading the whole file, and the six small pitfalls that bite people.

If you have a chunk to inspect right now, you can drop it into the JSON Validator a line at a time, or use the JSON-to-CSV converter after combining the lines into an array.

The two specs are the same spec

Both jsonlines.org and ndjson-spec say: one JSON value per line, lines separated by \n, UTF-8, no trailing comma, no enclosing brackets. NDJSON adds two trivial nuances:

  • Parsers may ignore empty lines (the spec leaves this up to the parser and asks that the behavior be documented).
  • The MIME type is application/x-ndjson (vs application/json-lines or application/jsonl, neither of which is registered).

In practice these are interchangeable. Pick a file extension your team already uses and move on. JSON Text Sequences (RFC 7464) is a different format: each record is prefixed with a 0x1E record-separator byte and terminated by a newline. Outside of GeoJSON Text Sequences (RFC 8142) and jq's --seq flag, you will rarely meet it.

What it looks like

{"id":1,"user":"alice","event":"login"}
{"id":2,"user":"bob","event":"page_view","path":"/"}
{"id":3,"user":"alice","event":"logout"}

Three records, three lines. Each line is independently valid JSON. The file as a whole is not valid JSON — paste it into JSON.parse and it will choke at the start of line 2.
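
To make that distinction concrete, here is a minimal Python sketch that hands the sample above to a standard JSON parser twice: once as a single document, once line by line. The helper names are illustrative, not part of any library.

```python
import json

# The three-record sample from above, as one string.
SAMPLE = (
    '{"id":1,"user":"alice","event":"login"}\n'
    '{"id":2,"user":"bob","event":"page_view","path":"/"}\n'
    '{"id":3,"user":"alice","event":"logout"}\n'
)


def parses_as_single_document(text: str) -> bool:
    """Return True if the whole text is one valid JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parse_lines(text: str) -> list:
    """Parse each non-empty line as its own JSON value."""
    # Split on "\n" only: str.splitlines() also breaks on characters such
    # as U+2028, which may legally appear unescaped inside a JSON string.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


whole_ok = parses_as_single_document(SAMPLE)  # the file as a whole is rejected
records = parse_lines(SAMPLE)                 # each line parses on its own
```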

Why bother — vs one big array?

Three real reasons.

1. You can stream-parse it. With [ {...}, {...}, {...} ] a strict parser has to see the closing ] before it can hand you the first record. With JSONL, every \n is a record boundary; you can parse one record, process it, free its memory, and move to the next. This matters when "next" means "row 4 million of a 12GB BigQuery export".

2. It's append-only. Adding a record is "open in append mode, write the line, close". With an array you'd have to seek to the closing ], overwrite it with a comma, write the new record, write a new ], and pray no other process is doing the same thing.

3. Partial files are still useful. If a writer crashes mid-stream, you lose at most the last (partial) line. With an array, a crash leaves the file with no closing ] and no parser will touch it.
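
Points 2 and 3 can be sketched together in Python. The function names here are hypothetical, and the recovery policy is an assumption: an unparseable line is skipped only if it is the unterminated last line of the file, while a bad line in the middle is still treated as an error.

```python
import json
from typing import Any, Iterator


def append_record(path: str, record: Any) -> None:
    """Append one record as a single line; existing data is never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_surviving_records(path: str) -> Iterator[Any]:
    """Yield every complete record, dropping a truncated final line."""
    # Binary mode: a crash can cut a multi-byte UTF-8 character in half,
    # and we want that to fail per line rather than for the whole file.
    with open(path, "rb") as f:
        for raw in f:
            if not raw.strip():
                continue
            try:
                record = json.loads(raw)  # json.loads accepts UTF-8 bytes
            except ValueError:            # bad JSON or bad UTF-8
                if raw.endswith(b"\n"):
                    raise                 # damage mid-file is a real error
                break                     # unterminated last line: writer crashed
            yield record
```

This works cleanly when every record is an object or array, because a truncated object never parses. A truncated bare number such as 12 cut from 123 would still parse, so the sketch does not protect streams of scalar values.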

Writing it

The smallest correct producer in JavaScript:

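import { createWriteStream } from 'node:fs';

// `stream` stands for any iterable of event objects.
const out = createWriteStream('events.jsonl');

for (const event of stream) {
  // One compact JSON value per line, terminated by \n.
  out.write(JSON.stringify(event) + '\n');
}
out.end();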

JSON.stringify never produces literal newlines when called without an indent argument: it always escapes \n inside strings. Hand-rolled producers, or producers that pretty-print each record and join them, absolutely can. Rule: each JSON value must be on exactly one line. If you ever feel the urge to pretty-print, you're producing the wrong format.
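
The same one-value-per-line rule in Python, as a rough sketch: write_jsonl is a hypothetical helper, and the in-memory buffer stands in for a real file opened for writing.

```python
import io
import json
from typing import Any, Iterable, TextIO


def write_jsonl(records: Iterable[Any], out: TextIO) -> int:
    """Write each record as exactly one line; return the number written."""
    count = 0
    for record in records:
        # No indent argument, so json.dumps emits no literal newlines;
        # a newline inside a string value is written as the escape \n.
        line = json.dumps(record, separators=(",", ":"))
        out.write(line + "\n")
        count += 1
    return count


buf = io.StringIO()
written = write_jsonl([{"msg": "line one\nline two"}, {"msg": "ok"}], buf)
lines = buf.getvalue().split("\n")  # last element is "" after the final \n
```

The first record contains a newline in its value, yet it still occupies one line of output and round-trips back to the original string when parsed.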

Reading it correctly

The naive approach — fs.readFileSync, .split('\n'), JSON.parse each — works for small files. For anything that might be big, stream it. In Node:

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

const rl = createInterface({
  input: createReadStream('events.jsonl', { encoding: 'utf8' }),
  crlfDelay: Infinity,         // treat \r\n as one line break
});

for await (const line of rl) {
  if (!line) continue;          // NDJSON allows empty lines
  const event = JSON.parse(line);
  await handle(event);
}

In Python the canonical loop is even shorter:

import json
with open('events.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        handle(record)

Reading over HTTP — chunked

This is where streaming earns its keep. With a big array, you can't start work until the response is complete. With JSONL, you can split chunks on \n and yield records as the bytes arrive. The browser-side pattern:

const res = await fetch('/api/events?stream=1');
const reader = res.body
  .pipeThrough(new TextDecoderStream())
  .getReader();

let buf = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += value;
  let nl;
  while ((nl = buf.indexOf('\n')) !== -1) {
    const line = buf.slice(0, nl).trim();
    buf = buf.slice(nl + 1);
    if (line) handle(JSON.parse(line));
  }
}
// flush any final line without a trailing newline
if (buf.trim()) handle(JSON.parse(buf));

That buffer-and-split pattern is the heart of every correct NDJSON reader. The thing that goes wrong in beginner versions: they call JSON.parse on each chunk directly. A chunk boundary almost never lines up with a record boundary, so the parser fails on the partial line.
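
The same buffer-and-split pattern, sketched in Python for a non-browser client. iter_ndjson is a hypothetical name, and the input is assumed to be any iterable of byte chunks, which is how most HTTP clients expose a streamed body.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed record per line from arbitrarily split byte chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        # Everything before the last \n is complete lines; the remainder is
        # a partial line that stays in the buffer until more bytes arrive.
        *complete, buf = buf.split(b"\n")
        for raw in complete:
            if raw.strip():            # skip blank lines
                yield json.loads(raw)  # json.loads accepts UTF-8 bytes
    # Flush a final record that had no trailing newline.
    if buf.strip():
        yield json.loads(buf)
```

Splitting on the raw byte 0x0A before decoding is deliberate: that byte never occurs inside a multi-byte UTF-8 sequence, so a chunk boundary that lands in the middle of a character cannot corrupt a record.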

Six pitfalls

1. Newlines inside string values. JSON strings can't contain literal newlines, so this is only a concern if a producer is generating JSON by string-concatenation. If you see SyntaxError at a position that's clearly inside a quoted string, your producer is the bug.

2. \r\n on Windows. Split on \n and .trim() the result, which removes the stray \r. Don't split on \r\n exclusively: a file written on Unix contains no \r\n at all, so the whole file would come back as one line.

3. UTF-8 BOM at the start. Some Windows tools write 0xEF 0xBB 0xBF on the first line. JSON.parse rejects it. Strip it once before parsing line 1.

4. Trailing newline / no trailing newline. Don't assume either way. The reader above handles both: the inner while loop processes complete lines; the final if (buf.trim()) picks up a last record that wasn't terminated.

5. Mixing JSONL and JSON in the same endpoint. A handler that returns one big array on small queries and JSONL on large ones is a special kind of evil. Pick one per endpoint and put the format in the URL or the Accept header. JSONL responses should set Content-Type: application/x-ndjson.

6. Big numbers (again). Same as regular JSON: integers above 2^53−1 lose precision when parsed into JavaScript numbers. If your producer is logging Snowflake IDs, transmit them as strings. See Why your JSON IDs are wrong: the BigInt precision problem for the long version.
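
Several of these pitfalls can be handled in a few lines of Python. This is a sketch under stated assumptions: read_jsonl and stringify_big_ints are hypothetical helpers, and converting every out-of-range integer to a string is just one possible policy for producers whose consumers are JavaScript.

```python
import json
from typing import Any, Iterator

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript number holds exactly


def read_jsonl(path: str) -> Iterator[Any]:
    """Read records, tolerating a BOM, CRLF line endings, and blank lines."""
    # utf-8-sig drops a leading BOM if present and is otherwise plain UTF-8.
    with open(path, "r", encoding="utf-8-sig", newline="\n") as f:
        for line in f:
            line = line.strip()  # removes the \n and any stray \r
            if line:             # also covers a missing final newline
                yield json.loads(line)


def stringify_big_ints(value: Any) -> Any:
    """Return a copy with integers outside the JS-safe range as strings."""
    if isinstance(value, int) and abs(value) > MAX_SAFE_INTEGER:
        return str(value)
    if isinstance(value, dict):
        return {k: stringify_big_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_big_ints(v) for v in value]
    return value
```

Python itself parses large integers exactly, so stringify_big_ints only matters on the producing side, just before json.dumps, when the reader will be JavaScript.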

Converting JSONL to an array (and back)

One-liners that come up often:

# jsonl -> array (jq)
jq -s '.' events.jsonl > events.json

# array -> jsonl (jq)
jq -c '.[]' events.json > events.jsonl

The -c flag is the important bit: it tells jq to emit each value on one line, which is the JSONL contract. The -s ("slurp") flag in the first one tells jq to read all inputs and aggregate them into a single array.
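
If jq isn't available, roughly the same two conversions can be sketched in Python. The function names are illustrative. Unlike jq -s, this version assumes strictly one value per line, and it holds everything in memory, so it only suits files that fit there.

```python
import json
from typing import Any, List


def jsonl_to_array(jsonl_text: str) -> List[Any]:
    """Roughly `jq -s '.'`: collect every line's value into one list."""
    return [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]


def array_to_jsonl(values: List[Any]) -> str:
    """Roughly `jq -c '.[]'`: one compact value per line, each ending in \\n."""
    return "".join(json.dumps(v, separators=(",", ":")) + "\n" for v in values)
```

To produce events.json from the list, pass the result of jsonl_to_array to json.dump; pretty-printing is fine at that point because the output is a single JSON document again.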

When to use it

Use JSONL for: logs, ML training and eval data, BigQuery / Snowflake / Redshift exports, anything append-only, anything you want to stream over HTTP, anything where the file may be larger than memory.

Don't use it for: API responses to a browser that needs the whole thing before rendering (just send the array); configuration files (humans want pretty-printed JSON, YAML, or TOML); anything a non-developer will open in Excel.

For one-off inspection: extract a single JSONL record and paste it into the JSON formatter and viewer to pretty-print, or the JSON validator to confirm each line is well-formed JSON. To flatten an NDJSON stream into a CSV for spreadsheets, gather the lines into an array first, then feed that into the JSON to CSV converter.

If you're wrestling with a specific NDJSON producer or reader and want a second pair of eyes, the contact page is open.