Posted in Blog · Reading time ~10 min

Converting JSON to CSV without losing nested data

CSV is a flat, two-dimensional grid. JSON is a tree. Every JSON-to-CSV conversion has to make a choice about what to do with nested objects and arrays: flatten (dot-notation columns), JSON-encode (the whole sub-tree as a single cell), explode (one row per array element), or split (separate CSVs joined by a key). Each is the right answer in different cases; the wrong choice silently loses data. Here is when to pick which.

The JSON-to-CSV tool on this site defaults to JSON-encoding nested values (option 2 below) because it's the only strategy that never loses information and always produces RFC 4180-valid output. If you want a different strategy for a specific dataset, the patterns below show what the conversion code looks like in practice.

The sample data

One realistic record we'll keep coming back to:

{
  "id": 1001,
  "user": { "name": "Alice", "email": "[email protected]" },
  "tags": ["pro", "early-adopter"],
  "orders": [
    { "id": "A-1", "amount": 49.99 },
    { "id": "A-2", "amount": 12.50 }
  ],
  "active": true
}

The naive converter — "use top-level keys as columns, stringify everything else" — turns this into something like:

id,user,tags,orders,active
1001,[object Object],pro,early-adopter,[object Object],[object Object],true

Which is wrong about three things at once: the user object is stringified to [object Object], the tags array is unquoted and bleeds into adjacent columns, and the orders array has the same problem twice. You will recognize this output if you have ever used the JSON-to-CSV conversion built into a CMS.

Strategy 1: Flatten with dot notation

For nested objects (but not arrays), this is the cleanest answer. Every leaf gets its own column, with a path-style name:

id,user.name,user.email,active
1001,Alice,[email protected],true

The flatten function:

function flatten(obj, prefix = '', out = {}) {
  for (const [k, v] of Object.entries(obj)) {
    const key = prefix ? `${prefix}.${k}` : k;
    if (v && typeof v === 'object' && !Array.isArray(v)) {
      flatten(v, key, out);
    } else {
      out[key] = v;     // arrays handled separately
    }
  }
  return out;
}

This works beautifully when nested data is purely descriptive — addresses, name parts, settings objects. It breaks down on arrays because there is no good rule for the column name: tags.0, tags.1, tags.2 aren't comparable across rows (one row might have 2 tags, another 7).

Use it: when nested objects represent attributes of the row, not children of it.

Strategy 2: JSON-encode the nested cell

The safest, ugliest option. Keep the top-level keys as columns; for any value that isn't a primitive, emit JSON.stringify(value) as the cell content:

id,user,tags,orders,active
1001,"{""name"":""Alice"",""email"":""[email protected]""}","[""pro"",""early-adopter""]","[{""id"":""A-1"",""amount"":49.99},{""id"":""A-2"",""amount"":12.5}]",true

The double-quoting is RFC 4180-compliant — inside a quoted CSV field, a literal " is escaped as "". Excel and Google Sheets both read this correctly. The reader can round-trip back to JSON because the structure is preserved exactly.

The minimum-correct emitter:

function csvCell(v) {
  if (v === null || v === undefined) return '';
  const s = (typeof v === 'object') ? JSON.stringify(v) : String(v);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

That regex ([", ,, \r, \n) is the canonical "needs quoting" check from RFC 4180. Skip the \r and your output breaks on a producer that uses CRLF; skip the \n and your output breaks on multiline values.

Use it: when the nested data has to survive the round-trip — exports for re-import, dataset snapshots, debugging dumps.

Strategy 3: Explode (one row per array element)

When the array represents related entities rather than attributes, the right move is to repeat the parent on every child row:

id,user.name,order.id,order.amount,active
1001,Alice,A-1,49.99,true
1001,Alice,A-2,12.50,true

This is what a SQL outer join produces, which is why analytics tools love it. The data is now in third normal form (or close to it) and Excel pivot tables work. The tradeoff: the id and user values are repeated and the file gets bigger, but anyone who has used a spreadsheet knows what to do with it.

Exploding two arrays in the same record is where this gets tricky. If you also explode tags, do you get a row per (order, tag) pair (2 × 2 = 4 rows) or two parallel "phases"? There is no right answer, only an answer that matches your downstream tool. Most people pick one array to explode and JSON-encode the others.

Use it: when the consumer is a BI tool, an analyst with Excel, or a SQL COPY into a staging table.

Strategy 4: Split into multiple CSVs

If you're moving data into a relational database, the cleanest version of "explode" is "split". One CSV per entity, joined by foreign keys:

# users.csv
id,name,email,active
1001,Alice,[email protected],true

# orders.csv
user_id,id,amount
1001,A-1,49.99
1001,A-2,12.50

# tags.csv
user_id,tag
1001,pro
1001,early-adopter

This is more work to produce but it's how the data wants to live. The producer logic is more or less: walk the JSON, for every array field at depth N, write to a separate CSV that carries the IDs of every ancestor up to depth 0.

Use it: when the destination is a relational database, especially if the same JSON shape is going into the database every day (you're really building an ETL pipeline at that point).

The cross-cutting gotchas

Regardless of which strategy you pick, these come up.

Schema drift. Row 1 has 4 keys, row 7,491 has a 5th key you've never seen. A correct converter does two passes: first to collect every key that appears anywhere, second to emit rows with that union as the header. A single-pass converter that uses the first row's keys silently drops everyone else's extras. The JSON-to-CSV tool here does the two-pass version.

Big integers. Spreadsheet apps will truncate 16-digit IDs to 15 digits of precision and you will not know until a customer support ticket mentions a missing record. If your IDs are bigger than 253−1, write them prefixed with a ' or wrap them in quotes — Excel respects the quote, treats the cell as text, and stops "helping". The right long-term fix is documented in Why your JSON IDs are wrong.

Encoding. CSV should be UTF-8 with no BOM. Excel on Windows opens UTF-8 files as Windows-1252 by default and your accented characters turn to mojibake. Two workarounds: add a UTF-8 BOM () to the start of the file (Excel reads it, most other tools tolerate it), or name the file .tsv and use tabs (Excel handles UTF-8 in TSVs correctly). The BOM workaround is more common.

Booleans. JSON has true/false. CSV has no convention. Pick one — usually lowercase true/false — and document it. Excel will autocast TRUE/FALSE to a Boolean type that round-trips differently than text; if your downstream is Excel and you care, write "true" (quoted) instead.

Nulls. JSON has null. CSV traditionally uses an empty field. The two aren't equivalent — {"name": null} says "the name is unknown", {} says "the name was never set". After a round-trip through CSV you can't tell them apart. If that distinction matters to you, JSON-encode the value (strategy 2) so null survives as the literal text null.

The picker

The shortest decision tree I've found:

  • Is the consumer Excel or Google Sheets, and the nested data is "attributes"? → Flatten.
  • Is the consumer Excel or Google Sheets, and the nested data is "children"? → Explode one array, JSON-encode the rest.
  • Is the consumer a database loader? → Split.
  • Is the consumer a script that will read the CSV and turn it back into JSON, or you don't know? → JSON-encode.

If you're not sure, JSON-encode. It's verbose but it preserves everything. You can always pick a different strategy later when you know what the consumer actually wants.

If you have a JSON shape that doesn't fit any of these strategies cleanly, describe it — we collect these and may add a fifth.