JSON Schema in 10 minutes — the keywords that matter

JSON Schema has roughly 50 keywords. On a real API, a dozen of them do 95% of the work: type, properties, required, additionalProperties, items, enum, const, minimum/maximum, pattern, format, $ref, and one of allOf/anyOf/oneOf. This post is the quickstart I wish I'd had when I started: each keyword with a real example, the keywords safe to skip, and the three traps that catch every team.

You can paste any of the schemas below into the JSON Schema Validator and try them live — it implements the practical subset of draft-07 and 2020-12 (the differences barely matter for the keywords below).

The shape of a schema

A JSON Schema is itself a JSON document. The simplest non-trivial one:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" }
  },
  "required": ["name"]
}

That schema accepts {"name": "Alice"}, rejects {}, and — this surprises people — also accepts {"name": "Alice", "extra": 42}. By default JSON Schema is permissive. We'll come back to that.

type

The most basic constraint. Values: string, number, integer, boolean, object, array, null. You can also pass an array: "type": ["string", "null"] means "string or null", which is how you write nullable in JSON Schema (there is no nullable: true — that's an OpenAPI extension).

Caveat: integer is a subtype of number defined by mathematical value, not representation. 1.0 validates as an integer. 1.1 does not.
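
If you ever have to hand-check parsed JSON outside a validator, the rule looks roughly like this in Python. This is a sketch only, and is_json_integer is a hypothetical helper name. One wrinkle specific to Python: bool is a subclass of int, so booleans have to be excluded explicitly.

```python
def is_json_integer(value):
    """Return True if value counts as a JSON Schema "integer"."""
    # JSON booleans are not numbers, but Python's bool subclasses int.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # A float qualifies when it has no fractional part: 1.0 yes, 1.1 no.
    return isinstance(value, float) and value.is_integer()


print(is_json_integer(1.0))   # True
print(is_json_integer(1.1))   # False
print(is_json_integer(True))  # False
```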

properties, required, additionalProperties

These three travel together. properties describes the keys you know about. required lists the keys that must be present. additionalProperties describes every key that properties (and patternProperties, if you use it) does not cover.

{
  "type": "object",
  "properties": {
    "id":   { "type": "integer" },
    "name": { "type": "string" }
  },
  "required": ["id"],
  "additionalProperties": false
}

Set additionalProperties: false on every object schema unless you have a specific reason not to. If you forget, a typo in the producer ("nmae" instead of "name") will pass validation, fail silently downstream, and you'll spend an afternoon finding it.

The exception is when you genuinely want a free-form bag — say, request headers — in which case use additionalProperties: { "type": "string" } to constrain the values without naming the keys.
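
To see why the typo slips through, here is a deliberately tiny Python sketch of just the required and additionalProperties logic. It is not a real validator: check_object is a hypothetical helper, it ignores types and patternProperties, and it only handles additionalProperties being false or absent.

```python
def check_object(schema, doc):
    """Return error strings for required / additionalProperties only."""
    errors = []
    known = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required property: {key}")
    # When additionalProperties is omitted, unknown keys are allowed.
    if schema.get("additionalProperties", True) is False:
        for key in doc:
            if key not in known:
                errors.append(f"unexpected property: {key}")
    return errors


schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id"],
    "additionalProperties": False,
}
typo_doc = {"id": 1, "nmae": "Alice"}

print(check_object(schema, typo_doc))
# ['unexpected property: nmae']

# Same schema without additionalProperties: the typo sails through.
open_schema = {k: v for k, v in schema.items() if k != "additionalProperties"}
print(check_object(open_schema, typo_doc))
# []
```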

items (and prefixItems)

For arrays. The common case:

{ "type": "array", "items": { "type": "string" } }

Every element is a string. If your array is more like a tuple — fixed positions with different types — use prefixItems (2020-12) or the old array form of items (draft-07):

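{
  "type": "array",
  "prefixItems": [
    { "type": "string" },
    { "type": "integer" }
  ],
  "items": false,
  "minItems": 2
}

"A 2-tuple of [string, integer], nothing else allowed." The items: false closes the door at the end of the tuple. The minItems: 2 is what makes both positions mandatory: prefixItems only checks positions that are present, so without it [] and ["a"] would also pass.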

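One detail of tuple schemas that is easy to miss: prefixItems constrains only the positions that actually exist in the array, so a shorter array passes unless minItems says otherwise. A rough Python sketch of that behavior follows; check_tuple and the two predicates are hypothetical names for illustration, not part of any library.

```python
def is_string(v):
    return isinstance(v, str)


def is_integer(v):
    if isinstance(v, bool):
        return False
    return isinstance(v, int) or (isinstance(v, float) and v.is_integer())


def check_tuple(value, prefix_checks, min_items=0):
    """Sketch of prefixItems + "items": false + minItems for a list."""
    errors = []
    if len(value) < min_items:
        errors.append(f"expected at least {min_items} items, got {len(value)}")
    # "items": false rejects anything past the declared positions.
    if len(value) > len(prefix_checks):
        errors.append(f"expected at most {len(prefix_checks)} items, got {len(value)}")
    # zip stops at the shorter input, so missing positions are never checked.
    for index, (item, check) in enumerate(zip(value, prefix_checks)):
        if not check(item):
            errors.append(f"item {index}: failed {check.__name__}")
    return errors


checks = [is_string, is_integer]
print(check_tuple(["a", 1], checks))               # []
print(check_tuple(["a"], checks))                  # []
print(check_tuple(["a"], checks, min_items=2))     # ['expected at least 2 items, got 1']
print(check_tuple(["a", 1, 2], checks))            # ['expected at most 2 items, got 3']
```
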
enum and const

enum takes an array of allowed values. const is shorthand for a one-element enum.

{ "type": "string", "enum": ["red", "amber", "green"] }
{ "const": "v2" }

enum matches by deep equality — works for objects and arrays, not just primitives. Useful when you want to restrict to "one of these specific shapes".
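
"Deep equality" here means JSON equality, which is not quite the host language's ==. A sketch in Python, where the main gotcha is that True == 1 even though a JSON boolean never equals a JSON number (json_equal and matches_enum are hypothetical helper names):

```python
def json_equal(a, b):
    """Deep equality with JSON semantics."""
    # Booleans only ever equal booleans, never numbers.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    # Numbers compare by mathematical value: 1 and 1.0 are equal.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    # Objects: same keys, equal values, key order irrelevant.
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    # Arrays: same length, equal items, order matters.
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b


def matches_enum(value, allowed):
    return any(json_equal(value, candidate) for candidate in allowed)


print(matches_enum({"unit": "cm", "dims": [1, 2]}, [{"dims": [1, 2], "unit": "cm"}]))  # True
print(matches_enum(True, [1]))      # False
print(matches_enum([2, 1], [[1, 2]]))  # False
```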

minimum, maximum, minLength, maxLength, minItems, maxItems

Numeric and length bounds. Boring, useful. The numeric bounds come in inclusive (minimum, maximum) and exclusive (exclusiveMinimum, exclusiveMaximum) flavors. The length and item-count bounds are always inclusive.

If you're validating user input from a form, set maxLength on every string. Without it, someone will eventually paste a 50 MB string into your email field and the validator will happily hold it in memory while it runs a regex over it. (More on that in JSON security pitfalls.)
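
The same idea applies if you pre-check input by hand: look at the length first, and only then run the regex. A small sketch, where the function name, the 254-character limit and the loose email-like regex are all assumptions chosen for illustration:

```python
import re

# Deliberately loose, for illustration only; not a real email validator.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


def check_email_field(value, max_length=254):
    """Return an error string, or None if the value looks acceptable."""
    # Cheap length check first, so the regex never sees a huge string.
    if len(value) > max_length:
        return "too long"
    if not EMAIL_LIKE.fullmatch(value):
        return "not email-like"
    return None


print(check_email_field("a@b.co"))              # None
print(check_email_field("x" * 300 + "@b.co"))   # too long
print(check_email_field("nope"))                # not email-like
```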

pattern and format

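pattern is a regex applied to a string. It is not anchored: the regex only has to match somewhere in the string, so add ^ and $ when you mean the whole value. JSON Schema recommends the ECMA-262 dialect, which is what JavaScript engines run; most non-JS validators use their host language's regex engine, which is close but not identical, so stick to the common subset.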
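
One behavior worth seeing once: pattern succeeds if the regex matches anywhere in the string, not only when it matches the whole thing. The sketch below mimics that with Python's re.search. Python's re is not ECMA-262, but for a simple pattern like this one the two agree; matches_pattern is a hypothetical helper.

```python
import re


def matches_pattern(value, pattern):
    # Search semantics: a match anywhere in the string is enough.
    return re.search(pattern, value) is not None


print(matches_pattern("abc-12345-xyz", "[0-9]{5}"))    # True
print(matches_pattern("abc-12345-xyz", "^[0-9]{5}$"))  # False
print(matches_pattern("12345", "^[0-9]{5}$"))          # True
```

If you mean "the entire string is five digits", write the anchors yourself.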

format is the interesting one. It's a hint: "format": "email" means "this string should look like an email address". Whether the validator actually checks it is implementation-defined — by default most validators do not enforce format. You have to opt in (Ajv: addFormats; Python jsonschema: pass a format_checker). Useful formats in 2020-12 that are widely supported:

  • date, time, date-time — RFC 3339 subset
  • email, idn-email
  • uri, uri-reference
  • uuid
  • regex — meta-validation: "this string must be a valid ECMA-262 regex"
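
To make the opt-in behavior concrete, here is a toy Python model of it: with enforcement off, any string passes whatever the format says. Everything here is a hypothetical sketch (check_format, FORMAT_CHECKS and the enforce flag are made-up names), and it covers only date and uuid; real validators have their own switches, as noted above.

```python
import datetime
import re

DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def _is_date(value):
    # Shape check first, then let the stdlib reject impossible dates.
    if not DATE_RE.fullmatch(value):
        return False
    try:
        datetime.date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_uuid(value):
    return UUID_RE.fullmatch(value) is not None


FORMAT_CHECKS = {"date": _is_date, "uuid": _is_uuid}


def check_format(value, fmt, enforce=False):
    """Formats are treated as annotations unless enforce=True."""
    if not enforce or fmt not in FORMAT_CHECKS:
        return True
    return FORMAT_CHECKS[fmt](value)


print(check_format("2024-02-30", "date"))                # True (not enforced)
print(check_format("2024-02-30", "date", enforce=True))  # False
print(check_format("2024-02-29", "date", enforce=True))  # True
```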

$ref and $defs

This is how you reuse a subschema. Define it once in $defs (spelled definitions in draft-07), reference it with $ref:

{
  "type": "object",
  "properties": {
    "shipping": { "$ref": "#/$defs/address" },
    "billing":  { "$ref": "#/$defs/address" }
  },
  "$defs": {
    "address": {
      "type": "object",
      "required": ["street", "city"],
      "properties": {
        "street": { "type": "string" },
        "city":   { "type": "string" }
      }
    }
  }
}

$ref can point to a fragment of the same document (above), another file ("$ref": "./common.json#/$defs/address"), or a URL. The catch: most validators need to be told where local files live (a "base URI" or "schema store"), and resolving remote URLs at validation time is usually disabled for safety — pre-bundle remote refs into one document instead.
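
The part after the # in a same-document $ref is a JSON Pointer (RFC 6901), which is simple enough to walk by hand. A sketch in Python, assuming the fragment is not percent-encoded; resolve_local_ref is a hypothetical helper, and it deliberately refuses file and URL references:

```python
def resolve_local_ref(root, ref):
    """Resolve a same-document $ref such as "#/$defs/address"."""
    if not ref.startswith("#"):
        raise ValueError(f"only same-document refs are handled here: {ref}")
    node = root
    # "#" alone yields no tokens and returns the root schema.
    for token in ref[1:].split("/")[1:]:
        # JSON Pointer escapes: ~1 is "/", ~0 is "~" (decode in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


schema = {
    "type": "object",
    "properties": {"shipping": {"$ref": "#/$defs/address"}},
    "$defs": {
        "address": {"type": "object", "required": ["street", "city"]}
    },
}

ref = schema["properties"]["shipping"]["$ref"]
print(resolve_local_ref(schema, ref)["required"])  # ['street', 'city']
```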

allOf, anyOf, oneOf

The combinators. Most schemas don't need them; the ones that do can't live without them.

  • allOf — must validate against every subschema. Use it to extend: "an Address, plus these extra fields".
  • anyOf — must validate against at least one. Use it for unions: "string or number".
  • oneOf — must validate against exactly one. Use it for tagged unions when the tags are mutually exclusive.
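
Stripped of everything else, the three combinators are "all", "any" and "exactly one". A sketch in Python with plain predicates standing in for subschemas (all names here are hypothetical); it also shows the classic oneOf surprise, where a value matches two overlapping branches and is rejected:

```python
def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def is_integer(v):
    return is_number(v) and (isinstance(v, int) or v.is_integer())


def all_of(value, checks):
    return all(check(value) for check in checks)


def any_of(value, checks):
    return any(check(value) for check in checks)


def one_of(value, checks):
    # Exactly one branch may match; two matches is a failure.
    return sum(1 for check in checks if check(value)) == 1


branches = [is_integer, is_number]
print(any_of(5, branches))    # True
print(one_of(5, branches))    # False: 5 is both an integer and a number
print(one_of(5.5, branches))  # True: only the number branch matches
```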

Trap: oneOf error messages are notoriously bad. If a value fails every branch, the validator reports a failure against each one: one error per branch, none of them the "real" one. For tagged unions, if/then/else with the tag as the condition produces much better errors. Each if below also requires kind; without that, a document with no kind at all would satisfy both conditions vacuously and trip both then branches:

{
  "type": "object",
  "required": ["kind"],
  "properties": { "kind": { "enum": ["circle", "square"] } },
  "allOf": [
    {
      "if":   { "required": ["kind"], "properties": { "kind": { "const": "circle" } } },
      "then": { "required": ["radius"], "properties": { "radius": { "type": "number" } } }
    },
    {
      "if":   { "required": ["kind"], "properties": { "kind": { "const": "square" } } },
      "then": { "required": ["side"],   "properties": { "side":   { "type": "number" } } }
    }
  ]
}
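
The reason the tag-first approach reads better is that only the branch selected by kind ever gets a chance to complain. The Python sketch below mimics that dispatch by hand. It is an illustration of the idea, not how any validator is implemented, and check_shape and BRANCHES are hypothetical names.

```python
# Which extra field each kind requires.
BRANCHES = {"circle": "radius", "square": "side"}


def check_shape(doc):
    """Return errors for the branch selected by doc["kind"] only."""
    kind = doc.get("kind")
    if not isinstance(kind, str) or kind not in BRANCHES:
        return [f"kind must be one of {sorted(BRANCHES)}"]
    field = BRANCHES[kind]
    if field not in doc:
        return [f"{kind} requires {field}"]
    value = doc[field]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return [f"{field} must be a number"]
    return []


print(check_shape({"kind": "circle"}))                 # ['circle requires radius']
print(check_shape({"kind": "square", "side": "big"}))  # ['side must be a number']
print(check_shape({"kind": "circle", "radius": 2}))    # []
```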

Keywords safe to ignore (at first)

You can build production APIs without ever using these:

  • dependentRequired, dependentSchemas — niche.
  • contains, minContains, maxContains — "at least one array element matches".
  • patternProperties — match keys by regex. Useful, but rare.
  • contentEncoding, contentMediaType — almost never enforced.
  • unevaluatedProperties, unevaluatedItems — interact with allOf; learn when you hit the limit of additionalProperties.

The three traps

Trap 1: missing additionalProperties: false. Already mentioned. By far the most expensive omission.

Trap 2: format not enforced. You wrote "format": "email", your tests passed because your test data was valid, and production accepts "not an email". Configure your validator to enforce formats, and write a test that checks an obviously invalid value is rejected.

Trap 3: required doesn't imply non-null. "required": ["name"] means the key must be present, not that the value must be non-null. If null is not acceptable, the property schema must say so: { "type": "string" } (without "null" in the type array).
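
Trap 3 is easy to reproduce without any validator at all. In the Python sketch below, the two helper functions are hypothetical stand-ins for what required and "type": "string" each check:

```python
import json

doc = json.loads('{"name": null}')


def satisfies_required(doc, key):
    # required only asks whether the key exists.
    return key in doc


def satisfies_string_type(doc, key):
    # The type check is what actually rules out null.
    return isinstance(doc.get(key), str)


print(satisfies_required(doc, "name"))     # True
print(satisfies_string_type(doc, "name"))  # False
```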

What to do next

Write a schema for a piece of your API you care about — request body, config file, whatever. Paste it and a sample document into the JSON Schema Validator and tweak until the validator says exactly what you mean. Then break the document on purpose and confirm the validator complains about the thing you broke, in the words you'd want.

That last step is the one most people skip and is also the one that pays off forever after: your schema isn't done when it accepts the happy path; it's done when it rejects the sad path with a message a human can act on.

Got a schema that misbehaves and you can't figure out why? Send it over — we may turn it into a follow-up post.