SIA for JSON Schema — Schema Intelligence Annotations

How SIA Works in JSON Schema

SIA uses the x- vendor extension convention popularised by the OpenAPI Specification. JSON Schema itself permits unknown keywords, so validators, linters, and code generators ignore x- properties by default, although some strict modes may flag them.

LLMs have seen millions of OpenAPI specs with x- extensions. They read them as metadata without breaking.

SIA term	JSON Schema property	Type	Applies to
role	`x-sia-role`	string	Property
priority	`x-sia-priority`	integer 1–5	Property
instruction	`x-sia-instruction`	string	Property / Type
mapsTo	`x-maps-to`	object	Property / Type
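
The types in the table above can be checked mechanically. What follows is a minimal Python sketch, not part of SIA itself: the function name `check_sia_annotations` and the wording of the messages are illustrative assumptions.

```python
from typing import Any


def check_sia_annotations(node: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one property or type definition."""
    problems: list[str] = []

    role = node.get("x-sia-role")
    if role is not None and not isinstance(role, str):
        problems.append("x-sia-role must be a string")

    priority = node.get("x-sia-priority")
    if priority is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(priority, bool) or not isinstance(priority, int):
            problems.append("x-sia-priority must be an integer")
        elif not 1 <= priority <= 5:
            problems.append("x-sia-priority must be between 1 and 5")

    instruction = node.get("x-sia-instruction")
    if instruction is not None and not isinstance(instruction, str):
        problems.append("x-sia-instruction must be a string")

    maps_to = node.get("x-maps-to")
    if maps_to is not None and not isinstance(maps_to, dict):
        problems.append("x-maps-to must be an object")

    return problems


print(check_sia_annotations({"type": "string", "x-sia-priority": 7}))
# ['x-sia-priority must be between 1 and 5']
```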

The rule. If JSON Schema already expresses a concept natively (type, required, $ref, enum, allOf, minLength, pattern, format), use the native keyword. Never create an x-sia-* version of something that already exists.

x-sia-role

Type: string • Applies to: property level

Tells the AI what semantic role this property plays. See SIA vocabulary for the complete list of standard role values.

"title": {
  "type": "string",
  "x-sia-role": "identifier"     // AI knows: this names the entity
}

"publishedDate": {
  "type": "string", "format": "date",
  "x-sia-role": "temporal"       // AI knows: this is a date/time
}

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-role": "relationship"   // AI knows: links to another entity
}
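
One way a consumer might use role hints is to group properties by role before building a prompt. The sketch below assumes the properties map has already been parsed into a Python dict; `properties_by_role` and the unannotated `slug` property are hypothetical names used only for illustration.

```python
from collections import defaultdict


def properties_by_role(properties):
    """Group property names by their x-sia-role; skip unannotated properties."""
    grouped = defaultdict(list)
    for name, definition in properties.items():
        role = definition.get("x-sia-role")
        if role is not None:
            grouped[role].append(name)
    return dict(grouped)


properties = {
    "title": {"type": "string", "x-sia-role": "identifier"},
    "publishedDate": {"type": "string", "format": "date", "x-sia-role": "temporal"},
    "author": {"$ref": "#/$defs/Author", "x-sia-role": "relationship"},
    "slug": {"type": "string"},  # hypothetical property with no role
}

print(properties_by_role(properties))
# {'identifier': ['title'], 'temporal': ['publishedDate'], 'relationship': ['author']}
```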

x-sia-priority

Type: integer 1–5 • Applies to: property level

When context is limited, tells the AI which properties to keep (priority 1) and which to drop first (priority 5). See SIA vocabulary for the full priority scale.

"title":      { "x-sia-priority": 1 }  // essential: always include
"body":       { "x-sia-priority": 2 }  // important: standard context
"summary":    { "x-sia-priority": 3 }  // useful: drop under pressure
"email":      { "x-sia-priority": 4 }  // supplementary: generous context only
"internalId": { "x-sia-priority": 5 }  // background: drop first

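The priority scale above can be turned into a keep order with a single sort. This is a sketch under one assumption that the text does not state: a property without `x-sia-priority` is treated as priority 5. The helper name `keep_order` is hypothetical.

```python
def keep_order(properties, default_priority=5):
    """Return property names ordered from keep-first to drop-first."""
    # sorted() is stable, so properties with equal priority keep schema order.
    return sorted(
        properties,
        key=lambda name: properties[name].get("x-sia-priority", default_priority),
    )


properties = {
    "internalId": {"x-sia-priority": 5},
    "summary": {"x-sia-priority": 3},
    "title": {"x-sia-priority": 1},
    "email": {"x-sia-priority": 4},
    "body": {"x-sia-priority": 2},
}

# Keep only the three most important properties.
kept = keep_order(properties)[:3]
print(kept)  # ['title', 'body', 'summary']
```
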
x-sia-instruction

Type: string (10–80 words) • Applies to: property or type level

Natural language instruction for the AI. The primary anti-hallucination mechanism.

Property-level

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-instruction": "Link to Author type. In summaries, use the author's display name, never the raw ID or $ref path."
}

Type-level

{
  "title": "BlogPost",
  "type": "object",
  "x-sia-instruction": "A blog post. Title and author are always needed. Summary can be dropped in constrained contexts.",
  "properties": { ... }
}

x-sia-instruction vs description

	`description` (native)	`x-sia-instruction` (SIA)
Audience	Human developers	AI agents
Tone	Reference documentation	Operational instructions
Example	"The title of the blog post."	"Main heading. Always include. Max 200 chars."
Both?	AI reads `x-sia-instruction` first, falls back to `description`

Tip. If your description already reads like an AI instruction, skip x-sia-instruction. Use it only when human docs and AI instructions need to differ.
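
The fallback in the "Both?" row can be sketched in a few lines of Python. The helper name `ai_guidance` is hypothetical, and treating an empty instruction as absent is an assumption of this sketch.

```python
def ai_guidance(definition):
    """Prefer x-sia-instruction; fall back to description; else None."""
    # `or` also skips an empty-string instruction, which this sketch treats as absent.
    return definition.get("x-sia-instruction") or definition.get("description")


title = {
    "type": "string",
    "description": "The title of the blog post.",
    "x-sia-instruction": "Main heading. Always include. Max 200 chars.",
}
body = {"type": "string", "description": "The body of the blog post."}

print(ai_guidance(title))  # Main heading. Always include. Max 200 chars.
print(ai_guidance(body))   # The body of the blog post.
```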

x-maps-to

Type: object • Applies to: property and type level

Declares equivalent properties or types in external standards. Keys are human-readable standard names; values are full URIs or, for well-known vocabularies, prefixed names such as og:title.

"title": {
  "type": "string",
  "x-maps-to": {
    "schema.org": "https://schema.org/headline",
    "Dublin Core": "http://purl.org/dc/elements/1.1/title",
    "Open Graph": "og:title",
    "FHIR": "http://hl7.org/fhir/StructureDefinition/title"
  }
}

Keys are human-readable: "schema.org", "Dublin Core", "FHIR", "Open Graph"
Values are full URIs or prefixed names for well-known vocabularies
Multiple mappings per property are supported
Omit standards with no equivalent — never add placeholders
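
A consumer that needs to translate between vocabularies could build a reverse index from these mappings. The following is an illustrative sketch only; `index_by_standard` is a hypothetical helper, and it assumes each external term maps to at most one local property.

```python
def index_by_standard(properties, standard):
    """Map each external term of `standard` to the local property name."""
    index = {}
    for name, definition in properties.items():
        term = definition.get("x-maps-to", {}).get(standard)
        if term is not None:
            index[term] = name
    return index


properties = {
    "title": {
        "type": "string",
        "x-maps-to": {
            "schema.org": "https://schema.org/headline",
            "Dublin Core": "http://purl.org/dc/elements/1.1/title",
        },
    },
    "body": {
        "type": "string",
        "x-maps-to": {"schema.org": "https://schema.org/articleBody"},
    },
    "internalId": {"type": "string"},  # no equivalent, so no x-maps-to
}

print(index_by_standard(properties, "Dublin Core"))
# {'http://purl.org/dc/elements/1.1/title': 'title'}
```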

Platform Extensions

Any platform can define its own x-{name}-* namespace for round-trip metadata. AI agents should ignore these.

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-role": "relationship",
  // Platform-specific round-trip metadata
  "x-cm-relation": "domainIncludes",
  "x-cm-order": 4
}

When to include. Export for AI / external sharing → omit. Export for backup / round-trip import → include.
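
For the "export for AI" case, platform keys can be stripped with a small recursive filter. This sketch assumes that every x- key other than x-sia-* and x-maps-to is platform metadata; the names `strip_platform_extensions` and `NAME_MAPS` are hypothetical. Keys directly under `properties` or `$defs` are treated as names, so a property that happens to start with x- is not removed.

```python
NAME_MAPS = {"properties", "$defs", "patternProperties"}


def is_platform_key(key):
    """True for x- keys that are neither x-sia-* nor x-maps-to."""
    return key.startswith("x-") and not key.startswith("x-sia-") and key != "x-maps-to"


def strip_platform_extensions(node, keys_are_names=False):
    """Return a copy of the schema without platform-specific x- keys."""
    if isinstance(node, list):
        return [strip_platform_extensions(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if not keys_are_names and is_platform_key(key):
            continue
        # Children of "properties" and "$defs" are names, not keywords.
        child_is_names = not keys_are_names and key in NAME_MAPS
        result[key] = strip_platform_extensions(value, child_is_names)
    return result


author = {
    "$ref": "#/$defs/Author",
    "x-sia-role": "relationship",
    "x-cm-relation": "domainIncludes",
    "x-cm-order": 4,
}

print(strip_platform_extensions(author))
# {'$ref': '#/$defs/Author', 'x-sia-role': 'relationship'}
```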

YAML Serialisation

SIA-annotated JSON Schema can also be serialised as YAML, which is more compact and has shown higher LLM comprehension for nested data in benchmarks.

$defs:
  BlogPost:
    type: object
    x-sia-instruction: >-
      A blog post. Title and author always needed.
    x-maps-to:
      schema.org: https://schema.org/BlogPosting
    properties:
      title:
        type: string
        maxLength: 200
        x-sia-role: identifier
        x-sia-priority: 1
        x-maps-to:
          schema.org: https://schema.org/headline
          Dublin Core: http://purl.org/dc/elements/1.1/title
      author:
        $ref: "#/$defs/Author"
        x-sia-role: relationship
        x-sia-priority: 2
        x-sia-instruction: >-
          Use author name in summaries, not ID.
    required: [title, body, author]

Recommendation. Use YAML for AI consumption via MCP tools (compact, better comprehension). Use JSON for code repositories and API tooling (universal compatibility).

Token Efficiency

x-sia-priority enables progressive truncation — strip lower-priority annotations to fit any token budget.

Mode	maxPriority	What's included	Use case
Full	5	All properties, all annotations	Deep schema exploration
Normal	3	All properties; annotations on priority 1–3 only	Standard AI context
Lean	2	All properties; annotations on priority 1–2 only	Multiple schemas in context
Minimal	1	All properties; annotations on priority 1 only	Max schemas, min tokens
Structure	0	Types and properties only, no annotations	Schema overview
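
One possible reading of the modes above, sketched in Python: every property is kept, and SIA annotations are removed from properties whose priority exceeds maxPriority. Several points are assumptions of this sketch rather than statements from the text: `x-maps-to` is counted as an annotation, a property without `x-sia-priority` is treated as priority 5, and the helper names are hypothetical.

```python
def is_sia_annotation(key):
    """Assumption: x-sia-* keys and x-maps-to count as SIA annotations."""
    return key.startswith("x-sia-") or key == "x-maps-to"


def truncate_annotations(properties, max_priority, default_priority=5):
    """Keep all properties; strip annotations where priority > max_priority."""
    result = {}
    for name, definition in properties.items():
        priority = definition.get("x-sia-priority", default_priority)
        if priority <= max_priority:
            result[name] = dict(definition)
        else:
            result[name] = {
                key: value
                for key, value in definition.items()
                if not is_sia_annotation(key)
            }
    return result


properties = {
    "title": {"type": "string", "x-sia-role": "identifier", "x-sia-priority": 1},
    "email": {"type": "string", "format": "email",
              "x-sia-role": "contact", "x-sia-priority": 4},
}

lean = truncate_annotations(properties, max_priority=2)
print(lean["email"])  # {'type': 'string', 'format': 'email'}
```

With `max_priority=0` every property falls back to its native keywords only, which corresponds to the Structure row.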

Minification rules

Strip whitespace — saves ~15–20% tokens
Remove description if x-sia-instruction is present
Remove $schema and $id if consumer knows identity
Remove x-maps-to if consumer doesn't need mappings
Remove x-{platform}-* unless round-trip import is the goal
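
The first four rules above can be combined into one pass. The sketch below is illustrative only: `minify`, `keep_mappings`, and `keep_identity` are hypothetical names, and platform-key removal is left out for brevity. Keys directly under `properties` or `$defs` are treated as names, so a property called `description` is not dropped by mistake.

```python
import json

NAME_MAPS = {"properties", "$defs", "patternProperties"}


def _clean(node, keep_mappings, keys_are_names=False):
    if isinstance(node, list):
        return [_clean(item, keep_mappings) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if not keys_are_names:
            # Drop description only when an AI instruction replaces it.
            if key == "description" and "x-sia-instruction" in node:
                continue
            if key == "x-maps-to" and not keep_mappings:
                continue
        child_is_names = not keys_are_names and key in NAME_MAPS
        result[key] = _clean(value, keep_mappings, child_is_names)
    return result


def minify(schema, keep_mappings=False, keep_identity=False):
    """Return a compact JSON string with redundant keys removed."""
    cleaned = _clean(schema, keep_mappings)
    if not keep_identity:
        cleaned.pop("$schema", None)
        cleaned.pop("$id", None)
    # separators=(",", ":") removes all insignificant whitespace.
    return json.dumps(cleaned, separators=(",", ":"))


schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title of the blog post.",
            "x-sia-instruction": "Main heading. Always include.",
            "x-maps-to": {"schema.org": "https://schema.org/headline"},
        },
        "description": {"type": "string", "description": "Short teaser."},
    },
}

print(minify(schema))
```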

Need more compression? The ShEx expression of SIA uses ~35% fewer tokens than JSON Schema for the same content.

What JSON Schema Already Gives You

Rule: if JSON Schema can express it natively, never use x-sia-* for it.

Concept	JSON Schema native	SIA needed?
String property	`"type": "string"`	No
Required fields	`"required": [...]`	No
Relation to type	`"$ref": "#/$defs/..."`	No
Enum / taxonomy	`"enum": [...]`	No
Inheritance	`"allOf": [{"$ref":"..."}]`	No
Description	`"description": "..."`	No
Constraints	`"minimum"`, `"maxLength"`, `"pattern"`	No
Default values	`"default": ...`	No
Read-only	`"readOnly": true`	No
AI role hint	—	Yes: `x-sia-role`
AI priority	—	Yes: `x-sia-priority`
AI instruction	Partially (`description`)	Optional: `x-sia-instruction`
Standard mapping	—	Yes: `x-maps-to`

Complete Example

Blog content model: two types, taxonomy, schema.org + Dublin Core mappings.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Blog Content Model",
  "x-sia-instruction": "Blog content model. Posts and authors.",
  "x-maps-to": { "schema.org": "https://schema.org/Blog" },
  "$defs": {
    "BlogPost": {
      "type": "object",
      "x-sia-instruction": "A blog post. Title and author always needed.",
      "x-maps-to": {
        "schema.org": "https://schema.org/BlogPosting",
        "Dublin Core": "http://purl.org/dc/dcmitype/Text"
      },
      "properties": {
        "title": {
          "type": "string", "maxLength": 200,
          "x-sia-role": "identifier", "x-sia-priority": 1,
          "x-maps-to": {
            "schema.org": "https://schema.org/headline",
            "Dublin Core": "http://purl.org/dc/elements/1.1/title"
          }
        },
        "body": {
          "type": "string",
          "x-sia-role": "content", "x-sia-priority": 2,
          "x-maps-to": { "schema.org": "https://schema.org/articleBody" }
        },
        "author": {
          "$ref": "#/$defs/Author",
          "x-sia-role": "relationship", "x-sia-priority": 2,
          "x-sia-instruction": "Use author name in summaries, not ID.",
          "x-maps-to": {
            "schema.org": "https://schema.org/author",
            "Dublin Core": "http://purl.org/dc/elements/1.1/creator"
          }
        },
        "category": {
          "enum": ["tech", "science", "culture"],
          "x-sia-role": "classification", "x-sia-priority": 3,
          "x-maps-to": { "schema.org": "https://schema.org/articleSection" }
        }
      },
      "required": ["title", "body", "author"]
    },
    "Author": {
      "type": "object",
      "x-maps-to": { "schema.org": "https://schema.org/Person" },
      "properties": {
        "name": {
          "type": "string",
          "x-sia-role": "identifier", "x-sia-priority": 1,
          "x-maps-to": { "schema.org": "https://schema.org/name" }
        },
        "email": {
          "type": "string", "format": "email",
          "x-sia-role": "contact", "x-sia-priority": 4,
          "x-maps-to": { "schema.org": "https://schema.org/email" }
        }
      },
      "required": ["name"]
    }
  }
}

Advanced Patterns

Polymorphism

"contactMethod": {
  "oneOf": [
    { "type": "string", "format": "email" },
    { "$ref": "#/$defs/PhoneNumber" }
  ],
  "x-sia-role": "contact",
  "x-sia-instruction": "Email or phone. Prefer email for display."
}

Recursive schemas

"children": {
  "type": "array",
  "items": { "$ref": "#/$defs/Category" },
  "x-sia-instruction": "Recursive subcategories. Render as tree."
}
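
Tools that walk a recursive schema need to guard against following the same $ref forever. The sketch below shows one way to do that with a visited list. It handles local references of the form #/$defs/Name only and ignores JSON Pointer escaping; `resolve_local_ref`, `find_refs`, and `reachable_types` are hypothetical helper names.

```python
def resolve_local_ref(schema, ref):
    """Follow a local reference such as '#/$defs/Category'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = schema
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def find_refs(node):
    """Yield every $ref string found anywhere below `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def reachable_types(schema, start_ref):
    """List each referenced type once, even when references form a cycle."""
    seen = []
    pending = [start_ref]
    while pending:
        ref = pending.pop()
        if ref in seen:
            continue  # already visited: this is what breaks the cycle
        seen.append(ref)
        pending.extend(find_refs(resolve_local_ref(schema, ref)))
    return seen


schema = {
    "$defs": {
        "Category": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/$defs/Category"},
                    "x-sia-instruction": "Recursive subcategories. Render as tree.",
                },
            },
        }
    }
}

print(reachable_types(schema, "#/$defs/Category"))  # ['#/$defs/Category']
```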

Conditional schemas

"if": { "properties": { "status": { "const": "published" } } },
"then": { "required": ["publishedDate", "author"] },
"x-sia-instruction": "Published posts require date and author."

Nullable

"middleName": {
  "type": ["string", "null"],
  "x-sia-priority": 4,
  "x-sia-instruction": "May be null. Never fabricate a middle name."
}

Never emit empty annotations. If a property's role can't be determined, omit x-sia-role. Placeholder values ("n/a", "TBD") waste tokens and confuse AI agents.
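
A lint step can catch empty annotations before export. This is a minimal sketch: the placeholder set covers the two values named above plus the empty string, which is an assumption of this sketch, and `empty_annotations` is a hypothetical helper name.

```python
PLACEHOLDERS = {"", "n/a", "tbd"}


def empty_annotations(definition):
    """Return the SIA keys whose values are empty or placeholders."""
    flagged = []
    for key, value in definition.items():
        if not (key.startswith("x-sia-") or key == "x-maps-to"):
            continue
        is_placeholder = isinstance(value, str) and value.strip().lower() in PLACEHOLDERS
        if value is None or value == {} or is_placeholder:
            flagged.append(key)
    return flagged


middle_name = {
    "type": ["string", "null"],
    "x-sia-role": "TBD",
    "x-sia-priority": 4,
    "x-maps-to": {},
}

print(empty_annotations(middle_name))  # ['x-sia-role', 'x-maps-to']
```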