SIA for JSON Schema

Express SIA annotations using JSON Schema’s x- vendor extension mechanism. JSON Schema is the format LLMs tend to understand best, because many models are fine-tuned on it for function calling and structured output.

blogpost.sia.json
"title": {
  "type": "string",                      // native JSON Schema
  "maxLength": 200,                    // native validation

  "x-sia-role": "identifier",           // SIA: AI role
  "x-sia-priority": 1,                  // SIA: always keep
  "x-sia-instruction":                  // SIA: AI instruction
    "Main heading. Always include in any summary.",

  "x-maps-to": {                        // SIA: standard mappings
    "schema.org": "https://schema.org/headline",
    "Dublin Core": "http://purl.org/dc/elements/1.1/title"
  }
}

How SIA Works in JSON Schema

SIA uses the x- vendor extension convention defined by the OpenAPI Specification. It is the conventional way to extend JSON Schema: validators, linters, and code generators ignore unrecognised x- properties by default, although a tool running in a strict mode may need them allow-listed.

LLMs have seen millions of OpenAPI specs with x- extensions. They read them as metadata without breaking.

| SIA term | JSON Schema property | Type | Applies to |
|---|---|---|---|
| role | x-sia-role | string | Property |
| priority | x-sia-priority | integer 1–5 | Property |
| instruction | x-sia-instruction | string | Property / Type |
| mapsTo | x-maps-to | object | Property / Type |

The rule. If JSON Schema already expresses a concept natively (type, required, $ref, enum, allOf, minLength, pattern, format), use the native keyword. Never create an x-sia-* version of something that already exists.
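
To illustrate why x- properties are safe to add, the following Python sketch splits a property into its native keywords and its x- extensions. The helper name split_keywords is hypothetical and not part of SIA; the property is the title example from the top of this page with the // comments removed so that it parses as plain JSON.

```python
import json

# The "title" property from blogpost.sia.json, without the // comments.
title = json.loads("""
{
  "type": "string",
  "maxLength": 200,
  "x-sia-role": "identifier",
  "x-sia-priority": 1,
  "x-sia-instruction": "Main heading. Always include in any summary.",
  "x-maps-to": {"schema.org": "https://schema.org/headline"}
}
""")


def split_keywords(prop):
    """Separate native JSON Schema keywords from x- extension keywords."""
    native = {k: v for k, v in prop.items() if not k.startswith("x-")}
    extensions = {k: v for k, v in prop.items() if k.startswith("x-")}
    return native, extensions


native, extensions = split_keywords(title)
print(native)              # {'type': 'string', 'maxLength': 200}
print(sorted(extensions))  # ['x-maps-to', 'x-sia-instruction', 'x-sia-priority', 'x-sia-role']
```

A tool that only understands the native half still sees a complete, valid property definition.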

x-sia-role

Type: string  •  Applies to: property level

Tells the AI what semantic role this property plays. See SIA vocabulary for the complete list of standard role values.

"title": {
  "type": "string",
  "x-sia-role": "identifier"     // AI knows: this names the entity
}

"publishedDate": {
  "type": "string", "format": "date",
  "x-sia-role": "temporal"       // AI knows: this is a date/time
}

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-role": "relationship"   // AI knows: links to another entity
}

x-sia-priority

Type: integer 1–5  •  Applies to: property level

When context is limited, this tells the AI which properties to keep (1) and which to drop first (5). See SIA vocabulary for the full priority scale.

"title":      { "x-sia-priority": 1 }  // essential: always include
"body":       { "x-sia-priority": 2 }  // important: standard context
"summary":    { "x-sia-priority": 3 }  // useful: drop under pressure
"email":      { "x-sia-priority": 4 }  // supplementary: generous context only
"internalId": { "x-sia-priority": 5 }  // background: drop first

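As a rough illustration of how a consumer might use these values, the Python sketch below orders properties by x-sia-priority and keeps only as many as fit. The function names keep_order and fit are hypothetical, and treating an unannotated property as priority 5 is an assumption made for this sketch, not something SIA specifies here.

```python
properties = {
    "internalId": {"x-sia-priority": 5},
    "summary": {"x-sia-priority": 3},
    "title": {"x-sia-priority": 1},
    "email": {"x-sia-priority": 4},
    "body": {"x-sia-priority": 2},
}


def keep_order(props, default=5):
    """Property names sorted from 'always keep' (1) to 'drop first' (5).

    Assumption: a property without x-sia-priority is treated as `default`.
    """
    return sorted(props, key=lambda name: props[name].get("x-sia-priority", default))


def fit(props, max_properties):
    """Keep only the highest-priority properties that fit the budget."""
    return keep_order(props)[:max_properties]


print(keep_order(properties))  # ['title', 'body', 'summary', 'email', 'internalId']
print(fit(properties, 2))      # ['title', 'body']
```
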
x-sia-instruction

Type: string (10–80 words)  •  Applies to: property or type level

Natural language instruction for the AI. The primary anti-hallucination mechanism.

Property-level

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-instruction": "Link to Author type. In summaries, use the author's display name, never the raw ID or $ref path."
}

Type-level

{
  "title": "BlogPost",
  "type": "object",
  "x-sia-instruction": "A blog post. Title and author are always needed. Summary can be dropped in constrained contexts.",
  "properties": { ... }
}

x-sia-instruction vs description

| | description (native) | x-sia-instruction (SIA) |
|---|---|---|
| Audience | Human developers | AI agents |
| Tone | Reference documentation | Operational instructions |
| Example | "The title of the blog post." | "Main heading. Always include. Max 200 chars." |
| Both? | AI reads x-sia-instruction first, falls back to description | |

Tip. If your description already reads like an AI instruction, skip x-sia-instruction. Use it only when human docs and AI instructions need to differ.
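
The fallback order described in the table can be sketched in a few lines of Python. The function name ai_text is hypothetical; it only illustrates the "x-sia-instruction first, then description" rule.

```python
def ai_text(node):
    """Return the text an AI consumer should read for a property or type.

    Prefers x-sia-instruction and falls back to the native description.
    Returns None when neither is present.
    """
    return node.get("x-sia-instruction") or node.get("description")


both = {
    "description": "The title of the blog post.",
    "x-sia-instruction": "Main heading. Always include. Max 200 chars.",
}
only_description = {"description": "The title of the blog post."}

print(ai_text(both))              # Main heading. Always include. Max 200 chars.
print(ai_text(only_description))  # The title of the blog post.
```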

x-maps-to

Type: object  •  Applies to: property and type level

Declares equivalent properties or types in external standards. Keys are human-readable standard names; values are URIs or, where the standard uses its own identifiers, that identifier (for example og:title).

"title": {
  "type": "string",
  "x-maps-to": {
    "schema.org": "https://schema.org/headline",
    "Dublin Core": "http://purl.org/dc/elements/1.1/title",
    "Open Graph": "og:title",
    "FHIR": "http://hl7.org/fhir/StructureDefinition/title"
  }
}
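
One possible use of x-maps-to is to collect, for a single standard, the identifier each property maps to. The Python sketch below assumes a plain dict of properties; the function name mappings_for is hypothetical.

```python
properties = {
    "title": {
        "type": "string",
        "x-maps-to": {
            "schema.org": "https://schema.org/headline",
            "Dublin Core": "http://purl.org/dc/elements/1.1/title",
            "Open Graph": "og:title",
        },
    },
    "author": {
        "$ref": "#/$defs/Author",
        "x-maps-to": {"schema.org": "https://schema.org/author"},
    },
    "body": {"type": "string"},  # no mapping declared
}


def mappings_for(props, standard):
    """Map property name -> identifier in `standard`, skipping unmapped properties."""
    return {
        name: prop["x-maps-to"][standard]
        for name, prop in props.items()
        if standard in prop.get("x-maps-to", {})
    }


print(mappings_for(properties, "schema.org"))
# {'title': 'https://schema.org/headline', 'author': 'https://schema.org/author'}
print(mappings_for(properties, "Open Graph"))
# {'title': 'og:title'}
```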

Platform Extensions

Any platform can define its own x-{name}-* namespace for round-trip metadata. AI agents ignore these.

"author": {
  "$ref": "#/$defs/Author",
  "x-sia-role": "relationship",
  // Platform-specific round-trip metadata
  "x-cm-relation": "domainIncludes",
  "x-cm-order": 4
}
When to include. Export for AI / external sharing → omit. Export for backup / round-trip import → include.
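
A minimal Python sketch of the "omit for AI export" case follows. It removes every x- key that is neither x-sia-* nor x-maps-to. The names strip_platform and is_platform_key are hypothetical, and the sketch would also drop a property that is literally named with an x- prefix, which a production exporter would need to handle.

```python
SIA_PREFIXES = ("x-sia-", "x-maps-to")


def is_platform_key(key):
    """True for x-{platform}-* keys, False for SIA keys and native keywords."""
    return key.startswith("x-") and not key.startswith(SIA_PREFIXES)


def strip_platform(node):
    """Return a copy of the schema without platform-specific x- keys."""
    if isinstance(node, dict):
        return {k: strip_platform(v) for k, v in node.items() if not is_platform_key(k)}
    if isinstance(node, list):
        return [strip_platform(v) for v in node]
    return node


author = {
    "$ref": "#/$defs/Author",
    "x-sia-role": "relationship",
    "x-cm-relation": "domainIncludes",
    "x-cm-order": 4,
}

print(strip_platform(author))
# {'$ref': '#/$defs/Author', 'x-sia-role': 'relationship'}
```

For a backup or round-trip export, the schema would simply be written out unchanged.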

YAML Serialisation

SIA-annotated JSON Schema can also be serialised as YAML, which is more compact and, in some benchmarks, is read more reliably by LLMs when the data is deeply nested.

$defs:
  BlogPost:
    type: object
    x-sia-instruction: >-
      A blog post. Title and author always needed.
    x-maps-to:
      schema.org: https://schema.org/BlogPosting
    properties:
      title:
        type: string
        maxLength: 200
        x-sia-role: identifier
        x-sia-priority: 1
        x-maps-to:
          schema.org: https://schema.org/headline
          Dublin Core: http://purl.org/dc/elements/1.1/title
      author:
        $ref: "#/$defs/Author"
        x-sia-role: relationship
        x-sia-priority: 2
        x-sia-instruction: >-
          Use author name in summaries, not ID.
    required: [title, body, author]
Recommendation. Use YAML for AI consumption via MCP tools (compact, better comprehension). Use JSON for code repositories and API tooling (universal compatibility).

Token Efficiency

x-sia-priority enables progressive truncation — strip lower-priority annotations to fit any token budget.

| Mode | maxPriority | What's included | Use case |
|---|---|---|---|
| Full | 5 | All properties, all annotations | Deep schema exploration |
| Normal | 3 | All properties, priority 1–3 | Standard AI context |
| Lean | 2 | All properties, priority 1–2 | Multiple schemas in context |
| Minimal | 1 | All properties, priority 1 only | Max schemas, min tokens |
| Structure | 0 | Types and properties only | Schema overview |
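
The modes above could be implemented roughly as follows in Python. This is a sketch under stated assumptions, since the page does not spell out the exact algorithm: every property is kept, SIA annotations (x-sia-* and x-maps-to) are removed from any node whose x-sia-priority exceeds maxPriority, nodes without a priority keep their annotations, and maxPriority 0 removes all annotations. The function name truncate is hypothetical.

```python
ANNOTATION_PREFIXES = ("x-sia-", "x-maps-to")


def truncate(node, max_priority):
    """Return a copy of the schema trimmed to the given maxPriority."""
    if isinstance(node, list):
        return [truncate(v, max_priority) for v in node]
    if not isinstance(node, dict):
        return node
    priority = node.get("x-sia-priority")
    # Assumption: unannotated nodes (priority is None) are only stripped at level 0.
    drop = max_priority == 0 or (priority is not None and priority > max_priority)
    return {
        k: truncate(v, max_priority)
        for k, v in node.items()
        if not (drop and k.startswith(ANNOTATION_PREFIXES))
    }


schema = {
    "type": "object",
    "x-sia-instruction": "A blog post.",
    "properties": {
        "title": {"type": "string", "x-sia-role": "identifier", "x-sia-priority": 1},
        "email": {"type": "string", "x-sia-role": "contact", "x-sia-priority": 4},
    },
}

normal = truncate(schema, 3)      # "Normal" mode
structure = truncate(schema, 0)   # "Structure" mode
print(normal["properties"]["email"])  # {'type': 'string'}
print(structure)
# {'type': 'object', 'properties': {'title': {'type': 'string'}, 'email': {'type': 'string'}}}
```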

Minification rules

  1. Strip whitespace — saves ~15–20% tokens
  2. Remove description if x-sia-instruction is present
  3. Remove $schema and $id if consumer knows identity
  4. Remove x-maps-to if consumer doesn't need mappings
  5. Remove x-{platform}-* unless round-trip import is the goal
Need more compression? The ShEx expression of SIA achieves ~35% fewer tokens than JSON Schema for the same content.
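
Rules 1 to 4 above can be sketched in Python with the standard json module. The function name minify and its two flags are hypothetical; rule 5 (platform keys) is left out to keep the sketch short.

```python
import json


def minify(schema, drop_identity=True, keep_mappings=True):
    """Serialise a schema compactly, applying minification rules 1-4."""

    def clean(node, is_root=False):
        if isinstance(node, list):
            return [clean(v) for v in node]
        if not isinstance(node, dict):
            return node
        out = {}
        for key, value in node.items():
            # Rule 2: description is redundant when x-sia-instruction is present.
            if key == "description" and "x-sia-instruction" in node:
                continue
            # Rule 3: drop $schema and $id at the root if the consumer knows them.
            if is_root and drop_identity and key in ("$schema", "$id"):
                continue
            # Rule 4: drop mappings if the consumer does not need them.
            if key == "x-maps-to" and not keep_mappings:
                continue
            out[key] = clean(value)
        return out

    # Rule 1: compact separators strip all insignificant whitespace.
    return json.dumps(clean(schema, is_root=True), separators=(",", ":"))


schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "T",
    "description": "Human text",
    "x-sia-instruction": "AI text",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title",
            "x-maps-to": {"schema.org": "https://schema.org/headline"},
        }
    },
}

print(minify(schema, keep_mappings=False))
# {"title":"T","x-sia-instruction":"AI text","properties":{"title":{"type":"string","description":"The title"}}}
```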

What JSON Schema Already Gives You

Rule: if JSON Schema can express it natively, never use x-sia-* for it.

| Concept | JSON Schema native | SIA needed? |
|---|---|---|
| String property | "type": "string" | No |
| Required fields | "required": [...] | No |
| Relation to type | "$ref": "#/$defs/..." | No |
| Enum / taxonomy | "enum": [...] | No |
| Inheritance | "allOf": [{"$ref":"..."}] | No |
| Description | "description": "..." | No |
| Constraints | "minimum", "maxLength", "pattern" | No |
| Default values | "default": ... | No |
| Read-only | "readOnly": true | No |
| AI role hint | (none) | Yes: x-sia-role |
| AI priority | (none) | Yes: x-sia-priority |
| AI instruction | Partially (description) | Optional: x-sia-instruction |
| Standard mapping | (none) | Yes: x-maps-to |

Complete Example

Blog content model: two types, a taxonomy, and schema.org + Dublin Core mappings. More examples are available on GitHub.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Blog Content Model",
  "x-sia-instruction": "Blog content model. Posts and authors.",
  "x-maps-to": { "schema.org": "https://schema.org/Blog" },
  "$defs": {
    "BlogPost": {
      "type": "object",
      "x-sia-instruction": "A blog post. Title and author always needed.",
      "x-maps-to": {
        "schema.org": "https://schema.org/BlogPosting",
        "Dublin Core": "http://purl.org/dc/dcmitype/Text"
      },
      "properties": {
        "title": {
          "type": "string", "maxLength": 200,
          "x-sia-role": "identifier", "x-sia-priority": 1,
          "x-maps-to": {
            "schema.org": "https://schema.org/headline",
            "Dublin Core": "http://purl.org/dc/elements/1.1/title"
          }
        },
        "body": {
          "type": "string",
          "x-sia-role": "content", "x-sia-priority": 2,
          "x-maps-to": { "schema.org": "https://schema.org/articleBody" }
        },
        "author": {
          "$ref": "#/$defs/Author",
          "x-sia-role": "relationship", "x-sia-priority": 2,
          "x-sia-instruction": "Use author name in summaries, not ID.",
          "x-maps-to": {
            "schema.org": "https://schema.org/author",
            "Dublin Core": "http://purl.org/dc/elements/1.1/creator"
          }
        },
        "category": {
          "enum": ["tech", "science", "culture"],
          "x-sia-role": "classification", "x-sia-priority": 3,
          "x-maps-to": { "schema.org": "https://schema.org/articleSection" }
        }
      },
      "required": ["title", "body", "author"]
    },
    "Author": {
      "type": "object",
      "x-maps-to": { "schema.org": "https://schema.org/Person" },
      "properties": {
        "name": {
          "type": "string",
          "x-sia-role": "identifier", "x-sia-priority": 1,
          "x-maps-to": { "schema.org": "https://schema.org/name" }
        },
        "email": {
          "type": "string", "format": "email",
          "x-sia-role": "contact", "x-sia-priority": 4,
          "x-maps-to": { "schema.org": "https://schema.org/email" }
        }
      },
      "required": ["name"]
    }
  }
}

Advanced Patterns

Polymorphism

"contactMethod": {
  "oneOf": [
    { "type": "string", "format": "email" },
    { "$ref": "#/$defs/PhoneNumber" }
  ],
  "x-sia-role": "contact",
  "x-sia-instruction": "Email or phone. Prefer email for display."
}

Recursive schemas

"children": {
  "type": "array",
  "items": { "$ref": "#/$defs/Category" },
  "x-sia-instruction": "Recursive subcategories. Render as tree."
}
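
A consumer that follows $ref links through a recursive schema needs a guard against infinite loops. The Python sketch below shows one way to do that with a set of already-visited references. The helper names resolve and outline are hypothetical, only local "#/..." references are handled, and JSON Pointer escaping is ignored.

```python
def resolve(root, ref):
    """Resolve a local reference such as '#/$defs/Category' against the root schema."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def outline(root, ref, seen=frozenset(), depth=0):
    """List the types reachable from `ref`, marking where recursion is cut off."""
    name = ref.rsplit("/", 1)[-1]
    if ref in seen:
        return ["  " * depth + name + " (recursive)"]
    lines = ["  " * depth + name]
    for prop in resolve(root, ref).get("properties", {}).values():
        # Follow a direct $ref, or the $ref of an array's items.
        target = prop.get("$ref") or prop.get("items", {}).get("$ref")
        if target:
            lines += outline(root, target, seen | {ref}, depth + 1)
    return lines


root = {
    "$defs": {
        "Category": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/$defs/Category"},
                    "x-sia-instruction": "Recursive subcategories. Render as tree.",
                },
            },
        }
    }
}

print(outline(root, "#/$defs/Category"))
# ['Category', '  Category (recursive)']
```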

Conditional schemas

"if": { "properties": { "status": { "const": "published" } } },
"then": { "required": ["publishedDate", "author"] },
"x-sia-instruction": "Published posts require date and author."

Nullable

"middleName": {
  "type": ["string", "null"],
  "x-sia-priority": 4,
  "x-sia-instruction": "May be null. Never fabricate a middle name."
}
Never emit empty annotations. If a property's role can't be determined, omit x-sia-role. Placeholder values ("n/a", "TBD") waste tokens and confuse AI agents.
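
A small lint step can catch empty or placeholder annotations before a schema is published. The Python sketch below is one possible check, not an official SIA validator; the function name lint and the placeholder list are assumptions based on the examples named above.

```python
PLACEHOLDERS = {"", "n/a", "tbd"}


def lint(name, prop):
    """Return a list of problems found in one property's SIA annotations."""
    problems = []
    for key in ("x-sia-role", "x-sia-instruction"):
        if key in prop:
            value = prop[key]
            if not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS:
                problems.append(f"{name}: {key} is empty or a placeholder")
    if "x-sia-priority" in prop:
        priority = prop["x-sia-priority"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 5:
            problems.append(f"{name}: x-sia-priority must be an integer from 1 to 5")
    if "x-maps-to" in prop and not prop["x-maps-to"]:
        problems.append(f"{name}: x-maps-to is empty")
    return problems


good = {"type": ["string", "null"], "x-sia-priority": 4,
        "x-sia-instruction": "May be null. Never fabricate a middle name."}
bad = {"type": "string", "x-sia-role": "TBD", "x-sia-priority": 7, "x-maps-to": {}}

print(lint("middleName", good))  # []
for problem in lint("nickname", bad):
    print(problem)
```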