SIA — Schema Intelligence Annotations

What is SIA

Schema Intelligence Annotations (SIA) is a small, focused vocabulary that makes schemas understandable for AI agents, interoperable across standards, and portable across platforms.

SIA is not a schema language. It is a set of four annotation terms that can be expressed in any schema format. The same SIA vocabulary works in JSON Schema, ShEx, and any future format — because the annotations describe meaning, not syntax.

What SIA adds

AI operational metadata — tells AI agents what each property means, how important it is, and how to use it. Prevents hallucination. Enables intelligent truncation under token pressure.
Cross-standard mappings — declares how each property maps to schema.org, Dublin Core, FHIR, or any external vocabulary. One schema, multiple standards, no separate mapping files.

What SIA does NOT change

Validation, structure, data types, required fields, references, inheritance, enums — all handled by the base format (JSON Schema, ShEx, or whatever you use). SIA only adds what schema languages don't have: intelligence for AI and interoperability across standards.

The SIA Vocabulary

Four terms. That's the entire specification. Each is self-describing, optional, and format-independent.

role

string · property level

Tells the AI what semantic role this property plays when composing content. Is it an identifier? A relationship? A date? A status flag?

role: "identifier" // AI knows: this names the entity

priority

integer 1–5 · property level

When context is limited, tells the AI which properties to keep (1 = always) and which to drop first (5 = expendable).

priority: 3 // AI knows: safe to drop if tight

instruction

string · property or type level

Natural language instruction for the AI. The primary anti-hallucination mechanism. Tells the AI how to use this property correctly.

instruction: "Use author name in summaries, never the raw ID."
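
As a rough illustration of how an agent-side tool might gather these instructions before composing content, the sketch below walks a JSON Schema document (held as a Python dict) and collects every x-sia-instruction at type and property level. The schema contents, the function name collect_instructions, and the output line format are hypothetical; only the x-sia-instruction key comes from the SIA term mapping.

```python
from typing import Any

# Hypothetical schema for illustration; only the x-sia-instruction key is SIA.
ARTICLE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "x-sia-instruction": "Summaries must stay under three sentences.",
    "properties": {
        "authorId": {
            "type": "string",
            "x-sia-instruction": "Use author name in summaries, never the raw ID.",
        },
        "title": {"type": "string"},  # no instruction: the annotation is omitted
    },
}


def collect_instructions(schema: dict[str, Any]) -> list[str]:
    """Gather type-level and property-level instructions as prompt-ready lines."""
    lines: list[str] = []
    type_level = schema.get("x-sia-instruction")
    if type_level:
        lines.append(f"(type) {type_level}")
    for name, spec in schema.get("properties", {}).items():
        note = spec.get("x-sia-instruction")
        if note:
            lines.append(f"{name}: {note}")
    return lines


if __name__ == "__main__":
    print("\n".join(collect_instructions(ARTICLE_SCHEMA)))
```

Properties without an instruction contribute nothing, which matches the idea that every annotation is optional.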

mapsTo

URI(s) · property or type level

Declares equivalent properties in external standards. One property can map to schema.org, Dublin Core, FHIR — as many as needed.

mapsTo: schema.org/headline
mapsTo: dc:title
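
One way a consumer might use these mappings is to re-key a record into the vocabulary of a chosen standard. The sketch below assumes that x-maps-to is an object keyed by a short label for the target standard, with the target property URI as the value; that shape, the labels "schema.org" and "dc", and the function name project are assumptions made for illustration, not something this overview defines.

```python
from typing import Any

# Assumed shape: x-maps-to is an object of {standard label: property URI}.
# The labels and the sample properties are hypothetical.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "x-maps-to": {
                "schema.org": "https://schema.org/headline",
                "dc": "http://purl.org/dc/elements/1.1/title",
            },
        },
        "internalNote": {"type": "string"},  # unmapped property
    },
}


def project(record: dict[str, Any], schema: dict[str, Any], standard: str) -> dict[str, Any]:
    """Re-key a record using the URIs declared for one target standard.

    Properties with no mapping for that standard are left out.
    """
    props = schema.get("properties", {})
    projected: dict[str, Any] = {}
    for name, value in record.items():
        target = props.get(name, {}).get("x-maps-to", {}).get(standard)
        if target is not None:
            projected[target] = value
    return projected


if __name__ == "__main__":
    record = {"title": "Hello", "internalNote": "draft"}
    print(project(record, SCHEMA, "schema.org"))
```

Because the mappings live on the property itself, no separate mapping file is consulted.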

Standard role values

Role	Meaning	Typical properties
`identifier`	Primary name, ID, title, or label	name, title, slug, sku
`descriptive`	Summary, abstract, explanation	summary, bio, description
`content`	Main body, payload, primary data	body, text, html
`relationship`	Reference to another entity	author, parent, category
`classification`	Category, tag, type grouping	category, tags, enum fields
`temporal`	Date, time, timestamp, duration	createdAt, publishedDate
`contact`	Email, phone, address	email, phone, address
`status`	State, flag, lifecycle indicator	status, active, published
`metric`	Numeric measurement, score	price, rating, count
`governance`	Audit, ownership, permission	createdBy, accessLevel

Extensible. Domain-specific roles can be added (e.g., clinical, financial, geospatial). The vocabulary is designed to grow through community contributions.
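
A tool that wants to flag unfamiliar roles without rejecting domain extensions could check against the standard list plus a caller-supplied set. This is only a sketch: the function name unknown_roles, the sample schema, and the decision to report rather than reject are assumptions.

```python
from typing import Any

# The ten standard role values from the table above.
STANDARD_ROLES = frozenset({
    "identifier", "descriptive", "content", "relationship", "classification",
    "temporal", "contact", "status", "metric", "governance",
})

# Hypothetical schema; "clinical" stands in for a domain-specific role.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "x-sia-role": "identifier"},
        "dosage": {"type": "string", "x-sia-role": "clinical"},
        "notes": {"type": "string"},  # no role declared
    },
}


def unknown_roles(
    schema: dict[str, Any], extra_roles: frozenset[str] = frozenset()
) -> dict[str, str]:
    """Return {property name: role} for roles outside the allowed set."""
    allowed = STANDARD_ROLES | extra_roles
    return {
        name: spec["x-sia-role"]
        for name, spec in schema.get("properties", {}).items()
        if "x-sia-role" in spec and spec["x-sia-role"] not in allowed
    }


if __name__ == "__main__":
    print(unknown_roles(SCHEMA))
    print(unknown_roles(SCHEMA, frozenset({"clinical"})))
```

Passing the domain roles in explicitly keeps the standard list fixed while still allowing extension.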

Priority scale

Value	Label	Meaning
1	Essential	Always include, even in one-line summaries
2	Important	Include in standard context
3	Useful	Include in full display, drop under pressure
4	Supplementary	Include only with generous token budget
5	Background	Drop first, rarely needed by AI
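
The scale lends itself to a simple cut-off: keep every property at or below a chosen priority and drop the rest. The sketch below shows that idea on a JSON Schema held as a Python dict. The sample properties, the function name keep_properties, and the default of 3 for unannotated properties are assumptions; this overview does not say how a missing priority should be treated.

```python
from typing import Any

# Hypothetical annotated schema; property names are illustrative only.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "x-sia-priority": 1},
        "summary": {"type": "string", "x-sia-priority": 2},
        "body": {"type": "string", "x-sia-priority": 3},
        "revisionNote": {"type": "string", "x-sia-priority": 5},
        "slug": {"type": "string"},  # no priority annotation
    },
}

DEFAULT_PRIORITY = 3  # assumption: treat unannotated properties as "Useful"


def keep_properties(schema: dict[str, Any], max_priority: int) -> list[str]:
    """Return property names with priority <= max_priority, most essential first."""
    props = schema.get("properties", {})

    def priority(spec: dict[str, Any]) -> int:
        return spec.get("x-sia-priority", DEFAULT_PRIORITY)

    # sorted() is stable, so properties with equal priority keep schema order.
    ranked = sorted(props.items(), key=lambda item: priority(item[1]))
    return [name for name, spec in ranked if priority(spec) <= max_priority]


if __name__ == "__main__":
    print(keep_properties(SCHEMA, 1))  # one-line summary
    print(keep_properties(SCHEMA, 3))  # full display
```

A real agent would more likely pick the cut-off from a token budget; a fixed threshold keeps the example short.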

Architecture

A SIA-annotated schema has four layers. Only Layer 1 is required. The rest are optional and additive.

Layer 1: Base format (structure & validation)
Layer 2: SIA core (role, priority, instruction)
Layer 3: Mappings (mapsTo URIs)
Layer 4: Platform (round-trip extensions)

Layer 1 is whichever schema format you use — JSON Schema, ShEx, or another. It handles structure, types, validation, and all the things schema languages already do well.

Layers 2–3 are the SIA vocabulary. They add AI intelligence and cross-standard mappings using each format's native extension mechanism.

Layer 4 lets any platform define its own namespace for round-trip metadata. A data modeling tool might use x-cm-*, a CMS might use x-sanity-*, an ETL system might use x-dbt-*. Layer 4 is deliberately open and platform-specific.

Key principle. If the base format already expresses something natively ($ref for relations, enum for taxonomies, required for mandatory fields, cardinality in ShEx), SIA never duplicates it. SIA only adds what schema languages don't have.
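
To make the layering concrete for the JSON Schema carrier, the sketch below sorts the keys of a single property definition into the four layers by name alone. The key names for Layers 2 and 3 follow the format mapping table in this document; treating every other x-* key as Layer 4, along with the names layer_of, split_by_layer, and x-cm-field-id, is an assumption for illustration.

```python
from typing import Any

SIA_CORE_KEYS = frozenset({"x-sia-role", "x-sia-priority", "x-sia-instruction"})
SIA_MAPPING_KEY = "x-maps-to"


def layer_of(key: str) -> int:
    """Classify one JSON Schema key into an SIA layer (1 to 4)."""
    if key in SIA_CORE_KEYS:
        return 2
    if key == SIA_MAPPING_KEY:
        return 3
    if key.startswith("x-"):
        return 4  # assumption: any other vendor extension is platform metadata
    return 1  # everything else belongs to the base format


def split_by_layer(spec: dict[str, Any]) -> dict[int, dict[str, Any]]:
    """Group the keys of a property definition by layer."""
    layers: dict[int, dict[str, Any]] = {1: {}, 2: {}, 3: {}, 4: {}}
    for key, value in spec.items():
        layers[layer_of(key)][key] = value
    return layers


if __name__ == "__main__":
    # Hypothetical property definition; x-cm-field-id is a made-up platform key.
    title_spec = {
        "type": "string",
        "maxLength": 120,
        "x-sia-role": "identifier",
        "x-sia-priority": 1,
        "x-maps-to": {"schema.org": "https://schema.org/headline"},
        "x-cm-field-id": "f_001",
    }
    for layer, keys in split_by_layer(title_spec).items():
        print(layer, sorted(keys))
```

Layer 1 alone is still a complete property definition, which is what makes the other layers additive.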

Available Formats

SIA is expressed in multiple carrier formats. Each uses the base format's native extension mechanism. The SIA vocabulary is identical across all formats — only the syntax differs.

PRIMARY

JSON Schema + SIA

Standard JSON Schema with x-sia-* vendor extensions. Best LLM comprehension — the format AI agents are fine-tuned on for function calling. Universal tooling support.

COMPACT

ShEx + SIA

Standard ShExC with // sia:* annotations. 35% more compact. Native semantic web interoperability. Ideal for token-constrained AI operations and linked data workflows.

COMING SOON

More formats

SIA is designed to be format-independent. Future expressions may include YAML-native, Protobuf, or GraphQL SDL — wherever schemas live, SIA can follow.

How SIA maps across formats

SIA term	JSON Schema	ShEx	Mechanism
role	`x-sia-role`	`// sia:role`	Vendor extension / RDF annotation
priority	`x-sia-priority`	`// sia:priority`	Vendor extension / RDF annotation
instruction	`x-sia-instruction`	`// sia:instruction`	Vendor extension / RDF annotation
mapsTo	`x-maps-to` object	`// sia:mapsTo` URI	Vendor extension / RDF annotation
platform ext.	`x-{name}-*`	`// {ns}:*`	Open namespace pattern
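
The table can be read as a mechanical projection from one term to two spellings. The sketch below encodes it that way: a lookup for the JSON Schema keyword and a small formatter for a ShExC annotation line. The function names are hypothetical, and the formatter assumes string and integer values are written as ShExC literals and mapsTo values as IRIs in angle brackets. The sia: prefix would also need a PREFIX declaration whose namespace IRI is not given in this overview, so it is left out.

```python
import json
from typing import Union

# JSON Schema keyword for each SIA term, as listed in the table above.
JSON_SCHEMA_KEYS = {
    "role": "x-sia-role",
    "priority": "x-sia-priority",
    "instruction": "x-sia-instruction",
    "mapsTo": "x-maps-to",
}


def to_json_schema_key(term: str) -> str:
    """Return the JSON Schema vendor-extension keyword for an SIA term."""
    return JSON_SCHEMA_KEYS[term]


def to_shex_annotation(term: str, value: Union[str, int]) -> str:
    """Format one SIA term as a ShExC annotation line.

    Assumption: mapsTo takes a single absolute IRI; other terms take a literal.
    """
    if term not in JSON_SCHEMA_KEYS:
        raise KeyError(f"not an SIA term: {term}")
    if term == "mapsTo":
        return f"// sia:mapsTo <{value}>"
    # json.dumps yields a double-quoted, escaped string or a bare integer.
    return f"// sia:{term} {json.dumps(value)}"


if __name__ == "__main__":
    print(to_json_schema_key("role"), "|", to_shex_annotation("role", "identifier"))
    print(to_json_schema_key("priority"), "|", to_shex_annotation("priority", 3))
    print(to_json_schema_key("mapsTo"), "|",
          to_shex_annotation("mapsTo", "https://schema.org/headline"))
```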

Design Principles

Never reinvent the screw. If the base format already expresses a concept, use it natively. SIA never duplicates what JSON Schema, ShEx, or any base format already does.
Self-describing names. An AI that has never seen SIA should infer the meaning from the term name alone. role and instruction are clear. Abbreviations and codes are not.
Graceful ignorance. Any tool for the base format must process a SIA-annotated schema without error. Standard validators ignore extensions they don't understand.
Layered enrichment. A plain schema is valid SIA. Adding role/priority/instruction makes it AI-intelligent. Adding mapsTo makes it interoperable. Each layer is optional.
One file, multiple consumers. The same file serves a human developer, an AI agent, a schema validator, a code generator, and an import tool. Each reads what it needs.
Format independence. The SIA vocabulary is defined once and projected into each carrier format mechanically. If the vocabulary ever diverges between formats, that's a bug, not a feature.
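
The "graceful ignorance" and "layered enrichment" principles imply that removing every extension keyword from a JSON Schema + SIA file should leave a plain, still-meaningful schema. The sketch below demonstrates that on a hypothetical schema. The function name strip_extensions is an assumption, and the walker only protects property names under "properties"; other name-keyed containers such as $defs are not handled.

```python
from typing import Any


def strip_extensions(node: Any, in_properties: bool = False) -> Any:
    """Recursively drop x-* keywords, leaving the base-format schema.

    Keys directly under "properties" are property names, not keywords,
    so they are kept even if they happen to start with "x-".
    """
    if isinstance(node, dict):
        return {
            key: strip_extensions(
                value, in_properties=(key == "properties" and not in_properties)
            )
            for key, value in node.items()
            if in_properties or not key.startswith("x-")
        }
    if isinstance(node, list):
        return [strip_extensions(item) for item in node]
    return node


# Hypothetical annotated schema; "x-ray" is a property name, not an extension.
ANNOTATED = {
    "type": "object",
    "x-sia-instruction": "Describe the scan in plain language.",
    "properties": {
        "title": {"type": "string", "x-sia-role": "identifier", "x-sia-priority": 1},
        "x-ray": {"type": "boolean", "x-cm-id": "f2"},
    },
    "required": ["title"],
}

if __name__ == "__main__":
    print(strip_extensions(ANNOTATED))
```

What remains is exactly what a standard validator acts on, which is why the annotations can be added or removed without changing validation behavior.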

Quick Reference

Term	Type	Applies to	Purpose
role	string	Property	Semantic role for AI composition
priority	integer 1–5	Property	Truncation priority (1 = keep first)
instruction	string	Property / Type	Natural language AI instruction
mapsTo	URI(s)	Property / Type	Cross-standard equivalence mapping

The rules

If the base format can express it natively, don't use SIA for it
Every SIA term name must be self-describing English
Every SIA-annotated schema must pass standard validation for its base format
Omit annotations rather than filling them with placeholders
The SIA vocabulary is identical across all carrier formats
Platform extensions use an open namespace — any tool can define its own
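
Some of these rules can be checked mechanically. The sketch below lints one property definition for two of them: priority must be an integer from 1 to 5, and role or instruction must not be empty or a placeholder. The function name lint_property, the placeholder list, and the message wording are all assumptions; SIA itself does not define a linter in this overview.

```python
from typing import Any

# Assumed placeholder values; a real tool would likely make this configurable.
PLACEHOLDERS = frozenset({"", "todo", "tbd", "n/a"})


def lint_property(name: str, spec: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in one property's SIA annotations."""
    problems: list[str] = []

    if "x-sia-priority" in spec:
        value = spec["x-sia-priority"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            problems.append(f"{name}: x-sia-priority must be an integer from 1 to 5")

    for key in ("x-sia-role", "x-sia-instruction"):
        if key in spec:
            text = spec[key]
            if not isinstance(text, str) or text.strip().lower() in PLACEHOLDERS:
                problems.append(f"{name}: {key} is empty or a placeholder; omit it instead")

    return problems


if __name__ == "__main__":
    print(lint_property("title", {"x-sia-priority": 7, "x-sia-role": "TODO"}))
    print(lint_property("slug", {"x-sia-priority": 1, "x-sia-role": "identifier"}))
```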

Ready to implement? Pick your format and dive into the full specification: JSON Schema + SIA or ShEx + SIA
