Open Specification · One Vocabulary, Any Format

Make schemas intelligent

SIA is a vocabulary that adds AI intelligence, cross-standard mappings, and operational metadata to your schemas — expressed in any format you already use.

One vocabulary. Multiple carrier formats. Zero reinvention.

JSON Schema + SIA: patient.sia.json
"name": {
  "type": "string",
  "maxLength": 100,

  // SIA: AI intelligence
  "x-sia-role": "identifier",
  "x-sia-priority": 1,
  "x-sia-instruction": "Primary identifier. Always include.",

  // SIA: cross-standard mappings
  "x-maps-to": {
    "schema.org": "https://schema.org/name",
    "FHIR": "http://hl7.org/fhir/Patient.name"
  }
}
ShEx + SIA: patient.sia.shex
PREFIX sia: <https://www.schematica.io/sia#>
PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

:name xsd:string MAXLENGTH 100

  # SIA: AI intelligence
  // sia:role "identifier"
  // sia:priority 1
  // sia:instruction "Primary identifier. Always include."

  # SIA: cross-standard mappings
  // sia:mapsTo schema:name
  // sia:mapsTo <http://hl7.org/fhir/Patient.name> ;

What is SIA

Schema Intelligence Annotations (SIA) is a small, focused vocabulary that makes schemas understandable for AI agents, interoperable across standards, and portable across platforms.

SIA is not a schema language. It is a set of four annotation terms that can be expressed in any schema format. The same SIA vocabulary works in JSON Schema, ShEx, and any future format — because the annotations describe meaning, not syntax.

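What SIA adds

AI intelligence (role, priority, instruction) and cross-standard mappings (mapsTo), plus an open namespace where platforms keep their own round-trip metadata.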

What SIA does NOT change

Validation, structure, data types, required fields, references, inheritance, enums — all handled by the base format (JSON Schema, ShEx, or whatever you use). SIA only adds what schema languages don't have: intelligence for AI and interoperability across standards.

The SIA Vocabulary

Four terms. That's the entire specification. Each is self-describing, optional, and format-independent.

role

string · property level

Tells the AI what semantic role this property plays when composing content. Is it an identifier? A relationship? A date? A status flag?

role: "identifier" // AI knows: this names the entity

priority

integer 1–5 · property level

When context is limited, tells the AI which properties to keep (1 = always) and which to drop first (5 = expendable).

priority: 3 // AI knows: safe to drop if tight

instruction

string · property or type level

A natural-language instruction for the AI, and SIA's primary anti-hallucination mechanism. It tells the AI how to use this property or type correctly.

instruction: "Use author name in summaries, never the raw ID."

mapsTo

URI(s) · property or type level

Declares equivalent properties in external standards. One property can map to schema.org, Dublin Core, FHIR — as many as needed.

mapsTo: schema.org/headline
mapsTo: dc:title
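
Because one property can carry several mappings, a tool can invert them into a per-standard crosswalk. The sketch below assumes the JSON Schema carrier and the `x-maps-to` object shown at the top of this page; the `build_crosswalk` helper and the sample schema are hypothetical and not part of SIA.

```python
from collections import defaultdict

# Hypothetical schema fragment. "x-maps-to" is the keyword used on this page;
# the property names and the helper below are illustrative only.
schema = {
    "properties": {
        "name": {
            "type": "string",
            "x-maps-to": {
                "schema.org": "https://schema.org/name",
                "FHIR": "http://hl7.org/fhir/Patient.name",
            },
        },
        "title": {
            "type": "string",
            "x-maps-to": {
                "schema.org": "https://schema.org/headline",
                "Dublin Core": "http://purl.org/dc/terms/title",
            },
        },
        "internalNote": {"type": "string"},  # no mapping, so no annotation
    }
}


def build_crosswalk(schema: dict) -> dict[str, dict[str, str]]:
    """Return {standard: {property_name: uri}} for every declared mapping."""
    crosswalk: dict[str, dict[str, str]] = defaultdict(dict)
    for prop_name, prop in schema.get("properties", {}).items():
        for standard, uri in prop.get("x-maps-to", {}).items():
            crosswalk[standard][prop_name] = uri
    return dict(crosswalk)


crosswalk = build_crosswalk(schema)
```

Properties without a mapping simply do not appear in the result, which matches the rule of omitting annotations rather than filling in placeholders.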

Standard role values

| Role | Meaning | Typical properties |
|---|---|---|
| identifier | Primary name, ID, title, or label | name, title, slug, sku |
| descriptive | Summary, abstract, explanation | summary, bio, description |
| content | Main body, payload, primary data | body, text, html |
| relationship | Reference to another entity | author, parent, category |
| classification | Category, tag, type grouping | category, tags, enum fields |
| temporal | Date, time, timestamp, duration | createdAt, publishedDate |
| contact | Email, phone, address | email, phone, address |
| status | State, flag, lifecycle indicator | status, active, published |
| metric | Numeric measurement, score | price, rating, count |
| governance | Audit, ownership, permission | createdBy, accessLevel |

Extensible. Domain-specific roles can be added (e.g., clinical, financial, geospatial). The vocabulary is designed to grow through community contributions.
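
A tool that wants to flag unrecognised roles without blocking domain extensions might check them against the standard list plus a caller-supplied set. This is a minimal sketch assuming the JSON Schema keyword `x-sia-role`; the `unknown_roles` helper and the sample schema are hypothetical.

```python
# The ten standard role values listed above.
STANDARD_ROLES = frozenset({
    "identifier", "descriptive", "content", "relationship", "classification",
    "temporal", "contact", "status", "metric", "governance",
})


def unknown_roles(schema: dict, extra_roles: frozenset = frozenset()) -> dict[str, str]:
    """Return {property_name: role} for roles outside the allowed set.

    extra_roles lets a domain add its own values (for example "clinical").
    """
    allowed = STANDARD_ROLES | extra_roles
    return {
        name: prop["x-sia-role"]
        for name, prop in schema.get("properties", {}).items()
        if "x-sia-role" in prop and prop["x-sia-role"] not in allowed
    }


# Hypothetical schema mixing a standard role and a domain-specific one.
patient_schema = {
    "properties": {
        "name": {"type": "string", "x-sia-role": "identifier"},
        "diagnosis": {"type": "string", "x-sia-role": "clinical"},
        "notes": {"type": "string"},  # role omitted, which is allowed
    }
}

flagged = unknown_roles(patient_schema)
accepted = unknown_roles(patient_schema, extra_roles=frozenset({"clinical"}))
```

Reporting unknown roles rather than rejecting them keeps the check compatible with the vocabulary's extensibility.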

Priority scale

| Value | Label | Meaning |
|---|---|---|
| 1 | Essential | Always include, even in one-line summaries |
| 2 | Important | Include in standard context |
| 3 | Useful | Include in full display, drop under pressure |
| 4 | Supplementary | Include only with generous token budget |
| 5 | Background | Drop first, rarely needed by AI |
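
One way a consumer could apply the scale is as a simple threshold: keep every property whose priority is at or below the level the current context can afford. The sketch assumes the JSON Schema keyword `x-sia-priority`. The `select_properties` helper is hypothetical, and treating unannotated properties as priority 3 is an assumption of this example, not something the vocabulary defines.

```python
# Hypothetical article schema annotated with x-sia-priority.
article_schema = {
    "properties": {
        "title": {"type": "string", "x-sia-priority": 1},
        "summary": {"type": "string", "x-sia-priority": 2},
        "body": {"type": "string", "x-sia-priority": 3},
        "tags": {"type": "array", "x-sia-priority": 4},
        "legacyId": {"type": "string", "x-sia-priority": 5},
        "slug": {"type": "string"},  # unannotated
    }
}


def select_properties(schema: dict, max_priority: int, default_priority: int = 3) -> list[str]:
    """Return property names with priority <= max_priority, most important first.

    default_priority is an assumption of this sketch for properties that
    carry no x-sia-priority annotation.
    """
    ranked = [
        (prop.get("x-sia-priority", default_priority), name)
        for name, prop in schema.get("properties", {}).items()
    ]
    # Sorting tuples orders by priority first, then alphabetically by name.
    return [name for priority, name in sorted(ranked) if priority <= max_priority]


one_line = select_properties(article_schema, max_priority=1)
full_display = select_properties(article_schema, max_priority=3)
```

Here `one_line` keeps only `title`, while `full_display` also keeps `summary`, `body`, and the unannotated `slug`.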

Architecture

A SIA-annotated schema has four layers. Only Layer 1 is required. The rest are optional and additive.

Layer 1: Base format (structure & validation)
Layer 2: SIA core (role, priority, instruction)
Layer 3: Mappings (mapsTo URIs)
Layer 4: Platform (round-trip extensions)

Layer 1 is whichever schema format you use — JSON Schema, ShEx, or another. It handles structure, types, validation, and all the things schema languages already do well.

Layers 2–3 are the SIA vocabulary. They add AI intelligence and cross-standard mappings using each format's native extension mechanism.

Layer 4 lets any platform define its own namespace for round-trip metadata. A data modeling tool might use x-cm-*, a CMS might use x-sanity-*, an ETL system might use x-dbt-*. Layer 4 is deliberately open and platform-specific.

Key principle. If the base format already expresses something natively ($ref for relations, enum for taxonomies, required for mandatory fields, cardinality in ShEx), SIA never duplicates it. SIA only adds what schema languages don't have.
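
To make the layering concrete, the sketch below sorts the keywords of a single JSON Schema property into the four layers. It assumes the keyword names used on this page (`x-sia-*`, `x-maps-to`) and treats every other `x-` keyword as a Layer 4 platform extension. The `split_layers` helper and the `x-cm-fieldId` value are hypothetical.

```python
SIA_CORE_KEYS = {"x-sia-role", "x-sia-priority", "x-sia-instruction"}  # Layer 2
SIA_MAPPING_KEYS = {"x-maps-to"}                                       # Layer 3


def split_layers(prop: dict) -> dict[str, dict]:
    """Group one property's keywords by layer.

    Assumption of this sketch: any other keyword starting with "x-" is a
    platform extension; everything else belongs to the base format.
    """
    layers: dict[str, dict] = {"base": {}, "sia_core": {}, "mappings": {}, "platform": {}}
    for key, value in prop.items():
        if key in SIA_CORE_KEYS:
            layers["sia_core"][key] = value
        elif key in SIA_MAPPING_KEYS:
            layers["mappings"][key] = value
        elif key.startswith("x-"):
            layers["platform"][key] = value
        else:
            layers["base"][key] = value
    return layers


name_property = {
    "type": "string",
    "maxLength": 100,
    "x-sia-role": "identifier",
    "x-sia-priority": 1,
    "x-maps-to": {"schema.org": "https://schema.org/name"},
    "x-cm-fieldId": "fld_001",  # hypothetical platform metadata
}

layers = split_layers(name_property)
```

The `base` entry on its own is still a complete, valid property definition, which is what makes Layers 2 to 4 optional and additive.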

Available Formats

SIA is expressed in multiple carrier formats. Each uses the base format's native extension mechanism. The SIA vocabulary is identical across all formats — only the syntax differs.

How SIA maps across formats

| SIA term | JSON Schema | ShEx | Mechanism |
|---|---|---|---|
| role | x-sia-role | // sia:role | Vendor extension / RDF annotation |
| priority | x-sia-priority | // sia:priority | Vendor extension / RDF annotation |
| instruction | x-sia-instruction | // sia:instruction | Vendor extension / RDF annotation |
| mapsTo | x-maps-to object | // sia:mapsTo URI | Vendor extension / RDF annotation |
| platform ext. | x-{name}-* | // {ns}:* | Open namespace pattern |
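
The table can be read as a mechanical projection from one neutral set of annotations into each carrier. The sketch below illustrates that idea. The neutral dictionary shape and the `to_json_schema` and `to_shex` helpers are assumptions made for this example; only the target keyword and annotation names come from the table.

```python
import json

CORE_TERMS = ("role", "priority", "instruction")


def to_json_schema(annotations: dict) -> dict:
    """Project neutral SIA annotations into JSON Schema vendor extensions."""
    out = {f"x-sia-{term}": annotations[term] for term in CORE_TERMS if term in annotations}
    if "mapsTo" in annotations:
        out["x-maps-to"] = dict(annotations["mapsTo"])
    return out


def to_shex(annotations: dict) -> list[str]:
    """Project the same annotations into ShEx annotation lines.

    json.dumps quotes strings and leaves integers bare, which is enough
    for the simple values used here.
    """
    lines = [
        f"// sia:{term} {json.dumps(annotations[term])}"
        for term in CORE_TERMS
        if term in annotations
    ]
    # ShEx annotations take plain IRIs, so the standard labels are dropped.
    lines += [f"// sia:mapsTo <{uri}>" for uri in annotations.get("mapsTo", {}).values()]
    return lines


# Neutral form assumed by this sketch: term name -> value.
name_annotations = {
    "role": "identifier",
    "priority": 1,
    "instruction": "Primary identifier. Always include.",
    "mapsTo": {"schema.org": "https://schema.org/name"},
}

json_keywords = to_json_schema(name_annotations)
shex_lines = to_shex(name_annotations)
```

Deriving both outputs from one source is a straightforward way to keep the vocabulary identical across formats.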

Design Principles

  1. Never reinvent the screw. If the base format already expresses a concept, use it natively. SIA never duplicates what JSON Schema, ShEx, or any base format already does.
  2. Self-describing names. An AI that has never seen SIA should infer the meaning from the term name alone. role and instruction are clear. Abbreviations and codes are not.
  3. Graceful ignorance. Any tool for the base format must process a SIA-annotated schema without error. Standard validators ignore extensions they don't understand.
  4. Layered enrichment. A plain schema is valid SIA. Adding role/priority/instruction makes it AI-intelligent. Adding mapsTo makes it interoperable. Each layer is optional.
  5. One file, multiple consumers. The same file serves a human developer, an AI agent, a schema validator, a code generator, and an import tool. Each reads what it needs.
  6. Format independence. The SIA vocabulary is defined once and projected into each carrier format mechanically. If the vocabulary ever diverges between formats, that's a bug, not a feature.

Quick Reference

| Term | Type | Applies to | Purpose |
|---|---|---|---|
| role | string | Property | Semantic role for AI composition |
| priority | integer 1–5 | Property | Truncation priority (1 = keep first) |
| instruction | string | Property / Type | Natural language AI instruction |
| mapsTo | URI(s) | Property / Type | Cross-standard equivalence mapping |

The rules

  1. If the base format can express it natively, don't use SIA for it
  2. Every SIA term name must be self-describing English
  3. Every SIA-annotated schema must pass standard validation for its base format
  4. Omit annotations rather than filling them with placeholders
  5. The SIA vocabulary is identical across all carrier formats
  6. Platform extensions use an open namespace — any tool can define its own
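
Some of these rules lend themselves to automated checks. The following is a minimal lint sketch for a single JSON Schema property: it checks the priority range from the Quick Reference and flags placeholder text (rule 4). The `lint_property` helper and the list of strings treated as placeholders are assumptions of this example, not part of the specification.

```python
# Strings this sketch treats as placeholders (an assumption, compared case-insensitively).
PLACEHOLDERS = {"", "todo", "tbd", "n/a"}


def lint_property(name: str, prop: dict) -> list[str]:
    """Return human-readable problems found in one property's SIA annotations."""
    problems = []

    if "x-sia-priority" in prop:
        value = prop["x-sia-priority"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not (is_int and 1 <= value <= 5):
            problems.append(f"{name}: x-sia-priority must be an integer from 1 to 5, got {value!r}")

    for key in ("x-sia-role", "x-sia-instruction"):
        if key in prop:
            value = prop[key]
            if not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS:
                problems.append(f"{name}: {key} is empty or a placeholder; omit it instead")

    return problems


clean = lint_property("name", {"type": "string", "x-sia-role": "identifier", "x-sia-priority": 1})
dirty = lint_property("notes", {"type": "string", "x-sia-priority": 7, "x-sia-instruction": "TODO"})
```

A property with no SIA annotations at all passes cleanly, consistent with a plain schema being valid SIA.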
Ready to implement? Pick your format and dive into the full specification: JSON Schema + SIA  or  ShEx + SIA