What is SIA
Schema Intelligence Annotations (SIA) is a small, focused vocabulary that makes schemas understandable for AI agents, interoperable across standards, and portable across platforms.
SIA is not a schema language. It is a set of four annotation terms that can be expressed in any schema format. The same SIA vocabulary works in JSON Schema, ShEx, and any future format — because the annotations describe meaning, not syntax.
What SIA adds
- AI operational metadata — tells AI agents what each property means, how important it is, and how to use it. Prevents hallucination. Enables intelligent truncation under token pressure.
- Cross-standard mappings — declares how each property maps to schema.org, Dublin Core, FHIR, or any external vocabulary. One schema, multiple standards, no separate mapping files.
What SIA does NOT change
Validation, structure, data types, required fields, references, inheritance, enums — all handled by the base format (JSON Schema, ShEx, or whatever you use). SIA only adds what schema languages don't have: intelligence for AI and interoperability across standards.
The SIA Vocabulary
Four terms. That's the entire specification. Each is self-describing, optional, and format-independent.
role
Tells the AI what semantic role this property plays when composing content. Is it an identifier? A relationship? A date? A status flag?
priority
When context is limited, tells the AI which properties to keep (1 = always) and which to drop first (5 = expendable).
instruction
Natural language instruction for the AI. The primary anti-hallucination mechanism. Tells the AI how to use this property correctly.
summaries, never the raw ID."
mapsTo
Declares equivalent properties in external standards. One property can map to schema.org, Dublin Core, FHIR — as many as needed.
mapsTo: dc:title
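As a rough illustration of the four terms together, here is a minimal sketch of an annotated JSON Schema held as a Python dict. The `Article` properties, the annotation values, and the shape of the `x-maps-to` object (vocabulary prefix mapped to a term) are assumptions made for this example, not definitions taken from the specification; `sia_annotations` is a hypothetical helper.

```python
import json

# Hypothetical schema: property names and annotation values are illustrative.
article_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {
            "type": "string",
            "x-sia-role": "identifier",
            "x-sia-priority": 1,
            "x-sia-instruction": "Use as the headline when referring to this article.",
            # Assumed shape: vocabulary prefix -> term in that vocabulary.
            "x-maps-to": {"dc": "dc:title", "schema.org": "schema:headline"},
        },
        "body": {
            "type": "string",
            "x-sia-role": "content",
            "x-sia-priority": 3,
        },
        "createdBy": {
            "type": "string",
            "x-sia-role": "governance",
            "x-sia-priority": 5,
        },
    },
}


def sia_annotations(prop):
    """Return only the SIA keys of one property definition."""
    return {
        key: value
        for key, value in prop.items()
        if key.startswith("x-sia-") or key == "x-maps-to"
    }


print(json.dumps(sia_annotations(article_schema["properties"]["title"]), indent=2))
```

Every annotation is optional, so `body` and `createdBy` carry only a role and a priority, and the base keywords (`type`, `required`) stay exactly as JSON Schema defines them.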
Standard role values
| Role | Meaning | Typical properties |
|---|---|---|
| identifier | Primary name, ID, title, or label | name, title, slug, sku |
| descriptive | Summary, abstract, explanation | summary, bio, description |
| content | Main body, payload, primary data | body, text, html |
| relationship | Reference to another entity | author, parent, category |
| classification | Category, tag, type grouping | category, tags, enum fields |
| temporal | Date, time, timestamp, duration | createdAt, publishedDate |
| contact | Email, phone, address | email, phone, address |
| status | State, flag, lifecycle indicator | status, active, published |
| metric | Numeric measurement, score | price, rating, count |
| governance | Audit, ownership, permission | createdBy, accessLevel |
clinical, financial, geospatial). The vocabulary is designed to grow through community contributions.
Priority scale
| Value | Label | Meaning |
|---|---|---|
| 1 | Essential | Always include, even in one-line summaries |
| 2 | Important | Include in standard context |
| 3 | Useful | Include in full display, drop under pressure |
| 4 | Supplementary | Include only with generous token budget |
| 5 | Background | Drop first, rarely needed by AI |
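The priority scale lends itself to a simple cutoff when context is tight. The sketch below keeps every property at or below a chosen priority level. The function name `select_properties`, the sample properties, and the default of 3 for unannotated properties are assumptions of this example rather than anything SIA prescribes.

```python
def select_properties(properties, max_priority):
    """Keep properties whose x-sia-priority is at or below max_priority.

    Unannotated properties are treated as priority 3 here; that default is
    an assumption of this sketch, not something SIA specifies.
    """
    kept = [
        (prop.get("x-sia-priority", 3), name)
        for name, prop in properties.items()
        if prop.get("x-sia-priority", 3) <= max_priority
    ]
    # Most essential first; ties are broken alphabetically by property name.
    return [name for _, name in sorted(kept)]


# Hypothetical property set covering each level of the scale.
properties = {
    "title": {"type": "string", "x-sia-priority": 1},
    "summary": {"type": "string", "x-sia-priority": 2},
    "body": {"type": "string", "x-sia-priority": 3},
    "tags": {"type": "array", "x-sia-priority": 4},
    "createdBy": {"type": "string", "x-sia-priority": 5},
}

one_line = select_properties(properties, max_priority=1)  # ["title"]
standard = select_properties(properties, max_priority=2)  # ["title", "summary"]
full = select_properties(properties, max_priority=3)      # ["title", "summary", "body"]
```

An agent working under a token budget could lower `max_priority` step by step until the selected properties fit.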
Architecture
A SIA-annotated schema has four layers. Only Layer 1 is required. The rest are optional and additive.
| Layer | Name | Contents |
|---|---|---|
| Layer 1 | Base format | Structure & validation |
| Layer 2 | SIA core | role, priority, instruction |
| Layer 3 | Mappings | mapsTo URIs |
| Layer 4 | Platform | Round-trip extensions |
Layer 1 is whichever schema format you use — JSON Schema, ShEx, or another. It handles structure, types, validation, and all the things schema languages already do well.
Layers 2–3 are the SIA vocabulary. They add AI intelligence and cross-standard mappings using each format's native extension mechanism.
Layer 4 is for any platform to define its own namespace for round-trip metadata. A data modeling tool might use x-cm-*, a CMS might use x-sanity-*, an ETL system might use x-dbt-*. Layer 4 is deliberately open and platform-specific.
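To make the layering concrete, the following sketch sorts the keywords of one JSON Schema property into the four layers. It assumes the JSON Schema carrier keys (`x-sia-*`, `x-maps-to`) and treats any other `x-` keyword as a Layer 4 platform extension; `split_layers` and the `x-cm-field-id` value are hypothetical.

```python
SIA_CORE_KEYS = ("x-sia-role", "x-sia-priority", "x-sia-instruction")


def split_layers(prop):
    """Group the keywords of one property definition by SIA layer."""
    layers = {"base": {}, "sia_core": {}, "mappings": {}, "platform": {}}
    for key, value in prop.items():
        if key in SIA_CORE_KEYS:
            layers["sia_core"][key] = value      # Layer 2
        elif key == "x-maps-to":
            layers["mappings"][key] = value      # Layer 3
        elif key.startswith("x-"):
            layers["platform"][key] = value      # Layer 4 (assumed: any other x-* key)
        else:
            layers["base"][key] = value          # Layer 1
    return layers


# Hypothetical property carrying all four layers.
author = {
    "$ref": "#/$defs/Person",
    "x-sia-role": "relationship",
    "x-sia-priority": 2,
    "x-maps-to": {"dc": "dc:creator"},
    "x-cm-field-id": "fld_author",
}

layers = split_layers(author)
```

Dropping everything except `layers["base"]` leaves a plain schema, which is the sense in which only Layer 1 is required.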
$ref for relations, enum for taxonomies, required for mandatory fields, cardinality in ShEx), SIA never duplicates it. SIA only adds what schema languages don't have.
Available Formats
SIA is expressed in multiple carrier formats. Each uses the base format's native extension mechanism. The SIA vocabulary is identical across all formats — only the syntax differs.
JSON Schema + SIA
Standard JSON Schema with x-sia-* vendor extensions. Best LLM comprehension — the format AI agents are fine-tuned on for function calling. Universal tooling support.
ShEx + SIA
Standard ShExC with // sia:* annotations. 35% more compact than the JSON Schema form. Native semantic web interoperability. Ideal for token-constrained AI operations and linked data workflows.
More formats
SIA is designed to be format-independent. Future expressions may include YAML-native, Protobuf, or GraphQL SDL — wherever schemas live, SIA can follow.
How SIA maps across formats
| SIA term | JSON Schema | ShEx | Mechanism |
|---|---|---|---|
| role | x-sia-role | // sia:role | Vendor extension / RDF annotation |
| priority | x-sia-priority | // sia:priority | Vendor extension / RDF annotation |
| instruction | x-sia-instruction | // sia:instruction | Vendor extension / RDF annotation |
| mapsTo | x-maps-to object | // sia:mapsTo URI | Vendor extension / RDF annotation |
| platform ext. | x-{name}-* | // {ns}:* | Open namespace pattern |
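The table above can be read as a mechanical projection from one set of SIA terms into each carrier. A minimal sketch of that projection follows. The helper names, the dict shape used for `mapsTo`, and the use of `json.dumps` to quote string literals are assumptions; the ShExC lines also assume that the `sia:` and `dc:` prefixes are declared elsewhere in the schema.

```python
import json

# SIA term -> JSON Schema keyword, as listed in the table above.
JSON_SCHEMA_KEYS = {
    "role": "x-sia-role",
    "priority": "x-sia-priority",
    "instruction": "x-sia-instruction",
    "mapsTo": "x-maps-to",
}


def to_json_schema_keys(annotation):
    """Project format-neutral SIA terms onto JSON Schema vendor extensions."""
    return {JSON_SCHEMA_KEYS[term]: value for term, value in annotation.items()}


def to_shex_annotations(annotation):
    """Project the same terms onto ShExC annotation lines (one per value)."""
    lines = []
    for term, value in annotation.items():
        if term == "mapsTo":
            # Assumed shape: vocabulary prefix -> prefixed name of the target term.
            for target in value.values():
                lines.append(f"// sia:mapsTo {target}")
        elif isinstance(value, str):
            # json.dumps is used here only as an approximation of literal quoting.
            lines.append(f"// sia:{term} {json.dumps(value)}")
        else:
            lines.append(f"// sia:{term} {value}")
    return lines


# Hypothetical annotation for a "title" property.
annotation = {
    "role": "identifier",
    "priority": 1,
    "instruction": "Use as the headline.",
    "mapsTo": {"dc": "dc:title"},
}

json_keys = to_json_schema_keys(annotation)
shex_lines = to_shex_annotations(annotation)
```

Because both functions read the same `annotation` dict, the vocabulary cannot drift between carriers; only the output syntax differs.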
Design Principles
- Never reinvent the screw. If the base format already expresses a concept, use it natively. SIA never duplicates what JSON Schema, ShEx, or any base format already does.
- Self-describing names. An AI that has never seen SIA should infer the meaning from the term name alone. role and instruction are clear. Abbreviations and codes are not.
- Graceful ignorance. Any tool for the base format must process a SIA-annotated schema without error. Standard validators ignore extensions they don't understand.
- Layered enrichment. A plain schema is valid SIA. Adding role/priority/instruction makes it AI-intelligent. Adding mapsTo makes it interoperable. Each layer is optional.
- One file, multiple consumers. The same file serves a human developer, an AI agent, a schema validator, a code generator, and an import tool. Each reads what it needs.
- Format independence. The SIA vocabulary is defined once and projected into each carrier format mechanically. If the vocabulary ever diverges between formats, that's a bug, not a feature.
Quick Reference
| Term | Type | Applies to | Purpose |
|---|---|---|---|
| role | string | Property | Semantic role for AI composition |
| priority | integer 1–5 | Property | Truncation priority (1 = keep first) |
| instruction | string | Property / Type | Natural language AI instruction |
| mapsTo | URI(s) | Property / Type | Cross-standard equivalence mapping |
The rules
- If the base format can express it natively, don't use SIA for it
- Every SIA term name must be self-describing English
- Every SIA-annotated schema must pass standard validation for its base format
- Omit annotations rather than filling them with placeholders
- The SIA vocabulary is identical across all carrier formats
- Platform extensions use an open namespace — any tool can define its own
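Some of these rules can be checked automatically. The sketch below lints the SIA annotations on a single JSON Schema property: role must be a non-empty string, priority an integer from 1 to 5, and instruction must not be a placeholder. The function `lint_property` and the `PLACEHOLDERS` set are hypothetical; what counts as a placeholder is an assumption of this example.

```python
# Assumed placeholder values; SIA itself does not enumerate them.
PLACEHOLDERS = {"", "TODO", "TBD", "N/A"}


def lint_property(name, prop):
    """Return a list of rule violations for one annotated property."""
    problems = []

    if "x-sia-role" in prop:
        role = prop["x-sia-role"]
        if not isinstance(role, str) or not role.strip():
            problems.append(f"{name}: x-sia-role must be a non-empty string")

    if "x-sia-priority" in prop:
        priority = prop["x-sia-priority"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(priority, int) and not isinstance(priority, bool)
        if not is_int or not 1 <= priority <= 5:
            problems.append(f"{name}: x-sia-priority must be an integer from 1 to 5")

    if "x-sia-instruction" in prop:
        instruction = prop["x-sia-instruction"]
        if not isinstance(instruction, str) or instruction.strip().upper() in PLACEHOLDERS:
            problems.append(f"{name}: omit x-sia-instruction rather than using a placeholder")

    return problems


clean = lint_property("title", {"type": "string", "x-sia-role": "identifier", "x-sia-priority": 1})
flawed = lint_property("body", {"type": "string", "x-sia-priority": 7, "x-sia-instruction": "TODO"})
```

Unknown role values are deliberately not flagged, since the role vocabulary is meant to grow. A property with no annotations at all also passes, consistent with a plain schema being valid SIA.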