How SIA Works in ShEx
ShEx has a first-class annotation mechanism using the // syntax. Unlike JSON Schema's x- convention, ShEx annotations are typed RDF statements — they are part of the specification, preserved by every compliant parser, and can be queried with standard RDF tools.
SIA uses this native mechanism. Every SIA annotation is a // sia:* statement attached to a triple constraint (property) or shape declaration (type).
| SIA term | ShEx annotation | Value type | Applies to |
|---|---|---|---|
| role | // sia:role | string literal | Triple constraint |
| priority | // sia:priority | integer literal | Triple constraint |
| instruction | // sia:instruction | string literal | Triple constraint / shape |
| mapsTo | // sia:mapsTo | IRI | Triple constraint / shape |
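
The table above can be read as a small set of value rules. The following Python sketch checks one annotation against them. The `IRI` wrapper, the `SIA_TERMS` dictionary, and the `check_annotation` function are illustrative names, not part of SIA or of any ShEx library; the 1-5 range for priority is taken from the sia:priority section of this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Illustrative wrapper for an IRI value such as schema:headline."""
    value: str


# term -> (expected Python type, allowed targets), transcribed from the table
SIA_TERMS = {
    "sia:role": (str, {"triple constraint"}),
    "sia:priority": (int, {"triple constraint"}),
    "sia:instruction": (str, {"triple constraint", "shape"}),
    "sia:mapsTo": (IRI, {"triple constraint", "shape"}),
}


def check_annotation(term, value, target):
    """Return a list of problems; an empty list means the annotation fits the table."""
    if term not in SIA_TERMS:
        return [f"unknown SIA term: {term}"]
    expected_type, targets = SIA_TERMS[term]
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, expected_type):
        problems.append(f"{term} expects {expected_type.__name__}")
    elif term == "sia:priority" and not 1 <= value <= 5:
        problems.append("sia:priority must be between 1 and 5")
    if target not in targets:
        problems.append(f"{term} does not apply to a {target}")
    return problems


print(check_annotation("sia:role", "identifier", "triple constraint"))
print(check_annotation("sia:priority", "1", "triple constraint"))
```

A real pipeline would more likely validate annotations after parsing the schema with a ShEx toolchain; this sketch only mirrors the table.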
When you write // sia:role "identifier", this is an RDF statement: the subject is the triple constraint, the predicate is sia:role, and the object is "identifier". This means SIA metadata is queryable, indexable, and interoperable with any RDF toolchain.
PREFIX Declarations
Every ShEx + SIA schema begins with PREFIX declarations that define the namespaces used in the schema.

    # Required: SIA vocabulary
    PREFIX sia: <https://www.schematica.io/sia#>

    # Required: XML Schema data types
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # Your schema namespace
    PREFIX : <https://example.org/schema/>

    # Standards you map to (as needed)
    PREFIX schema: <https://schema.org/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX fhir: <http://hl7.org/fhir/>

Use sia: for the SIA vocabulary, : (default prefix) for your own schema properties, and named prefixes for any standards you map to. This keeps the schema compact and readable.
// sia:role
Value: string literal • Applies to: triple constraint (property)
Tells the AI what semantic role this property plays. See SIA vocabulary for the complete list of standard role values.

    :title         xsd:string          // sia:role "identifier" ;      # names the entity
    :publishedDate xsd:date            // sia:role "temporal" ;        # a date/time field
    :author        @:Author            // sia:role "relationship" ;    # links to another entity
    :category      ["tech" "science"]  // sia:role "classification" ;  # taxonomy / enum

// sia:priority
Value: integer 1–5 • Applies to: triple constraint (property)
Truncation priority. 1 = always keep, 5 = drop first. See SIA vocabulary for the full priority scale.

    :title      xsd:string    // sia:priority 1 ;  # essential: always include
    :body       xsd:string    // sia:priority 2 ;  # important: standard context
    :summary    xsd:string ?  // sia:priority 3 ;  # useful: drop under pressure
    :email      xsd:string ?  // sia:priority 4 ;  # supplementary
    :internalId xsd:string ?  // sia:priority 5 ;  # background: drop first

// sia:instruction
Value: string literal • Applies to: triple constraint or shape declaration
Natural language instruction for the AI. Anti-hallucination mechanism.
On a property

    :author @:Author
      // sia:role "relationship"
      // sia:instruction "Link to Author type. In summaries, use the author's display name, never the raw ID." ;

On a shape (type-level)

    # Type-level instruction as a comment-annotation pattern
    :BlogPost {
      # A blog post. Title and author are always needed.
      # Summary can be dropped in constrained contexts.
      :title xsd:string // sia:role "identifier" ;
      ...
    }

Alternatively, use a sia:shapeInstruction annotation on the first triple constraint. The SIA vocabulary handles both patterns.
// sia:mapsTo
Value: IRI • Applies to: triple constraint or shape
Declares equivalent properties in external standards. Each mapping is a separate // sia:mapsTo annotation with a URI value.

    :title xsd:string
      // sia:mapsTo schema:headline            # schema.org
      // sia:mapsTo dc:title                   # Dublin Core
      // sia:mapsTo <http://ogp.me/ns#title>   # Open Graph
    ;

How this differs from JSON Schema
In JSON Schema, x-maps-to is a single object with human-readable keys:

    // JSON Schema: one object, named keys
    "x-maps-to": {
      "schema.org": "https://schema.org/headline",
      "Dublin Core": "http://purl.org/dc/elements/1.1/title"
    }

In ShEx, each mapping is a separate RDF annotation with a URI:

    # ShEx: repeated annotations, URI values
    // sia:mapsTo schema:headline
    // sia:mapsTo dc:title

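
To make the difference concrete, here is a minimal Python sketch that turns an x-maps-to object into repeated // sia:mapsTo lines. The `PREFIXES` table, `compact`, and `maps_to_annotations` are hypothetical helpers written for this illustration; the prefix IRIs are the ones declared earlier in this guide.

```python
# Assumed prefix table, copied from the PREFIX declarations used in this guide.
PREFIXES = {
    "schema": "https://schema.org/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def compact(iri):
    """Shorten an IRI to prefix:local when a known namespace matches, else wrap it in <>."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            local = iri[len(namespace):]
            # Conservative check: only simple alphanumeric local names are shortened.
            if local.isalnum():
                return f"{prefix}:{local}"
    return f"<{iri}>"


def maps_to_annotations(x_maps_to):
    """Turn a JSON Schema x-maps-to object into one // sia:mapsTo line per mapping."""
    return [f"// sia:mapsTo {compact(iri)}" for iri in x_maps_to.values()]


x_maps_to = {
    "schema.org": "https://schema.org/headline",
    "Dublin Core": "http://purl.org/dc/elements/1.1/title",
    "Open Graph": "http://ogp.me/ns#title",
}
for line in maps_to_annotations(x_maps_to):
    print(line)
```

The human-readable keys ("schema.org", "Dublin Core") have no counterpart in the ShEx form, so this sketch drops them; the IRI itself identifies the target standard.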
Platform Extensions
Any platform can define its own annotation namespace. In ShEx, this uses the same // mechanism with a platform-specific prefix.

    PREFIX cm: <https://coremodels.io/ns#>

    :author @:Author
      // sia:role "relationship"
      // sia:priority 2
      // cm:relation "domainIncludes"   # platform-specific
      // cm:order 4                     # platform-specific
    ;

Structural Mapping: ShEx vs JSON Schema
Most schema concepts map directly between formats. Here's how the base layer (Layer 1) translates.
| Concept | JSON Schema | ShEx |
|---|---|---|
| String | "type": "string" | xsd:string |
| Integer | "type": "integer" | xsd:integer |
| Boolean | "type": "boolean" | xsd:boolean |
| Date | "format": "date" | xsd:date |
| Date-time | "format": "date-time" | xsd:dateTime |
| Required | In required array | Default (no ?) |
| Optional | Not in required | Add ? after type |
| Relation | "$ref": "#/$defs/X" | @:X |
| Array | "type": "array" | xsd:string * or @:X * |
| Enum | "enum": ["a","b"] | ["a" "b"] |
| Inheritance | "allOf": [{"$ref":"..."}] | EXTENDS @:Parent |
| Closed | "additionalProperties": false | CLOSED |
| Pattern | "pattern": "^[A-Z]+" | xsd:string ~ /^[A-Z]+/ |
| Max length | "maxLength": 200 | // xsd:maxLength 200 |
| Description | "description": "..." | # comment or // rdfs:comment "..." |
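
As a rough illustration of the table, the Python sketch below converts a small JSON Schema object into a ShEx shape. It covers only the scalar types, date formats, references, string enums, arrays, and required/optional rows; the function names and the sample schema are assumptions made for this example, and it assumes every array declares `items`.

```python
# Scalar mappings transcribed from the table above ("format" wins over "type").
FORMATS = {"date": "xsd:date", "date-time": "xsd:dateTime"}
TYPES = {"string": "xsd:string", "integer": "xsd:integer", "boolean": "xsd:boolean"}


def value_expr(prop):
    """Map one JSON Schema property definition to a ShEx value expression."""
    if "$ref" in prop:
        # "#/$defs/Author" -> "@:Author"
        return "@:" + prop["$ref"].rsplit("/", 1)[-1]
    if "enum" in prop:
        # Assumes string-valued enums only.
        return "[" + " ".join(f'"{v}"' for v in prop["enum"]) + "]"
    if prop.get("format") in FORMATS:
        return FORMATS[prop["format"]]
    return TYPES[prop["type"]]


def triple_constraint(name, prop, required):
    """Render one constraint; required is the ShEx default, so only optional gets '?'."""
    if prop.get("type") == "array":
        return f":{name} {value_expr(prop['items'])} *"
    cardinality = "" if required else " ?"
    return f":{name} {value_expr(prop)}{cardinality}"


def shape(name, schema):
    """Render a whole shape from a JSON Schema object definition."""
    required = set(schema.get("required", []))
    lines = [triple_constraint(p, d, p in required) for p, d in schema["properties"].items()]
    return f":{name} {{\n  " + " ;\n  ".join(lines) + "\n}"


schema = {
    "type": "object",
    "required": ["title", "author"],
    "properties": {
        "title": {"type": "string"},
        "author": {"$ref": "#/$defs/Author"},
        "publishedDate": {"type": "string", "format": "date"},
        "category": {"enum": ["tech", "science"]},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
print(shape("BlogPost", schema))
```

Inheritance, CLOSED, patterns, and length limits are left out on purpose; they would each need their own branch.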
In ShEx, properties are required by default. Use ? for optional (0 or 1), * for array (0 or more), + for non-empty array (1 or more). This is the opposite of JSON Schema, where properties are optional by default.
Token Efficiency
ShEx + SIA uses roughly 35% fewer tokens than JSON Schema + SIA in JSON form for the same schema content. For bulk operations with many types, the saving adds up.
| Format | Tokens (BlogPost + Author) | Savings |
|---|---|---|
| JSON Schema + SIA (JSON) | ~170 | baseline |
| JSON Schema + SIA (YAML) | ~140 | -18% |
| ShEx + SIA | ~110 | -35% |
| ShEx + SIA (minified) | ~90 | -47% |
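
The Savings column follows from the approximate token counts, with the JSON row as the baseline. A short Python check, where `savings_percent` is a helper name chosen for this illustration:

```python
# Approximate token counts copied from the table above.
counts = {
    "JSON Schema + SIA (JSON)": 170,
    "JSON Schema + SIA (YAML)": 140,
    "ShEx + SIA": 110,
    "ShEx + SIA (minified)": 90,
}
baseline = counts["JSON Schema + SIA (JSON)"]


def savings_percent(tokens, baseline):
    """Percentage of tokens saved relative to the baseline, rounded to a whole number."""
    return round((1 - tokens / baseline) * 100)


for name, tokens in counts.items():
    print(f"{name}: -{savings_percent(tokens, baseline)}%")
```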
Where the savings come from:
- Far fewer braces, brackets, and commas; ShEx separates constraints with whitespace and semicolons
- No quoted property names; properties are bare prefixed URIs
- No "type": "object" wrapper; shapes are implicit objects
- No "properties": { ... } nesting; constraints are flat
- Cardinality is a single character (?, *, +) rather than verbose keywords
SIA's progressive truncation via sia:priority works identically in ShEx. Strip lower-priority annotations to fit any token budget.
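
One way to picture progressive truncation is the Python sketch below. It makes two assumptions that are not part of SIA: whole triple constraints are dropped rather than individual annotations, and token cost is estimated by counting whitespace-separated chunks instead of using a real tokenizer. `estimate_tokens` and `truncate` are hypothetical names.

```python
def estimate_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated chunks."""
    return len(text.split())


def truncate(constraints, budget):
    """Drop priority levels 5, 4, 3, 2 in turn until the remainder fits the budget.

    `constraints` is a list of (priority, shex_text) pairs in schema order.
    Priority 1 is always kept, even if the budget is still exceeded.
    """
    kept = list(constraints)
    for level in (5, 4, 3, 2):
        if sum(estimate_tokens(text) for _, text in kept) <= budget:
            break
        kept = [(p, text) for p, text in kept if p < level]
    return kept


constraints = [
    (1, ":title xsd:string // sia:priority 1"),
    (2, ":body xsd:string // sia:priority 2"),
    (3, ":summary xsd:string ? // sia:priority 3"),
    (4, ":email xsd:string ? // sia:priority 4"),
    (5, ":internalId xsd:string ? // sia:priority 5"),
]
for priority, text in truncate(constraints, budget=16):
    print(priority, text)
```

With a budget of 16 estimated tokens, priorities 5 and 4 are dropped and the title, body, and summary constraints remain.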
Complete Example
Blog content model: two shapes, a value set, and schema.org + Dublin Core mappings. More examples are available on GitHub.

    PREFIX : <https://example.org/schema/blog/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX sia: <https://www.schematica.io/sia#>
    PREFIX schema: <https://schema.org/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>

    # BlogPost: a blog post. Title and author always needed.
    :BlogPost {
      :title xsd:string
        // xsd:maxLength 200
        // sia:role "identifier"
        // sia:priority 1
        // sia:mapsTo schema:headline
        // sia:mapsTo dc:title ;

      :body xsd:string
        // sia:role "content"
        // sia:priority 2
        // sia:mapsTo schema:articleBody ;

      :summary xsd:string ?
        // sia:role "descriptive"
        // sia:priority 3
        // sia:instruction "Brief abstract. 1-2 sentences. Safe to drop in constrained contexts."
        // sia:mapsTo schema:abstract ;

      :author @:Author
        // sia:role "relationship"
        // sia:priority 2
        // sia:instruction "Use author display name in summaries, never the raw ID or shape reference."
        // sia:mapsTo schema:author
        // sia:mapsTo dc:creator ;

      :publishedDate xsd:date ?
        // sia:role "temporal"
        // sia:priority 3
        // sia:mapsTo schema:datePublished
        // sia:mapsTo dc:date ;

      :category ["tech" "science" "culture"] ?
        // sia:role "classification"
        // sia:priority 3
        // sia:mapsTo schema:articleSection ;

      :tags xsd:string *
        // sia:role "classification"
        // sia:priority 4
        // sia:mapsTo schema:keywords
    }

    # Author
    :Author {
      :name xsd:string
        // sia:role "identifier"
        // sia:priority 1
        // sia:mapsTo schema:name ;

      :email xsd:string ?
        // sia:role "contact"
        // sia:priority 4
        // sia:mapsTo schema:email ;

      :bio xsd:string ?
        // sia:role "descriptive"
        // sia:priority 3
    }

Differences from JSON Schema Expression
The SIA vocabulary translates losslessly between formats. But the base format layer (Layer 1) has some differences. These are documented honestly, not hidden.
| Feature | JSON Schema | ShEx | Impact |
|---|---|---|---|
| Conditional validation | if/then/else native | No equivalent | Document the rule in sia:instruction |
| Pattern properties | patternProperties | No equivalent | Document in sia:instruction |
| String format | "format": "email" | Regex pattern or NodeKind | Partial — use ~ /regex/ |
| Property names | Local strings | URI-based | ShEx names ARE the ontology |
| Mapping syntax | Single x-maps-to object | Repeated // sia:mapsTo | Same semantics, different syntax |
| Annotations | x- convention | // first-class | ShEx annotations are typed RDF |
| Default cardinality | Optional | Required | Remember to add ? for optional |
Where ShEx has no equivalent (conditional validation, pattern properties), the rule becomes sia:instruction documentation rather than an executable constraint.
What ShEx adds that JSON Schema cannot express
- URI-based property names: properties ARE the ontology, not local strings that need separate mapping
- First-class typed annotations: SIA metadata is queryable RDF, not opaque key-value pairs
- IMPORTS: pull in external shape definitions with IMPORT <url>
- Inverse constraints: match incoming edges, not just outgoing properties
- Semantic actions: hooks for transformation logic during validation