SIA for ShEx — Schema Intelligence Annotations

How SIA Works in ShEx

ShEx has a first-class annotation mechanism using the // syntax. Unlike JSON Schema's x- convention, ShEx annotations are typed RDF statements — they are part of the specification, preserved by every compliant parser, and can be queried with standard RDF tools.

SIA uses this native mechanism. Every SIA annotation is a // sia:* statement attached to a triple constraint (property) or shape declaration (type).

SIA term	ShEx annotation	Value type	Applies to
role	`// sia:role`	string literal	Triple constraint
priority	`// sia:priority`	integer literal	Triple constraint
instruction	`// sia:instruction`	string literal	Triple constraint / shape
mapsTo	`// sia:mapsTo`	IRI	Triple constraint / shape

Annotations are formal RDF. When you write // sia:role "identifier", this is an RDF statement: the subject is the triple constraint, the predicate is sia:role, and the object is "identifier". This means SIA metadata is queryable, indexable, and interoperable with any RDF toolchain.
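To make the predicate/object structure concrete, here is a minimal Python sketch that reads SIA annotations from ShExJ, the JSON form of ShEx. The ShExJ below is hand-written to follow the TripleConstraint and Annotation object layout and was not produced by running a parser; the helper name sia_annotations is hypothetical.

```python
# Hand-written ShExJ for:
#   :title xsd:string // sia:role "identifier" // sia:priority 1
# Assumption: this mirrors what a ShEx parser would emit; it is not parser output.
SIA = "https://www.schematica.io/sia#"
XSD = "http://www.w3.org/2001/XMLSchema#"

title_constraint = {
    "type": "TripleConstraint",
    "predicate": "https://example.org/schema/title",
    "valueExpr": {"type": "NodeConstraint", "datatype": XSD + "string"},
    "annotations": [
        {"type": "Annotation", "predicate": SIA + "role",
         "object": {"value": "identifier"}},
        {"type": "Annotation", "predicate": SIA + "priority",
         "object": {"value": "1", "type": XSD + "integer"}},
    ],
}


def sia_annotations(constraint):
    """Return {local SIA term: value} for one triple constraint."""
    result = {}
    for ann in constraint.get("annotations", []):
        predicate = ann["predicate"]
        if not predicate.startswith(SIA):
            continue  # skip non-SIA (for example platform) annotations
        obj = ann["object"]
        # Literals are objects with a "value" key; IRIs are plain strings.
        value = obj["value"] if isinstance(obj, dict) else obj
        result[predicate[len(SIA):]] = value
    return result


print(sia_annotations(title_constraint))
# {'role': 'identifier', 'priority': '1'}
```

Literal values stay as strings here, as they appear in ShExJ; convert sia:priority with int() if you need to compare it numerically.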

PREFIX Declarations

Every ShEx + SIA schema begins with PREFIX declarations that define the namespaces used in the schema.

# Required: SIA vocabulary
PREFIX sia: <https://www.schematica.io/sia#>

# Required: XML Schema data types
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Your schema namespace
PREFIX : <https://example.org/schema/>

# Standards you map to (as needed)
PREFIX schema: <https://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX fhir: <http://hl7.org/fhir/>

Convention. Use sia: for the SIA vocabulary, : (default prefix) for your own schema properties, and named prefixes for any standards you map to. This keeps the schema compact and readable.
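As an illustration of what the PREFIX declarations do, the following Python sketch expands prefixed names to full IRIs. It uses a simplified regular expression rather than a full ShExC parser, and the function names parse_prefixes and expand are hypothetical.

```python
import re

# Simplified pattern for "PREFIX name: <iri>"; the name is empty for the default prefix.
PREFIX_RE = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>", re.MULTILINE)


def parse_prefixes(shexc):
    """Map each declared prefix ('' for the default prefix) to its namespace IRI."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_RE.finditer(shexc)}


def expand(name, prefixes):
    """Expand a prefixed name such as 'sia:role' or ':title' to a full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


schema = """
PREFIX sia: <https://www.schematica.io/sia#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <https://example.org/schema/>
"""
prefixes = parse_prefixes(schema)
print(expand("sia:role", prefixes))  # https://www.schematica.io/sia#role
print(expand(":title", prefixes))    # https://example.org/schema/title
```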

// sia:role

Value: string literal • Applies to: triple constraint (property)

Tells the AI what semantic role this property plays. See SIA vocabulary for the complete list of standard role values.

:title xsd:string
  // sia:role "identifier" ;     # names the entity

:publishedDate xsd:date
  // sia:role "temporal" ;       # a date/time field

:author @:Author
  // sia:role "relationship" ;   # links to another entity

:category ["tech" "science"]
  // sia:role "classification" ; # taxonomy / enum

// sia:priority

Value: integer 1–5 • Applies to: triple constraint (property)

Truncation priority. 1 = always keep, 5 = drop first. See SIA vocabulary for the full priority scale.

:title xsd:string
  // sia:priority 1 ;             # essential: always include

:body xsd:string
  // sia:priority 2 ;             # important: standard context

:summary xsd:string ?
  // sia:priority 3 ;             # useful: drop under pressure

:email xsd:string ?
  // sia:priority 4 ;             # supplementary

:internalId xsd:string ?
  // sia:priority 5 ;             # background: drop first
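
A consumer might apply the priority scale along these lines. This is a minimal Python sketch, assuming truncation removes whole properties; the property list is copied from the examples above and keep_up_to is a hypothetical helper, not part of SIA.

```python
# (property, sia:priority) pairs taken from the examples above.
PROPERTIES = [
    (":title", 1),
    (":body", 2),
    (":summary", 3),
    (":email", 4),
    (":internalId", 5),
]


def keep_up_to(properties, max_priority):
    """Keep properties whose priority number is <= max_priority (1 = always keep)."""
    if not 1 <= max_priority <= 5:
        raise ValueError("sia:priority ranges from 1 to 5")
    return [name for name, priority in properties if priority <= max_priority]


print(keep_up_to(PROPERTIES, 3))  # [':title', ':body', ':summary']
```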

// sia:instruction

Value: string literal • Applies to: triple constraint or shape declaration

A natural-language instruction for the AI. It acts as an anti-hallucination mechanism by stating how the property or type should be used.

On a property

:author @:Author
  // sia:role "relationship"
  // sia:instruction "Link to Author type. In summaries, use the author's display name, never the raw ID." ;

On a shape (type-level)

# Type-level instruction as a comment-annotation pattern
:BlogPost {
  # A blog post. Title and author are always needed.
  # Summary can be dropped in constrained contexts.

  :title xsd:string
    // sia:role "identifier" ;
  ...
}

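Note. The example above carries the type-level instruction as a structured comment above the shape declaration. Parsers discard comments, so an instruction that must survive parsing needs an annotation instead. The ShEx 2 grammar accepts annotations on a shape definition itself, written after the closing brace (for example } // sia:instruction "..."). Alternatively, define a dedicated sia:shapeInstruction annotation on the first triple constraint. The SIA vocabulary handles both the comment and the sia:shapeInstruction patterns.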

// sia:mapsTo

Value: IRI • Applies to: triple constraint or shape

Declares equivalent properties in external standards. Each mapping is a separate // sia:mapsTo annotation with a URI value.

:title xsd:string
  // sia:mapsTo schema:headline           # schema.org
  // sia:mapsTo dc:title                   # Dublin Core
  // sia:mapsTo <http://ogp.me/ns#title>   # Open Graph
  ;

How this differs from JSON Schema

In JSON Schema, x-maps-to is a single object with human-readable keys:

// JSON Schema: one object, named keys
"x-maps-to": {
  "schema.org": "https://schema.org/headline",
  "Dublin Core": "http://purl.org/dc/elements/1.1/title"
}

In ShEx, each mapping is a separate RDF annotation with a URI:

# ShEx: repeated annotations, URI values
// sia:mapsTo schema:headline
// sia:mapsTo dc:title

Both carry the same information. The JSON Schema form uses human-readable labels as keys. The ShEx form uses URIs directly — more machine-precise, and the standard name is inferred from the URI namespace. Use comments for readability when needed.
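The conversion between the two forms can be sketched in Python as follows. This is an illustration, not a reference converter: the helper names compact and maps_to_annotations are hypothetical, and the sketch does not escape characters that are invalid in a prefixed name.

```python
PREFIXES = {
    "schema": "https://schema.org/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def compact(iri, prefixes):
    """Return prefix:local when a declared namespace matches, else <iri>."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"


def maps_to_annotations(x_maps_to, prefixes):
    """Turn a JSON Schema x-maps-to object into ShExC annotation lines.

    The human-readable key is kept as a trailing comment for readability.
    """
    return [
        f"// sia:mapsTo {compact(iri, prefixes)}  # {label}"
        for label, iri in x_maps_to.items()
    ]


x_maps_to = {
    "schema.org": "https://schema.org/headline",
    "Dublin Core": "http://purl.org/dc/elements/1.1/title",
    "Open Graph": "http://ogp.me/ns#title",
}
for line in maps_to_annotations(x_maps_to, PREFIXES):
    print(line)
```

IRIs with no declared prefix, such as the Open Graph one, fall back to the angle-bracket form.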

Platform Extensions

Any platform can define its own annotation namespace. In ShEx, this uses the same // mechanism with a platform-specific prefix.

PREFIX cm: <https://coremodels.io/ns#>

:author @:Author
  // sia:role "relationship"
  // sia:priority 2
  // cm:relation "domainIncludes"    # platform-specific
  // cm:order 4                     # platform-specific
  ;

When to include. Export for AI / external sharing → omit platform annotations. Export for round-trip import → include them.
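
An export step that omits platform annotations could look like the Python sketch below. It assumes one annotation per line with a prefixed predicate, as in the example above, and that the closing ; sits on its own line or on a kept line. The function name strip_platform_annotations is hypothetical.

```python
import re

# A line that starts with "//" followed by a prefixed predicate, e.g. "// cm:order 4".
ANNOTATION_LINE = re.compile(r"^\s*//\s*([A-Za-z][\w-]*)?:")


def strip_platform_annotations(shexc, keep=("sia",)):
    """Drop annotation lines whose predicate prefix is not listed in `keep`."""
    kept = []
    for line in shexc.splitlines():
        match = ANNOTATION_LINE.match(line)
        if match and (match.group(1) or "") not in keep:
            continue  # platform-specific annotation: omit from the export
        kept.append(line)
    return "\n".join(kept)


SNIPPET = """:author @:Author
  // sia:role "relationship"
  // sia:priority 2
  // cm:relation "domainIncludes"    # platform-specific
  // cm:order 4                     # platform-specific
  ;"""

print(strip_platform_annotations(SNIPPET))
```

Annotations written with a full <IRI> predicate are not matched by this pattern and are kept unchanged.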

Structural Mapping: ShEx vs JSON Schema

Most schema concepts map directly between formats. Here's how the base layer (Layer 1) translates.

Concept	JSON Schema	ShEx
String	`"type": "string"`	`xsd:string`
Integer	`"type": "integer"`	`xsd:integer`
Boolean	`"type": "boolean"`	`xsd:boolean`
Date	`"format": "date"`	`xsd:date`
Date-time	`"format": "date-time"`	`xsd:dateTime`
Required	In `required` array	Default (no `?`)
Optional	Not in `required`	Add `?` after type
Relation	`"$ref": "#/$defs/X"`	`@:X`
Array	`"type": "array"`	`xsd:string *` or `@:X *`
Enum	`"enum": ["a","b"]`	`["a" "b"]`
Inheritance	`"allOf": [{"$ref":"..."}]`	`EXTENDS @:Parent` (ShEx 2.2 draft feature)
Closed	`"additionalProperties": false`	`CLOSED`
Pattern	`"pattern": "^[A-Z]+"`	`xsd:string /^[A-Z]+/`
Max length	`"maxLength": 200`	`xsd:string MAXLENGTH 200`
Description	`"description": "..."`	`# comment` or `// rdfs:comment "..."`

Cardinality note. ShEx defaults to required (exactly 1). Use ? for optional (0 or 1), * for array (0 or more), + for non-empty array (1 or more). This is the opposite of JSON Schema where properties are optional by default.
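
The cardinality rules can be expressed as a small Python function. This is a sketch under the assumption that a JSON Schema array maps to * unless minItems is at least 1; the function name shex_cardinality and the sample schema are hypothetical.

```python
def shex_cardinality(name, json_schema):
    """Return the ShEx cardinality marker for one JSON Schema property."""
    prop = json_schema["properties"][name]
    if prop.get("type") == "array":
        # Arrays become repeated values: "+" when at least one item is required.
        return "+" if prop.get("minItems", 0) >= 1 else "*"
    # Scalars: ShEx is required by default, so only optional ones need a marker.
    return "" if name in json_schema.get("required", []) else "?"


SCHEMA = {
    "type": "object",
    "required": ["title", "authors"],
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "authors": {"type": "array", "minItems": 1,
                    "items": {"$ref": "#/$defs/Author"}},
    },
}

for name in SCHEMA["properties"]:
    print(name, repr(shex_cardinality(name, SCHEMA)))
```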

Token Efficiency

For the BlogPost + Author example, ShEx + SIA uses roughly 35% fewer tokens than JSON Schema + SIA written as JSON for the same schema content. For bulk operations with many types, a reduction of that size is significant.

Format	Tokens (BlogPost + Author)	Savings
JSON Schema + SIA (JSON)	~170	baseline
JSON Schema + SIA (YAML)	~140	-18%
ShEx + SIA	~110	-35%
ShEx + SIA (minified)	~90	-47%

Where the savings come from:

No commas and far fewer braces and brackets: ShEx separates constraints with semicolons and whitespace
No quoted property names: properties are bare prefixed IRIs
No "type": "object" wrapper, because shapes are implicit objects
No "properties": { ... } nesting, because constraints are flat
Cardinality is a single character (?, *, +) instead of verbose keywords

SIA's progressive truncation via sia:priority works identically in ShEx. Strip lower-priority annotations to fit any token budget.
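
Fitting a schema to a token budget might be sketched in Python like this. Two assumptions are made: truncation removes whole triple constraints, and tokens are approximated by whitespace-separated chunks, which real tokenizers will count differently. All names in the sketch are hypothetical.

```python
# (property, value expression, sia:priority) for five example properties.
PROPERTIES = [
    (":title", "xsd:string", 1),
    (":body", "xsd:string", 2),
    (":summary", "xsd:string ?", 3),
    (":email", "xsd:string ?", 4),
    (":internalId", "xsd:string ?", 5),
]


def render(properties, max_priority):
    """Render ShExC lines for properties at or below max_priority."""
    return "\n".join(
        f"{name} {value} // sia:priority {priority} ;"
        for name, value, priority in properties
        if priority <= max_priority
    )


def count_tokens(text):
    """Crude proxy for a tokenizer: count whitespace-separated chunks."""
    return len(text.split())


def fit_to_budget(properties, budget):
    """Return (max_priority, text) for the richest rendering within the budget."""
    for max_priority in range(5, 0, -1):  # drop priority 5 first, then 4, ...
        text = render(properties, max_priority)
        if count_tokens(text) <= budget:
            return max_priority, text
    raise ValueError("even priority-1 properties exceed the budget")


level, text = fit_to_budget(PROPERTIES, budget=20)
print(level)  # 3
print(text)
```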

Complete Example

Blog content model: two shapes, a value set, and schema.org + Dublin Core mappings.

PREFIX :       <https://example.org/schema/blog/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX sia:    <https://www.schematica.io/sia#>
PREFIX schema: <https://schema.org/>
PREFIX dc:     <http://purl.org/dc/elements/1.1/>

# BlogPost — A blog post. Title and author always needed.
:BlogPost {

  :title xsd:string MAXLENGTH 200
    // sia:role        "identifier"
    // sia:priority    1
    // sia:mapsTo      schema:headline
    // sia:mapsTo      dc:title
  ;

  :body xsd:string
    // sia:role        "content"
    // sia:priority    2
    // sia:mapsTo      schema:articleBody
  ;

  :summary xsd:string ?
    // sia:role        "descriptive"
    // sia:priority    3
    // sia:instruction "Brief abstract. 1-2 sentences. Safe to drop in constrained contexts."
    // sia:mapsTo      schema:abstract
  ;

  :author @:Author
    // sia:role        "relationship"
    // sia:priority    2
    // sia:instruction "Use author display name in summaries, never the raw ID or shape reference."
    // sia:mapsTo      schema:author
    // sia:mapsTo      dc:creator
  ;

  :publishedDate xsd:date ?
    // sia:role        "temporal"
    // sia:priority    3
    // sia:mapsTo      schema:datePublished
    // sia:mapsTo      dc:date
  ;

  :category ["tech" "science" "culture"] ?
    // sia:role        "classification"
    // sia:priority    3
    // sia:mapsTo      schema:articleSection
  ;

  :tags xsd:string *
    // sia:role        "classification"
    // sia:priority    4
    // sia:mapsTo      schema:keywords

}

# Author
:Author {

  :name xsd:string
    // sia:role     "identifier"
    // sia:priority 1
    // sia:mapsTo   schema:name
  ;

  :email xsd:string ?
    // sia:role     "contact"
    // sia:priority 4
    // sia:mapsTo   schema:email
  ;

  :bio xsd:string ?
    // sia:role     "descriptive"
    // sia:priority 3
}

Differences from JSON Schema Expression

The SIA vocabulary translates losslessly between formats, but the base format layer (Layer 1) differs in a few places. The table below lists each difference and its impact.

Feature	JSON Schema	ShEx	Impact
Conditional validation	`if/then/else` native	No equivalent	Document the rule in `sia:instruction`
Pattern properties	`patternProperties`	No equivalent	Document in `sia:instruction`
String format	`"format": "email"`	Regex pattern or NodeKind	Partial: use a `/regex/` string facet
Property names	Local strings	URI-based	ShEx names ARE the ontology
Mapping syntax	Single `x-maps-to` object	Repeated `// sia:mapsTo`	Same semantics, different syntax
Annotations	`x-` convention	`//` first-class	ShEx annotations are typed RDF
Default cardinality	Optional	Required	Remember to add `?` for optional

Structural schema and SIA annotations convert losslessly between formats. Only certain base-format validation features (if/then/else, patternProperties) don't have ShEx equivalents. When converting from JSON Schema, these are preserved as sia:instruction documentation rather than executable constraints.
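
One way a converter could preserve such rules is sketched below in Python. The wording of the generated instruction and the helper name unsupported_to_instruction are assumptions made for illustration; the source does not prescribe a format.

```python
import json

# JSON Schema keywords named above as having no ShEx equivalent.
UNSUPPORTED = ("if", "then", "else", "patternProperties")


def unsupported_to_instruction(json_schema):
    """Summarise unsupported keywords as one sia:instruction annotation, or None."""
    found = {key: json_schema[key] for key in UNSUPPORTED if key in json_schema}
    if not found:
        return None
    # Hypothetical wording: documents the rule, does not enforce it.
    text = "Not enforced by ShEx. Original JSON Schema rule: " + json.dumps(
        found, sort_keys=True
    )
    # Escape backslashes and double quotes so the ShExC string literal stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'// sia:instruction "{escaped}"'


schema = {"type": "object", "patternProperties": {"^x-": {"type": "string"}}}
print(unsupported_to_instruction(schema))
```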

What ShEx adds that JSON Schema cannot express

URI-based property names: properties ARE the ontology, not local strings that need separate mapping
First-class typed annotations: SIA metadata is queryable RDF, not opaque key-value pairs
Imports: pull in external shape definitions with IMPORT <url>
Inverse constraints: match incoming edges with ^, not just outgoing properties
Semantic actions: hooks for transformation logic during validation