The SEMIC Style Guide: A Framework for Creating Clear and Interoperable Semantic Data Specifications

Introduction to the SEMIC Style Guide

The European Commission developed the SEMIC Style Guide as a practical framework to promote semantic interoperability across EU Member States. It provides clear rules and guidelines for naming conventions, syntax, and artefact management, helping organisations build and maintain Core Vocabularies and Application Profiles efficiently.

The guide aims to support public administrations, semantic engineers, and data architects in achieving harmonised data communication and reuse across Europe.

The Target Audience

The SEMIC Style Guide primarily targets:

➡️ Semantic Engineers who design and manage data models.

➡️ Data Architects tasked with ensuring consistency across systems.

➡️ Knowledge Modeling Specialists acting as editors or reusers of Core Vocabularies.

It also serves as a valuable reference for:

➡️ European Commission officers and collaborating consultants.

➡️ Stakeholders involved in inter-institutional standardization efforts.

This guide will equip you with the tools and principles needed to ensure clarity and alignment when creating, maintaining, or reusing semantic data specifications.

The Scope

The SEMIC Style Guide offers guidance on:

➡️ Terminology Clarifications – Definitions of key terms such as “reuse,” “ontology,” and “Core Vocabulary.”

➡️ Architecture Overview – How data specification types interconnect and derive from a unified conceptual model.

➡️ “Reuse” Principles – Clear rules on what can and cannot be reused within SEMIC.

➡️ Organizing Models and Artefacts – Practical recommendations for structuring conceptual and semantic artefacts.

➡️ Specification Development Methodology – High-level advice on development practices and conventions.

The SEMIC Style Guide does not provide:

• Detailed modelling methodologies or maintenance processes.

• Governance roles, release management, or lifecycle instructions.

• Specific implementation tools or syntax bindings (e.g., JSON-LD or XML).

• End-user instructions or usage guidelines.

Instead, it works hand-in-hand with complementary documents, such as:

• Transformation tool manuals for converting UML into other semantic formats.

• Documentation for reused specifications and URI policy guidelines.

Basic Terminology

Semantic data specifications are composed of different components or representations, known as “artefacts,” that serve specific purposes.

Think of it like a toolbox. The semantic data specifications toolbox provides guidelines, rules, and a structure for defining and reusing data and concepts. The artefacts are the tools inside the box, each designed to address a specific need or concern.

In the SEMIC context, a semantic data specification can be:

➡️ A Core Vocabulary that provides simplified, reusable models for basic concepts, like “Person” or “Location.”

➡️ An Application Profile that builds on one or more Core Vocabularies by adding context-specific rules or constraints to meet more specialized needs.

In the SEMIC context, artefacts include, but are not limited to, the following:

Persistent URIs – Reliable identifiers that ensure terms remain findable and usable over time.

OWL 2 Representations – Formal definitions that allow machines to understand and process relationships between terms.

SHACL Representations – Rules and constraints that validate data structures for accuracy and consistency.

HTML Representations – Human-readable documentation to make specifications accessible to users.

Pictures/Diagrams – Visual aids that simplify complex structures for clarity.

UML Representations – Conceptual models that represent terms and relationships visually.

JSON-LD and JSON Schema – Formats tailored for lightweight, machine-readable data.

XML and XSD Schema – Structured, schema-driven formats for data exchange.

While the world of semantic data contains a long list of artefacts, the SEMIC Style Guide focuses on those directly relevant to the Semantic Interoperability Layer of the European Interoperability Framework (EIF), namely the artefacts that enable interoperability across systems and organizations.

Core Concepts Clarified

The terminology within the SEMIC Style Guide can seem overwhelming at first. To make it more approachable for semantic engineers, data architects, and knowledge modeling specialists, let’s break it down step by step, ensuring each concept is clear and actionable.

This section introduces eight foundational concepts you need to know:

1. Conceptual Model

2. Ontology

3. Data Shape Specification

4. Data Specification Document

5. Data Specification Artefact

6. Semantic Data Specification

7. Core Vocabulary (CV) Specification

8. Application Profile (AP) Specification

Each term is explained with clear definitions, descriptions, and examples that show how they combine to form robust semantic data specifications.

These concepts are the building blocks for achieving interoperability across systems and domains. They explain how data models are created, reused, and adapted to meet specific needs while maintaining clarity and consistency.

1. Conceptual Model

A conceptual model or conceptual model specification is a simplified, abstract representation of a system. It lays out the key concepts, their characteristics, and how they relate to one another. A system is a group of interconnected parts that work together under a common set of rules to form a cohesive whole.

This model ensures that everyone, from technical teams to domain experts, understands the system in a common way.

The subset of UML (the Unified Modeling Language) covered in this style guide includes, but isn’t limited to, the following elements:

➡ Class

➡ Class Attribute

➡ Connector:

• Association

• Dependency

• Generalisation

➡ Enumeration

In SEMIC, we use UML Class Diagrams for conceptual clarity and visual representation. Although UML is not perfect, it’s practical, widely used, and effective.

🟢 Example:

If you’re modeling “Person,” a conceptual model defines what makes up a person, like name, age, and relationships, without getting into technical details like formats or systems.
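To make the UML subset above more concrete, here is a minimal Python sketch that mirrors a hypothetical “Person” conceptual model. It is an analogy rather than a SEMIC artefact: a Python class stands in for a UML Class, typed fields for Class Attributes, an `Enum` for an Enumeration, subclassing for Generalisation, and a field pointing to another class for an Association (Dependency has no neat Python counterpart and is left out). All names (`Agent`, `Person`, `Address`, `ContactType`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ContactType(Enum):
    """UML Enumeration (hypothetical): a closed list of allowed values."""
    EMAIL = "email"
    PHONE = "phone"


@dataclass
class Address:
    """UML Class with a single Class Attribute."""
    full_address: str


@dataclass
class Agent:
    """General UML Class that Person specialises."""
    name: str


@dataclass
class Person(Agent):
    """UML Class 'Person': inherits 'name' from Agent (Generalisation)."""
    age: Optional[int] = None                         # Class Attribute
    residence: Optional[Address] = None               # Association Person -> Address
    preferred_contact: Optional[ContactType] = None   # attribute typed by an Enumeration
    knows: List["Person"] = field(default_factory=list)  # Association Person -> Person


alice = Person(name="Alice", age=34,
               residence=Address("Example Street 1, Brussels"),
               preferred_contact=ContactType.EMAIL)
bob = Person(name="Bob", knows=[alice])
print(bob.knows[0].name, alice.preferred_contact.value)
```

The sketch deliberately says nothing about storage formats or validation rules; in SEMIC terms those concerns belong to the ontology and data shape artefacts.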

2. Ontology

An ontology transforms the conceptual model into a machine-readable specification. It’s the formal structure that defines concepts, relationships, and rules so that systems can understand and process them automatically.

Expressed in the OWL 2 language, an ontology is a formal (machine-readable) specification of concepts, their properties, and their relationships.

Ontologies provide shared, unambiguous definitions for humans and machines, ensuring systems “speak the same language.”

SEMIC uses simple, practical, and easy-to-reuse ontologies.

🟢 Example:

In a “Public Sector” domain, an ontology might formally define concepts like “Organisation” or “Location” and their relationships in a machine-readable format.
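As a rough illustration of what “machine-readable” means here, the Python sketch below prints a tiny OWL 2 ontology in Turtle syntax for the two concepts in the example. The namespace `http://example.org/ps#` and the property `hasLocation` are hypothetical placeholders, not terms from a SEMIC Core Vocabulary; only the `owl:` and `rdfs:` terms come from the W3C standards. In practice such files are produced with RDF tooling rather than string templates.

```python
# Minimal sketch: emit a lightweight OWL 2 ontology as Turtle text.
# The "ps:" namespace and every ps: term are hypothetical placeholders.
PREFIXES = """\
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ps:   <http://example.org/ps#> .
"""


def owl_class(name: str, label: str) -> str:
    """Declare an OWL class with an English label."""
    return f'ps:{name} a owl:Class ;\n    rdfs:label "{label}"@en .\n'


def owl_object_property(name: str, domain: str, range_: str) -> str:
    """Declare an OWL object property that links two classes."""
    return (f"ps:{name} a owl:ObjectProperty ;\n"
            f"    rdfs:domain ps:{domain} ;\n"
            f"    rdfs:range ps:{range_} .\n")


ontology = "\n".join([
    PREFIXES,
    owl_class("Organisation", "Organisation"),
    owl_class("Location", "Location"),
    owl_object_property("hasLocation", "Organisation", "Location"),
])
print(ontology)
```

Note that `rdfs:domain` and `rdfs:range` let a reasoner infer types; they do not reject data that “breaks” them. Rejecting invalid data is the job of the data shape specification.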

3. Data Shape Specification

If an ontology sets the rules, a data shape specification tightens those rules. It defines specific constraints on how the ontology can be used, ensuring data stays valid, consistent, and reliable.

A data shape specification is a set of constraints, expressed in SHACL, on how the terms of an ontology may be used. It validates data structures and checks that they meet defined requirements, so that systems use ontologies correctly and errors or inconsistencies are flagged early.

🟢 Example:

If “Person” has an age attribute, a data shape can require that the age be a positive number.
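The age rule can be sketched in two parts. The Turtle string shows how such a constraint is commonly written in SHACL (`sh:NodeShape`, `sh:targetClass`, `sh:property`, `sh:path`, `sh:datatype`, and `sh:minExclusive` are standard SHACL Core terms), while the Python function is a deliberately simplified, hand-written stand-in for a real SHACL engine that checks the same two conditions on plain dictionaries. The `ps:` namespace and the `ps:age` property are hypothetical.

```python
# Hypothetical SHACL shape for ps:Person (ps: is a placeholder namespace).
PERSON_SHAPE_TTL = """\
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ps:  <http://example.org/ps#> .

ps:PersonShape a sh:NodeShape ;
    sh:targetClass ps:Person ;
    sh:property [
        sh:path ps:age ;
        sh:datatype xsd:integer ;
        sh:minExclusive 0 ;
    ] .
"""


def validate_person(record: dict) -> list:
    """Simplified stand-in for a SHACL engine; returns violation messages.

    Mirrors the shape above: if 'age' is present it must be an integer
    (sh:datatype) strictly greater than 0 (sh:minExclusive 0).
    A missing age is accepted because the shape sets no sh:minCount.
    """
    violations = []
    if "age" in record:
        age = record["age"]
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(age, int) or isinstance(age, bool):
            violations.append(f"age {age!r} is not an integer")
        elif age <= 0:
            violations.append(f"age {age} is not greater than 0")
    return violations


for person in [{"name": "Alice", "age": 34}, {"name": "Bob", "age": -3},
               {"name": "Carol", "age": "old"}, {"name": "Dan"}]:
    print(person["name"], validate_person(person) or "conforms")
```

The split reflects the division of labour described above: the ontology defines what `age` means, while the shape decides which values are acceptable.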

4. Data Specification Document

A data specification document is a human-readable guide that connects everything. It clearly explains how to use an ontology or a data shape.

Technically, it is a document describing an ontology, a data shape, or both, and it bridges the gap between technical specifications and real-world implementation.

In the SEMIC context, these documents are typically published in HTML format for easy access.

🟢 Example:

A specification document, complete with examples and rules, explains how to represent “Person” in information systems.
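One way to keep such an HTML document in step with the other artefacts is to generate it from the same model instead of writing it by hand. The sketch below, using only Python’s standard library, renders a minimal HTML glossary for “Person” from a list of terms. The terms, definitions, and the `render_spec` helper are illustrative assumptions; this is not how SEMIC’s own tooling works, and real specification documents are considerably richer.

```python
from html import escape

# Illustrative terms for a hypothetical "Person" specification.
TERMS = [
    ("Person", "Class", "A natural person."),
    ("name", "Attribute", "The full name of the person."),
    ("age", "Attribute", "The age of the person in years; must be greater than 0."),
]


def render_spec(title: str, terms: list) -> str:
    """Render a minimal, human-readable HTML specification page."""
    rows = "\n".join(
        f"    <tr><td>{escape(name)}</td><td>{escape(kind)}</td>"
        f"<td>{escape(definition)}</td></tr>"
        for name, kind, definition in terms
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="en">\n<head><title>{escape(title)}</title></head>\n<body>\n'
        f"  <h1>{escape(title)}</h1>\n"
        "  <table>\n"
        "    <tr><th>Term</th><th>Type</th><th>Definition</th></tr>\n"
        f"{rows}\n"
        "  </table>\n</body>\n</html>\n"
    )


print(render_spec("Person specification (example)", TERMS))
```

Escaping every value with `html.escape` keeps definitions that contain characters such as `<` or `&` from breaking the page.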

5. Data Specification Artefact

Artefacts are the practical outputs of a data specification, such as diagrams, ontologies, or data shapes. Each serves a specific purpose in the modeling process.

An artefact is a materialized representation of a semantic data specification addressing specific needs. Artefacts allow models to be implemented, visualized, and shared across systems.

In the SEMIC context, we use the following types of artefacts:

➡ OWL ontologies

➡ SHACL constraints

➡ UML diagrams

➡ HTML documentation

➡ JSON-LD

➡ XML formats

🟢 Example:

A UML diagram for “Organisation” helps domain experts understand relationships, while a JSON-LD version makes it machine-readable.
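To show what the machine-readable side of that example might look like, the sketch below builds a small JSON-LD document for an organisation with Python’s `json` module. The `@context`, `@id`, and `@type` keywords are standard JSON-LD; the namespace, the term names, and the identifiers under `example.org` are hypothetical placeholders rather than terms from a SEMIC Core Vocabulary.

```python
import json

# Hypothetical JSON-LD context mapping short keys to full IRIs.
context = {
    "ps": "http://example.org/ps#",
    "Organisation": "ps:Organisation",
    "legalName": "ps:legalName",
    # "@type": "@id" means the value is an IRI (a link), not a plain string
    "location": {"@id": "ps:hasLocation", "@type": "@id"},
}

organisation = {
    "@context": context,
    "@id": "http://example.org/id/organisation/42",
    "@type": "Organisation",
    "legalName": "Example Public Agency",
    "location": "http://example.org/id/location/brussels",
}

document = json.dumps(organisation, indent=2)
print(document)
```

The context is what turns ordinary-looking JSON into linked data: a JSON-LD processor expands `legalName` to its full IRI, so two systems using different local keys can still agree on meaning.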

6. Semantic Data Specification

A semantic data specification is a structured set of artefacts, both human-readable and machine-readable, that address specific concerns, interoperability scopes, and use cases. At its core, it includes:

An ontology – Formal definitions of concepts and relationships.

A data shape – Constraints that define how those concepts can be applied.

Together, these components ensure consistent, reusable, and interoperable data models.

Semantic data specifications can be grouped along the “reuse axis”, from broad, context-free definitions to highly specific implementations:

Vocabularies

• Designed for maximum reuse across systems.

• Broad, abstract definitions that act as building blocks (e.g., “Person” or “Location”).

Implementation Models

• Highly specific semantic models tailored to a single use case, such as an API or a software system.

• Include strict constraints, technical datatypes, and implementation rules.

Application Profiles (APs)

• A middle ground that refines vocabularies for specific use cases while maintaining interoperability.

• Adds constraints to existing concepts for context-specific applications.

SEMIC focuses on Core Vocabularies (broad reuse) and Application Profiles (context-specific refinements). Implementation Models are acknowledged but fall outside SEMIC’s primary scope.

A semantic data specification connects abstract concepts with practical implementation. Whether broad (vocabularies), tailored (Application Profiles), or precise (Implementation Models), these specifications ensure systems can seamlessly understand, exchange, and reuse data.

🟢 Example:

A “Public Organisation” data specification defines the information needed to describe a public entity (e.g., name, jurisdiction, and type) and sets rules to ensure the data is consistent, accurate, and interoperable.

7. Core Vocabulary

A Core Vocabulary (CV) offers a set of basic, reusable terms that are designed to be simple yet flexible, allowing them to be extended or tailored to meet specific needs across different contexts.

A CV is a basic, context-neutral specification that captures an entity’s essential characteristics. It is designed for maximum reuse, making it the foundation for broader applications.

The SEMIC Formula:

Core Vocabulary = Lightweight Ontology + Optional Permissive Data Shape.

🟢 Example:

A Core Vocabulary for “Location” defines terms like “Country,” “City,” or “Address” in their simplest, reusable form.
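The SEMIC formula can be read as a composition rule. The Python sketch below encodes it as data structures: a Core Vocabulary must have a lightweight ontology and may have a data shape, and that shape, if present, must be permissive. Here “permissive” is approximated as “no property is mandatory”; that reading, like the class and field names and the `fullAddress` constraint, is an assumption made for illustration and is not prescribed by the style guide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PropertyConstraint:
    """A single property constraint, reduced to cardinality for brevity."""
    path: str
    min_count: int = 0               # 0 = the property may be omitted
    max_count: Optional[int] = None  # None = no upper bound


@dataclass
class DataShape:
    constraints: List[PropertyConstraint] = field(default_factory=list)

    def is_permissive(self) -> bool:
        # Assumption for this sketch: "permissive" means nothing is mandatory.
        return all(c.min_count == 0 for c in self.constraints)


@dataclass
class CoreVocabulary:
    """Core Vocabulary = Lightweight Ontology + Optional Permissive Data Shape."""
    ontology: Dict[str, str]           # term -> definition, a stand-in for an OWL file
    shape: Optional[DataShape] = None  # the data shape is optional

    def __post_init__(self) -> None:
        if not self.ontology:
            raise ValueError("a Core Vocabulary needs a (lightweight) ontology")
        if self.shape is not None and not self.shape.is_permissive():
            raise ValueError("a Core Vocabulary's data shape must be permissive")


# Terms from the "Location" example; the 'fullAddress' constraint is hypothetical.
location_cv = CoreVocabulary(
    ontology={
        "Country": "A country.",
        "City": "A city.",
        "Address": "A postal address.",
    },
    shape=DataShape([PropertyConstraint("fullAddress", max_count=1)]),
)
print(location_cv.shape.is_permissive())
```

Under this reading, a constraint such as “at most one full address” fits in a Core Vocabulary, while making a property mandatory is left to the Application Profiles that reuse it.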

8. Application Profile

An Application Profile (AP) is a targeted data specification designed for a specific application context. It builds on existing semantic data specifications, reusing their concepts while adding precision by defining mandatory, recommended, and optional elements. It also suggests controlled vocabularies to address particular needs and ensure seamless data exchange.

APs enable semantic interoperability in specific, real-world implementations.

The SEMIC Formula:

Application Profile = Reused Core Vocabulary + Own Data Shape + Optional Ontology.

🟢 Example:

The Core Person Vocabulary can be reused in an AP for “Public Sector Citizen,” which adds rules like “Citizens must have a unique ID.”
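The AP formula and the “unique ID” rule can be sketched the same way. This self-contained Python example defines a hypothetical “Public Sector Citizen” profile that reuses terms from a person vocabulary, adds one term of its own, assigns each property a mandatory, recommended, or optional level, and checks a batch of records. Property names such as `identifier` and `givenName`, and the choice to report missing recommended properties as warnings rather than errors, are illustrative assumptions, not rules from any published SEMIC Application Profile.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Level(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Reused Core Vocabulary terms (hypothetical subset of a person vocabulary).
REUSED_CV_TERMS = {"identifier", "givenName", "familyName", "birthDate"}
# Optional AP-specific ontology: a term the profile adds itself (hypothetical).
AP_OWN_TERMS = {"municipalityOfResidence"}

# The AP's own data shape: a requirement level for each property.
CITIZEN_PROFILE: Dict[str, Level] = {
    "identifier": Level.MANDATORY,
    "givenName": Level.MANDATORY,
    "familyName": Level.MANDATORY,
    "birthDate": Level.RECOMMENDED,
    "municipalityOfResidence": Level.OPTIONAL,
}

# Every profiled property must come from the reused CV or the AP's own ontology.
assert set(CITIZEN_PROFILE) <= REUSED_CV_TERMS | AP_OWN_TERMS


def check_citizens(records: List[dict]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a batch of citizen records."""
    errors: List[str] = []
    warnings: List[str] = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for prop, level in CITIZEN_PROFILE.items():
            if prop not in rec:
                if level is Level.MANDATORY:
                    errors.append(f"record {i}: missing mandatory '{prop}'")
                elif level is Level.RECOMMENDED:
                    warnings.append(f"record {i}: missing recommended '{prop}'")
        # AP-specific rule: citizens must have a unique ID.
        cid = rec.get("identifier")
        if cid is not None:
            if cid in seen_ids:
                errors.append(f"record {i}: duplicate identifier '{cid}'")
            seen_ids.add(cid)
    return errors, warnings


errors, warnings = check_citizens([
    {"identifier": "BE-001", "givenName": "Ana", "familyName": "Silva"},
    {"identifier": "BE-001", "givenName": "Jan", "familyName": "Peeters",
     "birthDate": "1990-05-01"},
    {"givenName": "Eva", "familyName": "Novak", "birthDate": "1985-11-12"},
])
print(errors)
print(warnings)
```

The per-record checks correspond to the mandatory, recommended, and optional levels, whereas uniqueness compares records with one another, so the sketch keeps it as a separate step.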

Conclusion

The SEMIC Style Guide, developed by the European Commission, is a vital framework for achieving semantic interoperability across EU Member States. It aims to make data consistent, reusable, clear enough for humans to understand, and structured enough for machines to process without a hitch.

This Style Guide reflects a shared commitment to openness and collaboration. It ensures that all stakeholders can align on common principles for semantic clarity.

At Meaningfy, we were honored to contribute to this project by developing the Style Guide. The guide supports semantic engineers, data architects, and knowledge modelers in creating robust and interoperable solutions.

Stay tuned for the second part of this series, where we’ll explore the Architectural Clarifications of the SEMIC Style Guide, the nuanced approach to reuse, and the detailed guidelines and conventions that make SEMIC a powerful tool for data standardization.


Meaningfy continues to support the European Commission’s initiatives, leading the charge toward a transparent, efficient, and interconnected European public sector. If you represent a European Institution or a public company that needs to implement an interoperability solution, contact us for tailored support and effective implementation.
