How to Convert XML Data to RDF Graphs: A Step-by-Step Guide

Jun 13, 2025 | Data Interoperability, Semantic Technology

Improving Complex Data Modelling by Going from XML to RDF

Many may wonder: why would anyone want to convert XML data to RDF graphs?

The short answer is semantics.

Yes, XML provides structure for the data. However, this is a tree structure, with the information organized in XML elements and sub-elements, similar to the folders and subfolders in a file system.

In RDF, we are dealing with a (directed) graph, which consists of nodes that are connected with edges pointing in a given direction. More importantly, each node can have a type, and that type is defined in an ontology.

Therefore, in RDF, we are not just saying, “This thing exists.” We also say, “This thing has a specific type; it is related to something else, and these relationships have defined meaning.”
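
To make the contrast concrete, here is a minimal sketch that reads a tiny XML tree and restates it as typed, directed triples. The XML element names, the `http://example.org/` namespace, and the terms `Notice`, `Buyer`, `hasBuyer`, and `legalName` are all invented for illustration; they do not come from any real schema or ontology.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML snippet; element names are invented for illustration.
XML = "<notice id='n1'><buyer id='b1'><name>City of Example</name></buyer></notice>"

# Hypothetical namespace and ontology terms (assumptions, not a real ontology).
EX = "http://example.org/"

root = ET.fromstring(XML)
buyer = root.find("buyer")

notice_iri = EX + "notice/" + root.get("id")
buyer_iri = EX + "buyer/" + buyer.get("id")

# Each triple is (subject, predicate, object).
triples = [
    (notice_iri, "rdf:type", EX + "Notice"),                # "this thing has a specific type"
    (buyer_iri, "rdf:type", EX + "Buyer"),
    (notice_iri, EX + "hasBuyer", buyer_iri),               # a directed edge with a defined meaning
    (buyer_iri, EX + "legalName", buyer.findtext("name")),  # a literal value
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

In the XML, the buyer is simply nested inside the notice. In the triples, the nesting becomes an explicit, named relationship between two typed nodes.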

That is the key difference between XML and RDF. Our goal is to represent the data and give it meaning. We want to ensure the data is understandable across different systems, over time, and at scale.

RDF gives our data both structure and meaning, which is essential when dealing with complex, changing, or highly interconnected information.

From XML to RDF: A Practical Approach for Complex Structures

At Meaningfy, we are working with a client who needs to convert large volumes of structured XML data into RDF. The client’s first challenge is the volume, and the second is the complexity.

The source data comes in many different types, each with its own structure, and the schema behind it changes over time. To support this, we have designed a flexible RDF mapping approach that handles hundreds of fields, dozens of document types, and multiple schema versions, while ensuring that the output stays valid and semantically consistent (according to a specific version of an ontology).

Understanding the Scale: Forms, Fields, Versions

In this project, we have encountered two kinds of XML forms. One kind is based on the so-called “standard forms”, whose structure is defined by XSD files. The other kind is based on “eForms”, a more recent development. eForms come with an SDK (software development kit), which provides a lot more structure, but also more complexity.

Right now, we are working with:

  • Over 700 individual data fields
  • 51 different document types
  • 10+ eForms SDK versions

Not every field appears in every document. Some fields are required, some are optional, and their definitions can change from one SDK version to another: fields may be added, removed, renamed, or redefined.

Therefore, we must track what fields exist, when (in what XSD/SDK versions), and where (i.e., in what forms) they apply.
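
One way to picture this bookkeeping is a small registry that records, for each field, the versions and forms in which it applies. The sketch below is only an illustration of the idea: the field identifiers, version strings, and form names are hypothetical, and a real registry would be derived from the XSD files and SDK metadata rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    field_id: str
    versions: frozenset                  # XSD/SDK versions in which the field exists
    forms: frozenset                     # forms (document types) in which it may appear
    required_in: frozenset = frozenset() # forms in which it is mandatory


# Hypothetical entries, for illustration only.
REGISTRY = [
    FieldDefinition("FIELD-A", frozenset({"1.0", "1.1"}),
                    frozenset({"form-01", "form-02"}), frozenset({"form-01"})),
    FieldDefinition("FIELD-B", frozenset({"1.1"}), frozenset({"form-02"})),
]


def fields_for(version, form):
    """Return the ids of all fields that apply to a given version and form."""
    return sorted(
        f.field_id for f in REGISTRY
        if version in f.versions and form in f.forms
    )


print(fields_for("1.0", "form-02"))
print(fields_for("1.1", "form-02"))
```

Here `FIELD-B` was added in version 1.1, so it only shows up in the second lookup.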

In one of our implementation phases, we had to create the mappings for 49 out of the 51 document types.

In order to handle this complexity, we came up with a methodology that involves a few steps.

Step 1: Conceptual Mapping

So, how do we go from XML to RDF?

The first step is what we call conceptual mapping.

We define each XML field’s corresponding RDF representation and what subgraph should exist to capture that data point meaningfully.

For example, a single XML field might translate into:

  • An instance of a specific OWL class
  • Linked by a property that describes the relationship
  • Pointing to a literal value or another resource

That is just one small piece of the RDF graph, and we do this for every field. In many cases, a single field maps to multiple fragments, meaning that 700 XML fields can easily result in 1,000 or more RDF mappings.
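
As a rough illustration, a conceptual mapping can be thought of as a table from field identifiers to lists of graph fragments. The field ids and the `ex:` class and property names below are hypothetical; the point is only that one field may need more than one fragment, so the number of mappings grows faster than the number of fields.

```python
# Hypothetical conceptual mapping: field id -> list of graph fragments.
# Each fragment names a class instance, a property, and the kind of object
# the property points to ("literal" or "resource").
CONCEPTUAL_MAPPING = {
    "FIELD-A": [
        {"class": "ex:Organisation", "property": "ex:hasLegalName", "object": "literal"},
    ],
    "FIELD-B": [
        {"class": "ex:Notice", "property": "ex:hasBuyer", "object": "resource"},
        {"class": "ex:Buyer", "property": "ex:playedBy", "object": "resource"},
    ],
}


def count_fragments(mapping):
    """Total number of graph fragments across all fields."""
    return sum(len(fragments) for fragments in mapping.values())


print(len(CONCEPTUAL_MAPPING), "fields ->", count_fragments(CONCEPTUAL_MAPPING), "fragments")
```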

To make these mappings easier to manage and review, we created a lightweight visualization tool for internal use. It allows us to select a field and view the associated RDF structure: how nodes are connected, what properties are involved, and so on. This is especially useful when reviewing mappings with domain experts and ontology specialists.

Step 2: RML Mapping (Technical)

Once the conceptual mapping is ready, we move on to the technical mapping using RML, the RDF Mapping Language.

RML allows us to define how data from structured sources such as XML, CSV, JSON, or relational databases is converted into RDF. Our focus here is XML.
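
To give a feel for what such a mapping expresses, the sketch below emulates a single mapping rule in plain Python. It is not RML and does not use an RML engine. The dictionary keys loosely mirror the RML/R2RML notions of an iterator (`rml:iterator`), a subject template (`rr:template`), and a reference to a source value (`rml:reference`). The XML sample, the IRIs, and the function `apply_map` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
XML = """<notices>
  <notice><id>n1</id><title>Road works</title></notice>
  <notice><id>n2</id><title>School catering</title></notice>
</notices>"""

# A Python stand-in for one mapping rule (not actual RML syntax).
TRIPLES_MAP = {
    "iterator": "./notice",                                # which XML nodes to loop over
    "subject_template": "http://example.org/notice/{id}",  # how to mint the subject IRI
    "predicate_object": [
        ("http://example.org/title", "title"),             # predicate, child element to read
    ],
}


def apply_map(xml_text, tmap):
    """Apply one mapping rule to an XML document and return the triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root.findall(tmap["iterator"]):
        # Collect the text of the direct children so the template can use them.
        values = {child.tag: child.text for child in node}
        subject = tmap["subject_template"].format(**values)
        for predicate, reference in tmap["predicate_object"]:
            value = node.findtext(reference)
            if value is not None:
                triples.append((subject, predicate, value))
    return triples


for triple in apply_map(XML, TRIPLES_MAP):
    print(triple)
```

A real RML mapping states the same three things declaratively in Turtle, and an RML processor does the looping.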

We organize the mappings into mapping packages grouped by schema version, document type, and ontology version. Each package contains multiple .rml.ttl files, which can be extensive, sometimes hundreds of lines each. For instance, one mapping file created for a specific group of fields exceeded 700 lines. When we add up all the mappings across all versions and types, we are looking at thousands of lines of code, and every line matters. One small typo, or a missed field, can break the transformation or lead to inaccurate RDF output.
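
The grouping can be sketched as a lookup keyed by those three dimensions. Everything concrete below (the version labels, form names, and file names) is hypothetical, and the actual package layout may differ; the sketch only shows why a three-part key makes it unambiguous which mapping files apply to a given document.

```python
from collections import defaultdict
from typing import NamedTuple


class PackageKey(NamedTuple):
    schema_version: str
    document_type: str
    ontology_version: str


# Hypothetical mapping files and the package each one belongs to.
FILES = [
    (PackageKey("sdk-1.0", "form-01", "onto-4.0"), "notice.rml.ttl"),
    (PackageKey("sdk-1.0", "form-01", "onto-4.0"), "buyer.rml.ttl"),
    (PackageKey("sdk-1.1", "form-01", "onto-4.0"), "notice.rml.ttl"),
]


def build_packages(files):
    """Group mapping file names by package key."""
    packages = defaultdict(list)
    for key, file_name in files:
        packages[key].append(file_name)
    return dict(packages)


def select_package(packages, schema_version, document_type, ontology_version):
    """Return the mapping files for one combination, or raise KeyError."""
    key = PackageKey(schema_version, document_type, ontology_version)
    if key not in packages:
        raise KeyError(f"no mapping package for {key}")
    return sorted(packages[key])


PACKAGES = build_packages(FILES)
print(select_package(PACKAGES, "sdk-1.0", "form-01", "onto-4.0"))
```

Failing loudly on an unknown combination is a deliberate choice in this sketch: silently falling back to another version's mappings could produce RDF that looks fine but is wrong.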

The complexity here is both in volume and precision.

Step 3: Validation

After the technical mappings are in place, we validate the output in two main ways:

Conformity with the conceptual mapping

We compare the RDF generated from the XML with the graph fragments we initially defined. Does the RDF match what we intended conceptually?
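
A minimal sketch of such a comparison, assuming the expected fragments are written as triple patterns in which `None` stands for "any value". The pattern format, the `ex:` terms, and the function names are assumptions for illustration, not a description of the actual tooling.

```python
# Expected fragments from the conceptual mapping, as triple patterns.
# None is a wildcard: the conceptual mapping fixes the shape, not the value.
EXPECTED_PATTERNS = [
    ("ex:notice/n1", "rdf:type", "ex:Notice"),
    ("ex:notice/n1", "ex:title", None),
    ("ex:notice/n1", "ex:hasBuyer", None),
]

# Triples produced by the technical mapping (hypothetical output).
GENERATED = [
    ("ex:notice/n1", "rdf:type", "ex:Notice"),
    ("ex:notice/n1", "ex:title", "Road works"),
]


def matches(pattern, triple):
    """True if every non-wildcard position of the pattern equals the triple."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def unmatched_patterns(expected, generated):
    """Return the expected patterns that no generated triple satisfies."""
    return [
        pattern for pattern in expected
        if not any(matches(pattern, triple) for triple in generated)
    ]


for pattern in unmatched_patterns(EXPECTED_PATTERNS, GENERATED):
    print("missing:", pattern)
```

In this example the buyer link is reported as missing, which is exactly the kind of gap a forgotten field in a mapping file would cause.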

Shape constraint validation against the SHACL shapes provided for the ontology

We use a SHACL file that defines the expected structure of the RDF according to the domain ontology. Thus, we test whether the data is not only valid RDF but also fits the expected shape.
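
The idea behind a shape check can be illustrated without a SHACL engine. The sketch below is a hand-rolled stand-in, not SHACL itself: it handles only a target class and per-property cardinality rules, loosely modelled on `sh:targetClass`, `sh:minCount`, and `sh:maxCount`. The shape, the data, and the `validate` function are hypothetical.

```python
RDF_TYPE = "rdf:type"

# A simplified, hypothetical shape: every ex:Notice must have exactly one ex:title.
SHAPE = {
    "target_class": "ex:Notice",
    "properties": {
        "ex:title": {"min_count": 1, "max_count": 1},
    },
}

# Hypothetical generated data: n2 is missing its title.
DATA = [
    ("ex:n1", RDF_TYPE, "ex:Notice"),
    ("ex:n1", "ex:title", "Road works"),
    ("ex:n2", RDF_TYPE, "ex:Notice"),
]


def validate(triples, shape):
    """Return a list of (node, property, message) cardinality violations."""
    violations = []
    focus_nodes = sorted(
        s for s, p, o in triples
        if p == RDF_TYPE and o == shape["target_class"]
    )
    for node in focus_nodes:
        for predicate, rule in shape["properties"].items():
            count = sum(1 for s, p, _ in triples if s == node and p == predicate)
            if count < rule.get("min_count", 0):
                violations.append((node, predicate, "too few values"))
            if "max_count" in rule and count > rule["max_count"]:
                violations.append((node, predicate, "too many values"))
    return violations


for violation in validate(DATA, SHAPE):
    print(violation)
```

A real SHACL validator covers far more (datatypes, value classes, nested shapes), but the principle is the same: the data can be perfectly well-formed RDF and still fail the shape.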

This double-check ensures that:

  • The technical mappings do what they are supposed to
  • The output is semantically and structurally correct

All this shows just how complex and wide the scope is.

The Mapping Is Only One Part of a Bigger Workflow

This mapping work does not live in isolation.

We have built a pipeline that uses these mapping packages to transform structured XML data into RDF. After running through the pipeline, each source file becomes a separate RDF graph.
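
The "one source file, one graph" idea can be sketched as follows. The `transform` function is a toy stand-in for the real mapping step, and the graph IRI scheme, file names, and XML content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical base IRI for the per-file graphs.
GRAPH_BASE = "http://example.org/graph/"


def transform(xml_text):
    """Toy stand-in for the mapping step: one triple per <title> element."""
    root = ET.fromstring(xml_text)
    subject = "http://example.org/notice/" + root.get("id")
    return [(subject, "http://example.org/title", t.text) for t in root.iter("title")]


def run_pipeline(sources):
    """Return one graph (a list of triples) per source file, keyed by graph IRI."""
    graphs = {}
    for file_name, xml_text in sources.items():
        graph_iri = GRAPH_BASE + file_name.rsplit(".", 1)[0]
        graphs[graph_iri] = transform(xml_text)
    return graphs


# Hypothetical input files.
SOURCES = {
    "n1.xml": "<notice id='n1'><title>Road works</title></notice>",
    "n2.xml": "<notice id='n2'><title>School catering</title></notice>",
}

GRAPHS = run_pipeline(SOURCES)
for graph_iri, triples in GRAPHS.items():
    print(graph_iri, len(triples), "triple(s)")
```

Keeping each file's output in its own graph makes it possible to replace or remove the RDF for a single source document without touching the rest.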

The RDF output is uploaded into a central platform that functions as a large knowledge graph. From there, it can be queried using SPARQL, integrated with other datasets, or reused in various applications.

Therefore, the mapping is not just a task but a foundational piece in a larger system. It enables structured, semantically meaningful data to be accessible, queryable, and reusable at scale.

Final Notes

This type of work requires careful attention to detail. It is not something that can be automated once and then forgotten. The software development kit (SDK) continually evolves, adding new fields and versions.

Therefore, we pay special attention to maintaining all aspects, including the mappings, validations, and compatibility, so that the RDF generated from the XML documents provides the most complete and precise semantics that can be expressed with a given version of the ontology.

Contributed by Meaningfier Csongor Nyulas.

___

Meaningfy continues to support the European Commission’s initiatives, leading the charge toward a transparent, efficient, and interconnected European public sector. If you represent a European Institution or a public company that needs to implement an interoperability solution, contact us for tailored support.
