KGValidatePy is a Python library for validating knowledge graphs on top of pySHACL.
It provides DataOps-oriented approaches that support both ModelOps and SchemaOps workflows.
KGValidatePy extends the capabilities of pySHACL and integrates tightly with KGraphPy, providing structured validation pipelines for standards-based knowledge graph data, including datasets that conform to the Common Information Model (CIM) / Common Grid Model Exchange Standard (CGMES).
The library is designed to support:
- Constraint validation
- Schema validation
- Reasoning-assisted validation
- Structured validation reporting
- Integration with editing and verification workflows
KGValidatePy builds on the graph handling of KGraphPy (RDFLib-based) and pySHACL while remaining implementation-neutral at the interface level.
KGValidatePy focuses on validation of RDF-based knowledge graphs where:
- RDF is the canonical representation
- Constraints are expressed via SHACL or related mechanisms
- Data quality must be enforced in automated pipelines
- Validation results must be actionable in editing or verification workflows
While CIM / CGMES is the primary initial focus, KGValidatePy is reusable across other standards and profiles, including IEC 61850, BIM/IFC, GeoSPARQL datasets, EU SEMIC profiles, and Industrial Data Ontology (IDO).
KGValidatePy provides:
- SHACL-based constraint validation (via pySHACL)
- Optional reasoning support (RDFS / OWL profiles where applicable)
- Validation of named graphs and dataset-level metadata
- Structured validation reports suitable for:
  - Machine processing
  - Human-readable summaries
  - UI integration (e.g., KGEditPy)
In addition to SHACL validation, KGValidatePy may support:
- Schema-level validation where RDF is the canonical content:
  - XML Schema (XSD) for interchange constraints
  - JSON Schema for derived artefacts
  - Avro schema validation for event or stream representations
- Validation of packaging conventions (e.g., header consistency)
- DifferenceSet validation in change-oriented workflows
These extensions are designed to complement, not replace, SHACL-based validation.
KGValidatePy is intentionally composable:
- KGraphPy provides graph handling and packaging support
- pySHACL provides the SHACL validation engine
- LinkML may provide schema definitions and generated artefacts
- KGVerifyPy may use validation results as part of executable verification workflows
- KGEditPy may present validation findings for human correction
KGValidatePy does not reimplement SHACL validation logic; it orchestrates and extends it for DataOps (including SchemaOps and ModelOps) use cases.
Make sure the latest version of Python is installed. Installing the library in a virtual environment is recommended.
python -m venv .venv
# Or
py -m venv .venv
.venv\Scripts\activate
pip install -e .
Open the GUI by double-clicking run_gui.bat. It assumes that a virtual environment called .venv exists. The GUI can also be opened from the terminal like this:
python main.py
Add one or more data files and a SHACL file using the respective browse buttons.
For expansion of the data graphs with inferred (implicit) data, one or more RDFS files can be loaded.
The graphs are expanded using the owlrl library, like this:
DeductiveClosure(RDFS_Semantics).expand(graph)
The RDFS graph is then added into the validation process done by pySHACL.
Datatypes can be added to the graph by checking the add datatypes box. If a custom context is not provided, a default context is used automatically. The default context is specific to the CIM model and is therefore not applicable to non-CIM data.
Note concerning namespaces in the context:
The default context uses the standard namespaces for cim and eu:
- cim: https://cim.ucaiug.io/ns#
- eu: https://cim.ucaiug.io/ns/eu#
However, these namespaces are also allowed because they have the same datatype information:
- cim: http://iec.ch/TC57/CIM100#
- eu: http://iec.ch/TC57/CIM100-EuropeanExtension/1/0#
KGValidatePy will automatically switch the namespaces in the context when these are detected in the graph.
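The namespace switch described above could look roughly like the sketch below. The four namespace IRIs are the ones listed in this section; the function name `adapt_context`, the dictionary layout of the context, and the detection rule (an alternative namespace simply occurring among the graph's namespaces) are hypothetical and only illustrate the idea.

```python
# Default context namespaces and their accepted alternatives, as listed above.
DEFAULT_CONTEXT = {
    "cim": "https://cim.ucaiug.io/ns#",
    "eu": "https://cim.ucaiug.io/ns/eu#",
}
ALTERNATIVE_NAMESPACES = {
    "cim": "http://iec.ch/TC57/CIM100#",
    "eu": "http://iec.ch/TC57/CIM100-EuropeanExtension/1/0#",
}


def adapt_context(context, graph_namespaces):
    """Return a copy of `context` in which a prefix points to its alternative
    namespace whenever that alternative occurs in the graph (hypothetical helper)."""
    adapted = dict(context)
    for prefix, alternative in ALTERNATIVE_NAMESPACES.items():
        if prefix in adapted and alternative in graph_namespaces:
            adapted[prefix] = alternative
    return adapted


# Namespaces assumed to have been collected from a data graph.
found_in_graph = {
    "http://iec.ch/TC57/CIM100#",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
context = adapt_context(DEFAULT_CONTEXT, found_in_graph)
print(context["cim"])
print(context["eu"])
```

Returning a copy rather than mutating the default context keeps the defaults intact for the next file that is loaded.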
Namespaces are a common stumbling block when working with CIM data. The namespaces used by SHACL shapes to find targeted triples must exactly match the namespaces in the data graph, and the same is true of the RDFS graph. To make mismatches easier to find, a Check namespaces button is provided. It compares the namespaces in the two graphs (and the RDFS graph, if present) and lists any that are not found in the other graph(s), together with their prefixes.
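As a rough illustration of such a comparison, the sketch below works on plain IRI strings instead of real graphs. The helper names, the rule for splitting an IRI into a namespace (cut after the last `#` or `/`), and the interpretation of "not found" as "occurs in none of the other graphs" are assumptions, and prefix lookup is left out.

```python
def namespace_of(iri):
    """Return the namespace part of an IRI: everything up to the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def unmatched_namespaces(iris_by_graph):
    """For each graph, list the namespaces that occur in none of the other graphs."""
    namespaces = {
        name: {namespace_of(iri) for iri in iris}
        for name, iris in iris_by_graph.items()
    }
    report = {}
    for name, own in namespaces.items():
        others = set().union(
            *(ns for other, ns in namespaces.items() if other != name)
        )
        report[name] = sorted(own - others)
    return report


# The data uses the older CIM namespace while the shapes use the newer one.
report = unmatched_namespaces({
    "data": [
        "http://iec.ch/TC57/CIM100#ACLineSegment",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    ],
    "shacl": [
        "https://cim.ucaiug.io/ns#ACLineSegment",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    ],
})
print(report)
```

In this example the shared `rdf:` namespace is not reported, while each CIM namespace is listed for the graph that uses it, which is the kind of mismatch that leaves shapes without targets.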
A window will give a summary of the result, including information about the number of SHACL shapes found in the SHACL file, the number of SHACL shapes that had explicit targets in the data graph, and the number of triples in the data graph.
If no violations are found, "Conforms: True" is shown.
If violations are found, a short summary is given of what kinds of violations there were and how many of each. The full result is written to a graph file.
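A summary of this kind can be sketched by counting one label per violation. In a SHACL validation report each result carries a `sh:sourceConstraintComponent`, which is a natural label to count; the function name `summarise`, the input format (a plain list of component names), and the exact output layout are assumptions rather than the tool's actual behaviour.

```python
from collections import Counter


def summarise(constraint_components):
    """Build a short text summary from one constraint component name per violation."""
    if not constraint_components:
        return "Conforms: True"
    counts = Counter(constraint_components)
    lines = ["Conforms: False"]
    # Most frequent violation kinds first.
    lines += [f"{kind}: {count}" for kind, count in counts.most_common()]
    return "\n".join(lines)


summary = summarise([
    "MinCountConstraintComponent",
    "DatatypeConstraintComponent",
    "MinCountConstraintComponent",
])
print(summary)
```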
A CSV report can optionally be saved. This is a reorganised version of the result graph which can be viewed in Excel. The columns subject_uuid, result_path and object refer to the triple where the violation was flagged. The column called result_path usually only holds the predicate of the triples, but sometimes it will contain both a predicate and object. This could be referring to the triple that has been flagged (e.g. if object is N/A the result_path shows a predicate that is missing from the graph), or it could be referring to a path related to the object (e.g. the object of the flagged triple is of the wrong type).
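The CSV layout described above can be sketched with the standard `csv` module. The three column names come from the text; the function name `write_report`, the row dictionaries, and the sample identifiers are hypothetical, and the rows are assumed to have already been extracted from the result graph.

```python
import csv
import io

COLUMNS = ["subject_uuid", "result_path", "object"]


def write_report(rows, stream):
    """Write one CSV line per flagged triple; missing values become N/A."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column) or "N/A" for column in COLUMNS})


# Hypothetical rows: the first flags a missing predicate, so there is no object.
rows = [
    {"subject_uuid": "uuid-1", "result_path": "cim:IdentifiedObject.name", "object": None},
    {"subject_uuid": "uuid-2", "result_path": "cim:Equipment.EquipmentContainer", "object": "uuid-3"},
]
buffer = io.StringIO()
write_report(rows, buffer)
report_text = buffer.getvalue()
print(report_text)
```

Writing to an in-memory buffer keeps the sketch free of file paths; passing an open file object instead would produce a file that can be opened in Excel.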
KGValidatePy is released under the Apache License, Version 2.0.
This permits open use, modification, and redistribution in research, public sector, and commercial contexts under the terms of the license.
The Statnett classification of this library is K0. Open code.
KGValidatePy builds upon the excellent work of the pySHACL project and its contributors.
- pySHACL: https://github.com/RDFLib/pySHACL
We explicitly acknowledge the RDFLib and pySHACL communities for providing the foundational SHACL validation capabilities that make this library possible.
KGValidatePy extends these capabilities for standards-based DataOps workflows but remains an independent open-source project.