Problem
The Python code generation pipeline is spread across three repositories with overlapping responsibilities:
- oold-python (src/oold/generator.py, src/oold/utils/codegen.py)
  - Schema preprocessing: range to $ref conversion
  - OOLDJsonSchemaParser: custom keyword preservation, _deep_merge override
  - Calls datamodel-code-generator with settings (use_title_as_name, reuse_model, allof_class_hierarchy)
- osw-python (src/osw/core.py lines 540-900)
  - _fetch_schema(): recursive schema resolution, downloads from wiki/offline pages
  - Writes resolved schemas to temp directory for datamodel-code-generator
  - Does NOT run oold preprocessing on dependency schemas
- osw-python-package-generator (src/osw_python_package_generator/main.py)
  - Downloads schema packages from GitHub
  - Calls osw.fetch_schema() then replace_duplicated_classes_with_imports()
  - Post-processing: class deduplication (name-based, UUID-based, numbered variants, pass-only cleanup)
  - Hotfix replacements for raw OSW ID type annotations
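For readers unfamiliar with the range to $ref conversion listed under oold-python, here is a minimal sketch of the idea. It is not oold's actual implementation: the function names convert_range_to_ref and preprocess_all and the example target RiskAssessment.json are hypothetical, and it assumes that range holds a string identifying the target schema and that the conversion simply replaces the string type with a $ref.

```python
from typing import Any

Schema = dict[str, Any]


def convert_range_to_ref(node: Any) -> Any:
    """Recursively turn {"type": "string", "range": X} into {"$ref": X}.

    Hypothetical helper for illustration; the real oold preprocessing may
    keep additional keywords or build the $ref differently.
    """
    if isinstance(node, list):
        return [convert_range_to_ref(item) for item in node]
    if not isinstance(node, dict):
        return node
    # Build a converted copy so the input schema is left untouched.
    converted = {key: convert_range_to_ref(value) for key, value in node.items()}
    # Only treat "range" as the keyword when it is a string; a property that
    # happens to be named "range" is a dict and is left alone.
    if isinstance(converted.get("range"), str):
        target = converted.pop("range")
        converted.pop("type", None)  # drop the placeholder "type": "string"
        converted["$ref"] = target
    return converted


def preprocess_all(schemas: dict[str, Schema]) -> dict[str, Schema]:
    """Apply the conversion to every resolved schema, dependencies included."""
    return {name: convert_range_to_ref(schema) for name, schema in schemas.items()}


resolved = {
    "Device": {
        "title": "Device",
        "properties": {
            # Assumed shape of a range field before preprocessing.
            "risk_assessment": {"type": "string", "range": "RiskAssessment.json"},
        },
    },
}
preprocessed = preprocess_all(resolved)
```

Applying the same pass to dependency schemas as to top-level schemas is the point of the sketch: the code generator then sees a reference to the target schema rather than a plain string.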
Current issues caused by this split
- Dependency schemas not preprocessed: _fetch_schema() in osw-python fetches dependency schemas but doesn't run oold's preprocess() on them. This means range fields in dependency schemas (e.g., Device.risk_assessment) keep their "type": "string" alongside the unresolved range, causing datamodel-code-generator to use raw OSW IDs as class names instead of schema titles.
- Post-processing workarounds: The package generator has growing hotfix logic (raw OSW ID replacement, sentinel object cleanup, lambda : formatting) that patches symptoms of upstream issues.
- Duplicate merge logic: _deep_merge override in oold and merge_deep in json_tools.py exist to fix array deduplication. The fix had to be applied in oold because that's where datamodel-code-generator is monkey-patched, but the merge utility lives separately.
- reuse_model failures: datamodel-code-generator's reuse_model=True doesn't deduplicate schemas resolved via different $ref paths (e.g., Tool referenced from both Process and ProcessType). This is worked around by post-processing in the package generator.
- Ad-hoc schema generation rebuilds full chain: When a user wants to generate Python code for a single custom schema via osw-python, _fetch_schema() resolves the entire dependency chain up to Entity. Only the user's schema should be built; dependency classes should be imported from existing packages.
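To make the reuse_model workaround concrete, the following sketch shows what a name-based deduplication pass over generated code could look like. It is an illustration only, not the package generator's code: drop_duplicate_classes is a hypothetical name, and the sketch assumes that two top-level classes with the same name are identical, which is exactly the assumption that makes name-based deduplication fragile.

```python
import ast


def drop_duplicate_classes(source: str) -> str:
    """Keep the first top-level definition of each class name, drop repeats.

    Hypothetical post-processing step; requires Python 3.9+ for ast.unparse.
    Comments and original formatting are not preserved by ast.unparse.
    """
    tree = ast.parse(source)
    seen: set[str] = set()
    kept: list[ast.stmt] = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            if node.name in seen:
                continue  # a class with this name was already emitted
            seen.add(node.name)
        kept.append(node)
    tree.body = kept
    return ast.unparse(tree)


# Tool appears twice, as if it had been resolved via two different $ref paths.
generated = (
    "class Tool:\n"
    "    name: str\n"
    "\n"
    "class Process:\n"
    "    tool: Tool\n"
    "\n"
    "class Tool:\n"
    "    name: str\n"
)
deduplicated = drop_duplicate_classes(generated)
```

A pass like this only treats the symptom. If the generator itself recognised that both $ref paths lead to the same schema, the duplicate would never be emitted.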
Proposed solution
Consolidate the generation pipeline in oold-python:
- Move schema resolution (_fetch_schema logic) from osw-python into oold's Generator
- Run preprocess() on every schema after resolution, not just top-level schemas
- Move class deduplication logic from the package generator into oold's Generator as a post-processing step
- Support dependency resolution from installed packages: when resolving a $ref to a schema that exists in an installed dependency package, import the class instead of regenerating it
- osw-python and the package generator become thin wrappers that provide schema sources (wiki API, GitHub zips, local files) and call oold.Generator.generate()
This would make oold-python the single source of truth for: schema preprocessing, code generation settings, post-processing, and deduplication.
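One way to picture the split between schema sources and the consolidated generator is sketched below. Everything in it is an assumption made for illustration: the SchemaSource protocol, InMemorySource, plan_generation, and the import path pkg.models.Tool are hypothetical and not part of oold today, and the sketch assumes every $ref value is a schema identifier the source can load (local JSON pointers are not handled).

```python
from typing import Any, Iterator, Protocol

Schema = dict[str, Any]


class SchemaSource(Protocol):
    """Hypothetical interface a thin wrapper (wiki API, GitHub zip, local
    files) would implement to hand schemas to the generator."""

    def load(self, schema_id: str) -> Schema: ...


class InMemorySource:
    """Stand-in source used here in place of a real wiki or GitHub backend."""

    def __init__(self, schemas: dict[str, Schema]) -> None:
        self._schemas = schemas

    def load(self, schema_id: str) -> Schema:
        return self._schemas[schema_id]


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every $ref target found anywhere inside a schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def plan_generation(
    root_id: str, source: SchemaSource, installed: dict[str, str]
) -> tuple[dict[str, Schema], dict[str, str]]:
    """Split the dependency closure of root_id into schemas to generate and
    classes to import. installed maps schema id to an import path."""
    to_generate: dict[str, Schema] = {}
    to_import: dict[str, str] = {}
    pending = [root_id]
    while pending:
        schema_id = pending.pop()
        if schema_id in to_generate or schema_id in to_import:
            continue
        if schema_id != root_id and schema_id in installed:
            # Already shipped by a dependency package: import, do not descend.
            to_import[schema_id] = installed[schema_id]
            continue
        schema = source.load(schema_id)
        to_generate[schema_id] = schema
        pending.extend(iter_refs(schema))
    return to_generate, to_import


source = InMemorySource({
    "MyTool": {"title": "MyTool", "allOf": [{"$ref": "Tool"}]},
    "Tool": {"title": "Tool", "allOf": [{"$ref": "Entity"}]},
    "Entity": {"title": "Entity"},
})
to_generate, to_import = plan_generation("MyTool", source, {"Tool": "pkg.models.Tool"})
```

With Tool marked as installed, only MyTool is scheduled for generation and Entity is never fetched, which is the behaviour asked for in the ad-hoc generation case. Preprocessing and deduplication would then run inside the generator on the to_generate set only.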
Affected repositories