add(load): EP approval record migration stream by zubeydecivelek · Pull Request #541 · CERNDocumentServer/cds-migrator-kit

zubeydecivelek · 2026-06-25T10:26:47Z

needs https://github.com/CERNDocumentServer/cds-rdm/tree/feature/ep-approval

This PR covers the migration of EP approved records.

We'll have a new loader (CDSEPApprovalRecordServiceLoad) for records that went through EP approval. It splits one legacy record into two RDM records, creates the EP approval request with the original history (submitter, approver, dates, report number), and links everything together with related identifiers and an APPRN.

Metadata split

EP report number (CERN-EP-YYYY-*): removed from both splits before load. After the request is created, the APPRN is minted on the restricted record and added to the public record metadata.
EP draft report number (CERN-EP-*): removed from public record before load.
DOI: removed from the restricted split only.
Access: public record is set to public; restricted record stays restricted.
Community: CERN Scientific community is added to the public record parent.
Legacy identifiers / side effects: legacy recid minting, original dump, CLC sync, and record-state logging only run on the public record.
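
The split rules above could be sketched roughly as follows. This is only an illustration, not the loader's code: the `split_metadata` helper, the flat `report_numbers` / `doi` / `access` keys, and the two regular expressions are assumptions made for the example (the description only gives the globs CERN-EP-YYYY-* and CERN-EP-*).

```python
import re
from copy import deepcopy

# Assumed patterns derived from the globs in the description.
EP_FINAL_RN = re.compile(r"^CERN-EP-\d{4}-")  # CERN-EP-YYYY-*
EP_ANY_RN = re.compile(r"^CERN-EP-")          # CERN-EP-* (covers drafts too)


def split_metadata(legacy):
    """Return (restricted, public) copies of a hypothetical flat metadata dict."""
    restricted = deepcopy(legacy)
    public = deepcopy(legacy)
    report_numbers = legacy.get("report_numbers", [])

    # Final EP report numbers are removed from both splits before load.
    restricted["report_numbers"] = [
        rn for rn in report_numbers if not EP_FINAL_RN.match(rn)
    ]
    # The public split additionally loses the EP draft report numbers.
    public["report_numbers"] = [
        rn for rn in report_numbers if not EP_ANY_RN.match(rn)
    ]

    # The DOI is removed from the restricted split only.
    restricted.pop("doi", None)

    # The public record becomes public; the restricted record stays restricted.
    public["access"] = "public"
    restricted["access"] = "restricted"
    return restricted, public
```

The APPRN step is left out on purpose, since it happens only after the request has been created.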

Files split

Restricted record: only EPPHAPP_FILE files (the restricted files for EP).
Public record: all other files (and it must not contain any remaining restricted files).
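
A minimal sketch of this partition, assuming each file is a dict with hypothetical `type` and `restricted` keys (the real file model in the migrator may look different):

```python
def split_files(files):
    """Partition a {filename: file_data} mapping into (restricted, public)."""
    # Only EPPHAPP_FILE files go to the restricted record.
    restricted = {
        key: data for key, data in files.items()
        if data.get("type") == "EPPHAPP_FILE"
    }
    # Everything else goes to the public record.
    public = {key: data for key, data in files.items() if key not in restricted}

    # Guard: the public split must not contain any remaining restricted file.
    leftovers = [key for key, data in public.items() if data.get("restricted")]
    if leftovers:
        raise ValueError(f"restricted files left in public split: {leftovers}")
    return restricted, public
```

Raising on leftovers rather than silently dropping them keeps unexpected restricted files visible during a dry run.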

Loader

The loader creates 2 records and links them via:

an accepted EP approval request (submitted + accepted timeline from legacy history)
related identifiers (isversionof / isvariantformof)
APPRN on the public record

How to run

Records with ep_approval cannot use the standard record stream; doing so raises a migration error. They must be migrated with --ep-approval.

invenio migration run --collection <collection-name> --ep-approval --dry-run

Result

For now this only handles records with an approved EP approval history (exactly one waiting and one approved entry).
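
The "exactly one waiting and one approved entry" condition could be checked along these lines. The `status` key and its lowercase values are assumptions for the example; the legacy history format is not shown in this PR description.

```python
from collections import Counter


def is_supported_history(history):
    """True when the history has exactly one waiting and one approved entry."""
    states = Counter(entry["status"] for entry in history)
    # Any extra state (e.g. rejected) or repeated entry makes the counters differ.
    return states == Counter({"waiting": 1, "approved": 1})
```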

Questions

If we also need to migrate records in a waiting or rejected state, we would need a different split/load mechanism. Do such cases exist?
This record doesn’t have the EPPHAPP restricted file, so how do we split it? We need a restricted EPPHAPP_FILE: https://cds.cern.ch/record/2864686/files/

0einstein0 · 2026-06-29T09:41:00Z

+                self.ep_approval_metadata["resource_type"],
+            )
+            # Add the APPRN identifier to the public record
+            self._sync_public_apprn_identifier(public_recid, report_number)


Is there a reason we do it after publishing the record?

I think I have the same question as @0einstein0 :) Why not apply the same order here as would happen in real life? Create restricted -> Create/Approve request -> Create public record?

@0einstein0 No, we can add this before publishing, thanks :)

@zzacharo We can apply the same order, but we can't use the create public record method from the RDM implementation, since the metadata will be different.

I'll update the PR :)

0einstein0 · 2026-06-29T12:12:32Z

+
+                current_version_files[key] = deepcopy(file_data)
+
+            if not current_version_files:


If a version has no new files but has metadata changes, do we skip it?

It was already implemented like this: we create versions only when there is a file version change. Since I don't change the version creation (from files) during transform, and I split the files here, I added this check to avoid duplicated versions.
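
To illustrate the check being discussed, here is a sketch built around the loop shown in the hunks. The `versions_for_split` wrapper, the `belongs_to_split` predicate, and the `continue` are assumptions; the diff excerpt does not show what follows the `if not current_version_files:` line.

```python
from collections import OrderedDict
from copy import deepcopy


def versions_for_split(versions, belongs_to_split):
    """Keep only the versions that contain at least one file of this split."""
    kept = OrderedDict()
    for version, version_data in versions.items():
        current_version_files = OrderedDict()
        for key, file_data in version_data.get("files", {}).items():
            if belongs_to_split(file_data):
                current_version_files[key] = deepcopy(file_data)

        if not current_version_files:
            # No file of this split in this version: skip it so the split
            # record does not get a duplicated (file-less) version.
            continue
        kept[version] = {**version_data, "files": current_version_files}
    return kept
```

With this shape, a version that only changes metadata is skipped for a split, which is the behaviour the question above is about.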

zzacharo · 2026-06-29T12:59:08Z

+            "mint_legacy_recid": True,
+            "save_original_dump": True,
+            "clc_sync": True,
+            "record_state": True,


What does that mean? Whether or not to generate the record state?

It means we won't add the restricted record to record_state_logger. I can change the variable name if it's confusing.
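
For readers of the thread, one way to picture these flags: the public record runs every side effect and the restricted record runs none, as stated in the PR description. The constant names and the helper below are hypothetical; only the four flag keys come from the diff.

```python
# Flags for the public record, as shown in the hunk above.
PUBLIC_SIDE_EFFECTS = {
    "mint_legacy_recid": True,
    "save_original_dump": True,
    "clc_sync": True,
    "record_state": True,  # add the record to the record-state log
}

# Assumed counterpart: side effects only run on the public record.
RESTRICTED_SIDE_EFFECTS = dict.fromkeys(PUBLIC_SIDE_EFFECTS, False)


def enabled_side_effects(flags):
    """Return the names of the side effects that would run."""
    return [name for name, enabled in flags.items() if enabled]
```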

zzacharo · 2026-06-29T13:20:22Z

+
+            # TODO: What if there are multiple experiments?
+            experiments = record_json.get("custom_fields", {}).get(
+                "cern:experiments", []


Shall we raise an error, so we can identify whether any such case exists?
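
A minimal sketch of what raising here could look like. The exception class and the helper name are hypothetical; only the `custom_fields` / `cern:experiments` lookup comes from the diff.

```python
class MultipleExperimentsError(Exception):
    """Hypothetical error for records listing more than one experiment."""


def single_experiment(record_json):
    """Return the only experiment, None if there is none, raise if several."""
    experiments = record_json.get("custom_fields", {}).get("cern:experiments", [])
    if len(experiments) > 1:
        # Surface the case instead of silently picking one experiment.
        raise MultipleExperimentsError(
            f"expected at most one experiment, got {len(experiments)}"
        )
    return experiments[0] if experiments else None
```

Running this in a dry run would show whether the multi-experiment case occurs at all in the collection.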

zzacharo · 2026-06-29T13:22:24Z

+            )
+
+            # Load the EP approval request
+            self._load_ep_approval(restricted_state, public_state, legacy_recid=recid)


Suggested change

-            self._load_ep_approval(restricted_state, public_state, legacy_recid=recid)
+            self._create_ep_approval(restricted_state, public_record_state, legacy_recid=recid)

zzacharo · 2026-06-29T13:29:18Z

+                self.ep_approval_metadata["resource_type"],
+            )
+            # Add the APPRN identifier to the public record
+            self._sync_public_apprn_identifier(public_recid, report_number)


I think I have the same question as @0einstein0 :) Why not apply the same order here as would happen in real life? Create restricted -> Create/Approve request -> Create public record?

zzacharo · 2026-06-29T13:35:59Z

+        for _, version_data in split.get("versions", {}).items():
+            current_version_files = OrderedDict()
+
+            for key, file_data in version_data.get("files", {}).items():


Thinking... what if we just copy all files into the restricted record unconditionally and keep only the public ones for the public record? If an EPPHAPP_FILE_TYPE exists, we could use its restriction for all files in the restricted record; otherwise just use restricted files, i.e. members of the community. That would also solve the record you found that didn't have any restricted file. wdyt @kpsherva?
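
One possible reading of this proposal, as a sketch only. The `type` and `restricted` keys and the two restriction labels are hypothetical placeholders, not values from the codebase.

```python
def split_files_copy_all(files):
    """Alternative split: restricted gets every file, public only public ones."""
    # Copy all files to the restricted record unconditionally.
    restricted = dict(files)
    # Keep only the non-restricted files for the public record.
    public = {key: data for key, data in files.items() if not data.get("restricted")}

    # Pick the restriction for the restricted record (labels are placeholders).
    has_epphapp = any(data.get("type") == "EPPHAPP_FILE" for data in files.values())
    restriction = "ep_approval" if has_epphapp else "community_members"
    return restricted, public, restriction
```

Under this variant a record without any EPPHAPP_FILE still yields a non-empty restricted record, which is the case raised in the PR questions.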

zzacharo · 2026-06-29T13:36:44Z

+    "8dfea666-5758-4614-bbc1-56209565c78a": {
+        "label": "EP approval",  # shown in UI buttons/headings
+        "referee_group": "cds-ph-ep-publication",  # CERN e-group slug
+        "report_number_pattern": "CERN-EP-{year}-{seq:03d}",


This one has changed in the latest implementation at https://github.com/CERNDocumentServer/cds-rdm/tree/feature/ep-approval
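
For reference, the pattern in the hunk above expands with plain `str.format` as shown below. The year and sequence values are made up for the example, and the pattern on the feature branch may differ by now.

```python
# Pattern as it appears in the diff hunk; {seq:03d} zero-pads to three digits.
pattern = "CERN-EP-{year}-{seq:03d}"

report_number = pattern.format(year=2026, seq=7)
print(report_number)  # CERN-EP-2026-007
```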

zubeydecivelek force-pushed the ep-approval-load branch 2 times, most recently from 4619216 to dc83813 Compare June 25, 2026 13:49

zubeydecivelek moved this to In review 🔍 in Sprint Q2 2026 ☀️ Jun 26, 2026

zubeydecivelek added this to Sprint Q2 2026 ☀️ Jun 26, 2026

zubeydecivelek force-pushed the ep-approval-load branch from dc83813 to cf8176b Compare June 26, 2026 14:57

add(load): EP approval record migration stream

6de53c6

zubeydecivelek force-pushed the ep-approval-load branch from cf8176b to 6de53c6 Compare June 26, 2026 14:58

zubeydecivelek requested review from 0einstein0, kpsherva and zzacharo and removed request for 0einstein0 and kpsherva June 29, 2026 08:51

zubeydecivelek wants to merge 1 commit into CERNDocumentServer:master from zubeydecivelek:ep-approval-load
