
add(load): EP approval record migration stream #541

Open
zubeydecivelek wants to merge 1 commit into CERNDocumentServer:master from zubeydecivelek:ep-approval-load

@zubeydecivelek commented Jun 25, 2026

Contributor

Depends on https://github.com/CERNDocumentServer/cds-rdm/tree/feature/ep-approval

This PR covers the migration of EP approved records.

We'll have a new loader (CDSEPApprovalRecordServiceLoad) for records that went through EP approval. It splits one legacy record into two RDM records, creates the EP approval request with the original history (submitter, approver, dates, report number), and links everything together with related identifiers and an APPRN.

Metadata split

  • EP report number (CERN-EP-YYYY-*): removed from both splits before load. After the request is created, the APPRN is minted on the restricted record and added to the public record metadata.
  • EP draft report number (CERN-EP-*): removed from the public record before load.
  • DOI: removed from the restricted split only.
  • Access: public record is set to public; restricted record stays restricted.
  • Community: the CERN Scientific community is added to the public record's parent.
  • Legacy identifiers / side effects: legacy recid minting, original dump, CLC sync, and record-state logging only run on the public record.
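A minimal sketch of the metadata rules above, assuming a flat legacy dict with hypothetical `report_numbers`, `doi`, `access`, and `communities` keys. This is not the RDM schema or the loader's actual code; it only shows which split keeps which field.

```python
import copy
import re

# Assumed report-number shapes, taken from the bullets above.
EP_REPORT_RE = re.compile(r"^CERN-EP-\d{4}-")  # final EP report number (CERN-EP-YYYY-*)
EP_ANY_RE = re.compile(r"^CERN-EP-")           # any EP number, including drafts


def split_metadata(legacy: dict) -> tuple[dict, dict]:
    """Return (public, restricted) copies of a hypothetical flat legacy record."""
    public = copy.deepcopy(legacy)
    restricted = copy.deepcopy(legacy)
    numbers = legacy.get("report_numbers", [])

    # CERN-EP-YYYY-* is dropped from both splits (the APPRN is added later);
    # draft CERN-EP-* numbers are dropped from the public split only.
    restricted["report_numbers"] = [n for n in numbers if not EP_REPORT_RE.match(n)]
    public["report_numbers"] = [n for n in numbers if not EP_ANY_RE.match(n)]

    # The DOI stays on the public split only.
    restricted.pop("doi", None)

    # Access levels and community membership.
    public["access"] = "public"
    restricted["access"] = "restricted"
    public.setdefault("communities", []).append("CERN Scientific community")
    return public, restricted
```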

Files split

  • Restricted record: EPPHAPP_FILE (the restricted EP file type) files only.
  • Public record: all other files (and it must not contain any remaining restricted files).
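The file rule can be sketched as a partition over a `{key: file_data}` mapping. The `type` and `restricted` keys on each file entry are assumptions for illustration, and raising on a leftover restricted file is one possible way to enforce the "must not contain" condition.

```python
EPPHAPP_FILE = "EPPHAPP_FILE"  # assumed value of the file "type" field


def split_files(files: dict) -> tuple[dict, dict]:
    """Partition {key: file_data} into (public_files, restricted_files)."""
    restricted = {k: f for k, f in files.items() if f.get("type") == EPPHAPP_FILE}
    public = {k: f for k, f in files.items() if f.get("type") != EPPHAPP_FILE}

    # The public record must not end up with any other restricted file.
    leaked = [k for k, f in public.items() if f.get("restricted", False)]
    if leaked:
        raise ValueError(f"restricted files left in public split: {leaked}")
    return public, restricted
```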

Loader

The loader creates two records and links them via:

  • an accepted EP approval request (submitted + accepted timeline from legacy history)
  • related identifiers (isversionof / isvariantformof)
  • APPRN on the public record
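The related-identifier link can be sketched as appending an InvenioRDM-style entry (`identifier`, `scheme`, `relation_type.id`) to a record's metadata. Which record carries `isversionof` versus `isvariantformof`, and the `cds` scheme used in the test, are assumptions here, not taken from the PR.

```python
def add_related_identifier(
    metadata: dict, identifier: str, scheme: str, relation: str
) -> None:
    """Append a related identifier to metadata, skipping exact duplicates."""
    entry = {
        "identifier": identifier,
        "scheme": scheme,
        "relation_type": {"id": relation},
    }
    related = metadata.setdefault("related_identifiers", [])
    # Idempotent, so re-running the link step does not duplicate entries.
    if entry not in related:
        related.append(entry)
```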

How to run

Records with ep_approval cannot go through the standard record stream; doing so raises a migration error. They must be migrated with --ep-approval.

invenio migration run --collection <collection-name> --ep-approval --dry-run
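The stream guard described above can be sketched as follows. `MigrationError` is a stand-in for the migration tool's own exception, and treating any non-empty `ep_approval` value as "has EP approval history" is an assumption.

```python
class MigrationError(Exception):
    """Stand-in for the migration tool's own error type (hypothetical)."""


def check_stream(record: dict, ep_approval: bool) -> None:
    """Reject EP-approved records on the standard stream."""
    if record.get("ep_approval") and not ep_approval:
        raise MigrationError(
            "record has EP approval history; migrate it with --ep-approval"
        )
```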

Result

For now, this only handles records with an approved EP approval history (exactly one waiting and one approved entry).
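The "exactly one waiting and one approved entry" condition can be checked by counting statuses. The `status` key and the reading that no other entries are allowed are assumptions about the legacy history format.

```python
from collections import Counter


def is_approved_history(history: list[dict]) -> bool:
    """True if the history is exactly one 'waiting' plus one 'approved' entry."""
    counts = Counter(entry.get("status") for entry in history)
    return counts == Counter({"waiting": 1, "approved": 1})
```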

Questions

  • If we also need to migrate records in the waiting or rejected state, we would need a different split/load mechanism. Is that the case?
  • This record doesn't have the EPPHAPP restricted file, so how do we split it? We need a restricted EPPHAPP_FILE: https://cds.cern.ch/record/2864686/files/

@zubeydecivelek force-pushed the ep-approval-load branch 2 times, most recently from 4619216 to dc83813 June 25, 2026 13:49
@zubeydecivelek zubeydecivelek moved this to In review 🔍 in Sprint Q2 2026 ☀️ Jun 26, 2026
@zubeydecivelek zubeydecivelek requested review from 0einstein0, kpsherva and zzacharo and removed request for 0einstein0 and kpsherva June 29, 2026 08:51
self.ep_approval_metadata["resource_type"],
)
# Add the APPRN identifier to the public record
self._sync_public_apprn_identifier(public_recid, report_number)

Contributor

Is there a reason we do it after publishing the record?

Contributor

I think I have the same question as @0einstein0 :) Why not apply the same order as would happen in real life: create restricted record -> create/approve request -> create public record?

Contributor Author

@0einstein0 No, we can add this before publishing, thanks :)

@zzacharo We can apply the same order, but we can't use the public record creation method from the RDM implementation, since the public record has different metadata.

I'll update the PR :)
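The order proposed in this thread (restricted record, then request, then public record with the APPRN already set) can be sketched with injected callables. All three callables are hypothetical placeholders, not the loader's methods; the point is only that the public record receives the report number before it is created and published.

```python
from typing import Callable


def load_ep_approved(
    create_restricted: Callable[[], str],
    create_request: Callable[[str], str],
    create_public: Callable[[str, str], str],
) -> tuple[str, str, str]:
    """Run the three steps in 'real life' order and return their results."""
    restricted_id = create_restricted()
    # Accepting the request yields the report number (APPRN) for the record.
    report_number = create_request(restricted_id)
    # The public record gets the APPRN at creation time, not after publishing.
    public_id = create_public(restricted_id, report_number)
    return restricted_id, report_number, public_id
```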


current_version_files[key] = deepcopy(file_data)

if not current_version_files:

Contributor

If a version has no new files but does have metadata changes, do we skip it?

Contributor Author

It was already implemented like this: we create a new version only when the files change. Since I don't change how versions are created from files during the transform, and I split the files here, I added this check to avoid duplicate versions.
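The loop shape from the diff fragments can be sketched as below: after filtering each version's files for one split, a version with no files left is skipped so it does not repeat the previous version. The `keep` predicate and the flat `{version: {"files": {...}}}` layout are assumptions.

```python
from collections import OrderedDict
from copy import deepcopy
from typing import Callable


def split_versions(versions: dict, keep: Callable[[dict], bool]) -> list:
    """Return per-version file maps for one split, dropping empty versions."""
    result = []
    for _, version_data in versions.items():
        current_version_files = OrderedDict()
        for key, file_data in version_data.get("files", {}).items():
            if keep(file_data):
                current_version_files[key] = deepcopy(file_data)
        # No files left for this split: skip it to avoid a duplicate version.
        if not current_version_files:
            continue
        result.append(current_version_files)
    return result
```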

"mint_legacy_recid": True,
"save_original_dump": True,
"clc_sync": True,
"record_state": True,

Contributor

What does that mean? Whether to generate the record state or not?

Contributor Author

It means we won't add the restricted record to record_state_logger. I can change the variable name if it's confusing.
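A sketch of how per-record flags like the ones in the diff could gate the side effects. The four keys come from the fragment above, and the all-false restricted set follows the PR description; the `actions` mapping of callables is hypothetical.

```python
PUBLIC_OPTIONS = {
    "mint_legacy_recid": True,
    "save_original_dump": True,
    "clc_sync": True,
    "record_state": True,  # add the record to the record-state logger
}
# The restricted split skips every legacy side effect.
RESTRICTED_OPTIONS = {key: False for key in PUBLIC_OPTIONS}


def run_side_effects(record_id: str, options: dict, actions: dict) -> list:
    """Call each action whose flag is enabled; return the names that ran."""
    ran = []
    for name, action in actions.items():
        if options.get(name, False):
            action(record_id)
            ran.append(name)
    return ran
```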


# TODO: What if there are multiple experiments?
experiments = record_json.get("custom_fields", {}).get(
"cern:experiments", []

Contributor

Shall we raise an error so we can identify whether there are any such cases?
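Raising on multiple experiments, as suggested, could look like this. The lookup path matches the diff fragment; the exception class is hypothetical, and returning `None` when no experiment is present is an assumption.

```python
class MultipleExperimentsError(ValueError):
    """Raised when a record lists more than one experiment (hypothetical)."""


def get_single_experiment(record_json: dict):
    """Return the record's only experiment, or None if it has none."""
    experiments = record_json.get("custom_fields", {}).get("cern:experiments", [])
    if len(experiments) > 1:
        raise MultipleExperimentsError(
            f"expected at most one experiment, got {len(experiments)}"
        )
    return experiments[0] if experiments else None
```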

)

# Load the EP approval request
self._load_ep_approval(restricted_state, public_state, legacy_recid=recid)

Contributor

Suggested change
self._load_ep_approval(restricted_state, public_state, legacy_recid=recid)
self._create_ep_approval(restricted_state, public_record_state, legacy_recid=recid)


for _, version_data in split.get("versions", {}).items():
current_version_files = OrderedDict()

for key, file_data in version_data.get("files", {}).items():

Contributor

Thinking... What if we just copy all files to the restricted record unconditionally and keep only the public ones for the public record? If an EPPHAPP_FILE_TYPE file exists, we could apply that restriction to all files of the restricted record; otherwise, we would use the regular restriction (i.e. community members). That would also solve the case of the record you found that didn't have any restricted file. WDYT @kpsherva?
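The alternative proposed above can be sketched like this: the restricted record gets every file, the public record keeps only non-restricted, non-EPPHAPP files, and the restricted record's access level depends on whether an EPPHAPP file is present. The `type`/`restricted` keys and the `"ep-approval"` / `"community-members"` labels are hypothetical.

```python
EPPHAPP_FILE = "EPPHAPP_FILE"  # assumed value of the file "type" field


def split_files_copy_all(files: dict) -> dict:
    """Copy all files to the restricted split; keep public files for public."""
    restricted_files = dict(files)
    public_files = {
        k: f
        for k, f in files.items()
        if not f.get("restricted", False) and f.get("type") != EPPHAPP_FILE
    }
    has_epphapp = any(f.get("type") == EPPHAPP_FILE for f in files.values())
    # Use the EP restriction when an EPPHAPP file exists, else community members.
    restriction = "ep-approval" if has_epphapp else "community-members"
    return {
        "public": public_files,
        "restricted": restricted_files,
        "restricted_access": restriction,
    }
```

This also covers records without an EPPHAPP file, since the restricted split is never empty as long as the legacy record has files.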

"8dfea666-5758-4614-bbc1-56209565c78a": {
"label": "EP approval", # shown in UI buttons/headings
"referee_group": "cds-ph-ep-publication", # CERN e-group slug
"report_number_pattern": "CERN-EP-{year}-{seq:03d}",

Contributor

This one has changed in the latest implementation on https://github.com/CERNDocumentServer/cds-rdm/tree/feature/ep-approval


3 participants