Skip to content

Adding append and overwrite options to ParticleFile API#2655

Open
erikvansebille wants to merge 3 commits into
Parcels-code:mainfrom
erikvansebille:particlefile_append_overwrite
Open

Adding append and overwrite options to ParticleFile API#2655
erikvansebille wants to merge 3 commits into
Parcels-code:mainfrom
erikvansebille:particlefile_append_overwrite

Conversation

@erikvansebille

Copy link
Copy Markdown
Member

Description

This PR adds an option to ParticleFile.__init__ to control when the file already exists: either raise an error (default), overwrite, or append to the exisiting file.

Note that in the case of "append", the t=0 should not be written in pset.execute(), as it was already written at the end of the previous pset.execute()`

Checklist

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.
    • Describe how you used it (e.g., by pasting your prompt):
      I used Claude code to help with the implementation of the append option in particlefile.write()

Comment on lines +64 to +68
if_exists : {"error", "overwrite", "append"}, optional
Behavior when the output file already exists.
- "error" (default): raise a ValueError.
- "overwrite": remove the existing file before writing.
- "append": preserve existing rows and append new rows.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick closer to convention here

Suggested change
if_exists : {"error", "overwrite", "append"}, optional
Behavior when the output file already exists.
- "error" (default): raise a ValueError.
- "overwrite": remove the existing file before writing.
- "append": preserve existing rows and append new rows.
mode : {"w", "a", None}, optional
Writing behaviour.
- None (default): Write dataset, and raise an error if it already exists.
- "w": Write dataset, overwriting it.
- "a": Append to dataset.

also rename ._if_exists to ._mode

Comment on lines +165 to +180
self._tmp_path = self.path.with_name(f"{self.path.stem}.append_tmp{self.path.suffix}")
if self._tmp_path.exists():
self._tmp_path.unlink()

self._writer = pq.ParquetWriter(self._tmp_path, existing_schema, compression=self._compression)

# Parquet can't directly append, so we need to rewrite the existing data along with the new data.
for batch in existing_file.iter_batches():
self._writer.write_table(pa.Table.from_batches([batch], schema=existing_schema))
else:
assert not self.path.exists(), "If the file exists, the writer should already be set"
self._writer = pq.ParquetWriter(
self.path,
schema,
compression=self._compression,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just taking a step back here - why do we need an append mode for the ParticleFile? Could users easily just create multiple particlefiles and join them into one after the fact?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think especially since the file format of Parquet doesn't support this, and calling "append" would require rewriting the current data that we have, is maybe an indication that we either shouldn't have append or should consider something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants