Adding append and overwrite options to ParticleFile API#2655
Open
erikvansebille wants to merge 3 commits into
Conversation
Comment on lines +64 to +68
| if_exists : {"error", "overwrite", "append"}, optional | ||
| Behavior when the output file already exists. | ||
| - "error" (default): raise a ValueError. | ||
| - "overwrite": remove the existing file before writing. | ||
| - "append": preserve existing rows and append new rows. |
Contributor
I think we should stick closer to convention here
Suggested change
| if_exists : {"error", "overwrite", "append"}, optional | |
| Behavior when the output file already exists. | |
| - "error" (default): raise a ValueError. | |
| - "overwrite": remove the existing file before writing. | |
| - "append": preserve existing rows and append new rows. | |
| mode : {"w", "a", None}, optional | |
| Writing behaviour. | |
| - None (default): Write dataset, and raise an error if it already exists. | |
| - "w": Write dataset, overwriting it. | |
| - "a": Append to dataset. |
also rename ._if_exists to ._mode
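To make the suggested semantics concrete, here is a minimal sketch of how the three `mode` values could be resolved before any writer is opened. `prepare_output` and its return strings are hypothetical names for illustration, not part of the PR; raising `ValueError` when the file exists follows the original docstring and is an assumption for the renamed argument.

```python
from pathlib import Path

VALID_MODES = (None, "w", "a")


def prepare_output(path, mode=None):
    """Hypothetical helper: decide how to treat `path` before writing.

    Returns "create", "overwrite" or "append".
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be 'w', 'a' or None, got {mode!r}")

    path = Path(path)
    if not path.exists():
        # Nothing on disk yet, so every mode simply creates the file.
        return "create"
    if mode is None:
        # Default: refuse to touch an existing file (assumed exception type).
        raise ValueError(f"{path} already exists; pass mode='w' or mode='a'")
    if mode == "w":
        # Overwrite: remove the existing file before writing.
        path.unlink()
        return "overwrite"
    # mode == "a": leave the existing file in place for the append logic.
    return "append"
```

Resolving the mode in one place keeps the exists/overwrite/append decision out of the write path itself.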
Comment on lines +165 to +180
| self._tmp_path = self.path.with_name(f"{self.path.stem}.append_tmp{self.path.suffix}") | ||
| if self._tmp_path.exists(): | ||
| self._tmp_path.unlink() | ||
|
|
||
| self._writer = pq.ParquetWriter(self._tmp_path, existing_schema, compression=self._compression) | ||
|
|
||
| # Parquet can't directly append, so we need to rewrite the existing data along with the new data. | ||
| for batch in existing_file.iter_batches(): | ||
| self._writer.write_table(pa.Table.from_batches([batch], schema=existing_schema)) | ||
| else: | ||
| assert not self.path.exists(), "If the file exists, the writer should already be set" | ||
| self._writer = pq.ParquetWriter( | ||
| self.path, | ||
| schema, | ||
| compression=self._compression, | ||
| ) |
Contributor
Just taking a step back here - why do we need an append mode for the ParticleFile? Could users easily just create multiple particlefiles and join them into one after the fact?
Contributor
I think the fact that the Parquet file format doesn't support this, and that calling "append" would require rewriting the data we already have, is maybe an indication that we either shouldn't have append or should consider something else.
Description
This PR adds an option to `ParticleFile.__init__` to control what happens when the file already exists: raise an error (default), overwrite the file, or append to the existing file.

Note that in the case of "append", t=0 should not be written in `pset.execute()`, as it was already written at the end of the previous `pset.execute()`.

Checklist
`main` for normal development, `v3-support` for v3 support)

AI Disclosure
I used Claude Code to help with the implementation of the append option in particlefile.write()