Validate interleave indices before dispatch#10175
Conversation
| return Ok(new_empty_array(data_type)); | ||
| } | ||
|
|
||
| validate_interleave_indices(values, indices)?; |
There was a problem hiding this comment.
I think the cost of this separate pass will be quite high, other kernels don't do this (or make it optional).
There was a problem hiding this comment.
That makes sense. Would you prefer making this optional, similar to take bounds checking, or keeping the fast path unchanged and documenting that invalid indices may panic?
|
run benchmark interleave_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing fix-interleave-index-validation (1bb432c) to c8eba1a (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
What changed
interleave(values, indices)now validates every user-provided(array_idx, row_idx)before dispatching to specialized implementations. It returnsArrowError::InvalidArgumentErrorwhen an index references a missing input array or a row past the selected array length.The regression tests cover an out-of-bounds array index, an out-of-bounds row index, and the
StringViewArrayspecialized path that previously could reach direct view indexing.Why
interleaveis a public API, and invalid external indices should be reported as Arrow errors instead of reaching specialized kernels that can panic while indexing arrays or byte views.Validation
cargo fmt --package arrow-selectcargo test -p arrow-select out_of_bounds