Matrix improvements #48
Open
matajoh wants to merge 1 commit into
A `Matrix` gather/reduce release. The dense matrix type gains top-k selection, along-axis gather/scatter, interleaved repetition, a lazy element iterator, masked reductions, and trigonometric/sign ufuncs, while the random-number factories move to an independent per-interpreter generator. A crash in the noticeboard mutator thread when reconstructing Matrix-valued entries is fixed, a family of allocation-overflow and cross-interpreter acquisition guards harden the C core, and passing an empty cown-list group to ``@when`` no longer crashes or drops a behavior parameter. One breaking change: the ``in_place`` flag on the unary element-wise methods is now keyword-only.

**New Features**

- **`Matrix.topk(k, axis=None, largest=True, where=None, as_matrix=False)`** — the *k* extreme elements per reduction group, in sorted order, returned as a ``(values, indices)`` tuple. Ties follow NumPy (first occurrence wins, NaN sorts last); a masked group with fewer than *k* included elements is padded with ``NaN`` values and ``-1`` indices. Indices come back as Python lists by default (a flat ``list[int]`` for ``axis=None``, a list of per-group lists for an axis) so they feed straight into fancy indexing; ``as_matrix=True`` returns a same-shape index :class:`Matrix` instead.
- **`Matrix.take_along_axis` / `Matrix.put_along_axis`** — the ``np.take_along_axis`` / ``np.put_along_axis`` gather and scatter, one index per row (``axis=1``) or per column (``axis=0``). ``take_along_axis`` accepts a keyword-only ``out=`` target; ``put_along_axis`` has an ``accumulate=True`` mode. Pairs directly with :meth:`Matrix.argmin` / :meth:`Matrix.argmax`.
- **`Matrix.repeat_interleave(repeats, axis=None)`** — the ``np.repeat`` / ``torch.repeat_interleave`` interleaved (not tiled) copy of each element, row, or column.
- **`Matrix.values()`** — a lazy row-major iterator over every element, holding a strong reference to its source and re-checking cown acquisition on each step.
- **`Matrix.full(size, value)`** — a constant-filled constructor alongside :meth:`Matrix.zeros` / :meth:`Matrix.ones`.
- **`Matrix.sign` / `Matrix.cos` / `Matrix.sin`** — element-wise ufuncs (``sign`` maps ±0.0 and NaN to ``0``), each with keyword-only ``in_place=`` / ``out=``.
- **Masked reductions** — ``sum`` / ``mean`` / ``min`` / ``max`` / ``magnitude`` / ``magnitude_squared`` and ``argmin`` / ``argmax`` accept a ``where=`` same-shape mask matrix; only cells whose mask is non-zero (NaN counts as included) are considered. An all-excluded group publishes an op-specific sentinel: ``0`` for the additive ops (``sum`` / ``magnitude`` / ``magnitude_squared``), ``NaN`` for ``mean`` / ``min`` / ``max`` (matching NumPy's empty-slice semantics), and ``-1`` for ``argmin`` / ``argmax``.
- **Axis-wise `argmin` / `argmax`** — ``axis=0`` / ``axis=1`` return the per-column / per-row extreme positions as a Python ``list[int]`` (directly usable in fancy indexing), or a :class:`Matrix` vector with ``as_matrix=True``. ``axis=None`` still returns a single ``int``.

**Improvements**

- **Allocation-free `out=` on gathers** — :meth:`Matrix.take` and :meth:`Matrix.take_along_axis` accept a keyword-only ``out=`` :class:`Matrix` to write into; an ``out`` that aliases the source is rejected, because a reordering gather would read cells it had already overwritten.
- **`Matrix.clip` gains `in_place=` / `out=`** — mutate in place or write into a caller-supplied matrix; the two options are mutually exclusive.
- **Slice subscripts always return a `Matrix`** — a slice anywhere in the key (even a length-1 one, e.g. ``m[0:1, 0:1]``) keeps the result a :class:`Matrix`; only an all-integer key collapses to a ``float`` scalar.
- **Reproducible, race-free Matrix RNG** — :meth:`Matrix.normal` / :meth:`Matrix.uniform` / :meth:`Matrix.seed` now draw from a per-interpreter splitmix64 generator held in the module state instead of the process-global C ``rand()`` / ``srand()``.
Each worker sub-interpreter gets an independent stream, so ``seed()`` is reproducible within an interpreter and concurrent draws are no longer a data race on non-glibc or free-threaded builds.

**Bug Fixes**

- **Noticeboard Matrix-entry segfault** — the noticeboard mutator thread (a second OS thread in the primary interpreter that never ran the ``_math`` module init) crashed dereferencing a NULL cached module-state pointer while reconstructing a Matrix-valued noticeboard entry during a ``notice_update`` snapshot. The main interpreter's module state is now published at module exec (``MAIN_MATH_STATE``) and resolved on that thread; a regression test exercises both the XIData-reconstruction and pickle paths on the mutator thread.
- **`repeat_interleave` integer-overflow → out-of-bounds heap write** — the overflow guard bounded only the repeated dimension, so a large ``repeats`` with a small repeated axis could wrap the ``rows * columns * repeats`` product and under-allocate. The guard now bounds the total element count, and ``impl_new`` re-checks the ``rows * columns`` product as a backstop for every constructor; both raise ``OverflowError`` instead of corrupting the heap.
- **`Matrix.allclose` missing acquisition guard** — ``allclose`` was the only data-touching method that dereferenced both operands' buffers without an ``impl_check_acquired`` check. It now raises ``RuntimeError`` on a released or cross-interpreter operand instead of reading memory it does not own.
- **NULL-deref under memory pressure** — the ``Matrix`` subscript / item paths now check the ``impl_new`` result before use.
- **Empty cown-list `@when` group crashed or dropped a parameter** — passing an empty list to :func:`when` (e.g. ``@when(a, [])`` or ``@when([], b)``) dropped the corresponding behavior parameter: a trailing empty group raised ``TypeError`` (argument tuple too small) and a leading or middle empty group left a ``NULL`` argument slot that segfaulted the call.
Empty groups now reconstruct as ``[]`` at their correct parameter slot, and the low-level ``BehaviorCapsule`` constructor validates the ``group_id`` / slot-count relationship so a malformed capsule can no longer drive an out-of-bounds argument-tuple write.

**Breaking Changes**

- **`in_place` is keyword-only on the unary element-wise methods** — ``ceil`` / ``floor`` / ``round`` / ``negate`` / ``abs`` / ``sqrt`` / ``sign`` / ``cos`` / ``sin`` no longer accept ``in_place`` positionally, matching :meth:`Matrix.clip`. Replace ``m.negate(True)`` with ``m.negate(in_place=True)``.

**Documentation**

- Expanded the :doc:`api` matrix surface via the ``__init__.pyi`` stub docstrings for the new methods and parameters, including ``@overload`` signatures for the polymorphic ``topk`` / ``argmin`` / ``argmax`` return types. The :class:`Matrix` class docstring now states the output-routing convention (which methods offer ``out=`` vs ``in_place=``) and the masked-reduction empty-group sentinel table in one place.

**Tests**

- Added golden and fuzzed coverage for ``topk`` (list and matrix index forms), ``take_along_axis`` / ``put_along_axis``, ``repeat_interleave`` (including the overflow regression), masked reductions and arg-reductions, the ``values()`` iterator, ``sign`` / ``cos`` / ``sin``, slice-forces-matrix, and the ``allclose`` acquisition guard. Added ``TestFactoriesOnWorker`` to exercise the per-interpreter RNG from inside ``@when`` behaviors, and a ``TestNoticeboardMatrixEntry`` regression class for the mutator-thread fix.

**Internal**

- Reworked the RNG onto a per-interpreter splitmix64 state seeded at module exec, and stopped caching the published main-interpreter state into a cold thread's thread-local so a torn-down module surfaces a clean ``RuntimeError`` rather than a dangling pointer.
- De-duplicated the three ``topk`` axis branches into a single group loop behind ``topk_gather_group`` / ``topk_write_group`` / ``topk_pack_result``, and extracted the shared gather output-allocation + alias-check block (``gather_output``) used by the whole-axis and along-axis gathers.

Signed-off-by: Matthew A Johnson <matthew@matthewajohnson.org>
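The `topk` rules listed under **New Features** (per-group selection, first-occurrence tie-break, NaN last, ``NaN`` / ``-1`` padding for short masked groups) can be checked against a small pure-Python model. This is a sketch under stated assumptions, not the C implementation: a matrix is modeled as a list of row lists, both values and indices come back as plain lists, ``as_matrix=`` is omitted, and `topk_reference` / `_rank_group` are hypothetical names.

```python
import math


def _rank_group(kept, k, largest):
    """Sort one reduction group's (value, position) pairs and pad to length k."""

    def key(pair):
        value, pos = pair
        if math.isnan(value):
            return (1, 0.0, pos)  # NaN sorts after every number
        # Equal values fall through to the position, so the first occurrence wins.
        return (0, -value if largest else value, pos)

    ranked = sorted(kept, key=key)[:k]
    pad = k - len(ranked)
    values = [v for v, _ in ranked] + [math.nan] * pad
    indices = [p for _, p in ranked] + [-1] * pad
    return values, indices


def topk_reference(rows, k, axis=None, largest=True, where=None):
    """Hypothetical pure-Python model of the topk semantics on a list of rows."""
    n_rows, n_cols = len(rows), len(rows[0])
    if axis is None:  # one group over the whole matrix, flat row-major positions
        groups = [[(r, c, r * n_cols + c) for r in range(n_rows) for c in range(n_cols)]]
    elif axis == 1:  # one group per row, positions are column indices
        groups = [[(r, c, c) for c in range(n_cols)] for r in range(n_rows)]
    elif axis == 0:  # one group per column, positions are row indices
        groups = [[(r, c, r) for r in range(n_rows)] for c in range(n_cols)]
    else:
        raise ValueError("axis must be None, 0 or 1")

    results = []
    for cells in groups:
        # A cell is included when its mask is non-zero; NaN != 0, so NaN counts as included.
        kept = [(rows[r][c], pos) for r, c, pos in cells
                if where is None or where[r][c] != 0]
        results.append(_rank_group(kept, k, largest))

    if axis is None:
        return results[0]
    return [v for v, _ in results], [i for _, i in results]
```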
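The along-axis gather and scatter, and their pairing with axis-wise `argmin`, can be modeled the same way. In this sketch (hypothetical helpers `argmin_reference`, `take_along_axis_reference` and `put_along_axis_reference`; list-of-rows matrices; no ``out=`` and no NaN handling), ``axis=1`` uses one column index per row and ``axis=0`` one row index per column, so the ``list[int]`` returned by the arg-reduction feeds straight into the gather.

```python
def argmin_reference(rows, axis):
    """Per-row (axis=1) or per-column (axis=0) position of the minimum; first occurrence wins."""
    if axis == 1:
        return [min(range(len(row)), key=row.__getitem__) for row in rows]
    if axis == 0:
        return [min(range(len(rows)), key=lambda r: rows[r][c]) for c in range(len(rows[0]))]
    raise ValueError("axis must be 0 or 1")


def take_along_axis_reference(rows, indices, axis):
    """Gather one element per row (axis=1) or per column (axis=0)."""
    if axis == 1:
        return [rows[r][indices[r]] for r in range(len(rows))]
    if axis == 0:
        return [rows[indices[c]][c] for c in range(len(rows[0]))]
    raise ValueError("axis must be 0 or 1")


def put_along_axis_reference(rows, indices, values, axis, accumulate=False):
    """Scatter one value per row (axis=1) or per column (axis=0), in place."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    groups = len(rows) if axis == 1 else len(rows[0])
    for g in range(groups):
        r, c = (g, indices[g]) if axis == 1 else (indices[g], g)
        if accumulate:
            rows[r][c] += values[g]  # accumulate=True adds to the existing cell
        else:
            rows[r][c] = values[g]
```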
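Interleaved versus tiled repetition is easiest to see on a 2×2 example. The hypothetical `repeat_interleave_reference` below assumes a scalar `repeats` (NumPy's `np.repeat` also accepts per-element counts, which this sketch does not model) and returns a flat list for ``axis=None``; the real method's return shape in that case is not stated above.

```python
def repeat_interleave_reference(rows, repeats, axis=None):
    """Interleaved (not tiled) repetition on a list of rows; repeats is a scalar here."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    if axis is None:  # flatten row-major, then repeat each element where it stands
        return [x for row in rows for x in row for _ in range(repeats)]
    if axis == 0:  # repeat each whole row
        return [list(row) for row in rows for _ in range(repeats)]
    if axis == 1:  # repeat each column, i.e. each element within its row
        return [[x for x in row for _ in range(repeats)] for row in rows]
    raise ValueError("axis must be None, 0 or 1")
```

Tiling the flattened matrix twice would instead give `[1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]`.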
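The masked-reduction inclusion rule and the all-excluded sentinels reduce to a few lines for a single group (one row, one column, or the flattened matrix). The function name `masked_reduce` and the string op names are hypothetical, and treating ``sum`` / ``magnitude`` / ``magnitude_squared`` as the "additive ops" is this sketch's reading of the text.

```python
import math

# Value published when every cell of a group is masked out.
SENTINELS = {
    "sum": 0.0, "magnitude": 0.0, "magnitude_squared": 0.0,
    "mean": math.nan, "min": math.nan, "max": math.nan,
    "argmin": -1, "argmax": -1,
}


def masked_reduce(values, mask, op):
    """Reduce one group, keeping cells whose mask is non-zero (NaN != 0, so NaN is kept)."""
    kept = [(i, v) for i, (v, m) in enumerate(zip(values, mask)) if m != 0]
    if not kept:
        return SENTINELS[op]
    xs = [v for _, v in kept]
    if op == "sum":
        return sum(xs)
    if op == "magnitude_squared":
        return sum(x * x for x in xs)
    if op == "magnitude":
        return math.sqrt(sum(x * x for x in xs))
    if op == "mean":
        return sum(xs) / len(xs)
    if op == "min":
        return min(xs)
    if op == "max":
        return max(xs)
    # Arg-reductions report the position in the unmasked group; ties keep the first.
    if op == "argmin":
        return min(kept, key=lambda p: p[1])[0]
    if op == "argmax":
        return max(kept, key=lambda p: p[1])[0]
    raise ValueError(f"unknown op {op!r}")
```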
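The output-routing rules (keyword-only ``in_place=``, a caller-supplied ``out=``, the two being mutually exclusive) and the ``sign`` mapping can be illustrated with a toy class. `MiniMatrix` and its flat `data` list are hypothetical stand-ins for `Matrix`; the sketch returns `+0.0` for both signed zeros and, unlike NumPy's `np.sign` (which propagates NaN), maps NaN to `0.0` as the text states.

```python
import math


class MiniMatrix:
    """Hypothetical stand-in for Matrix: a flat row-major list of floats."""

    def __init__(self, data):
        self.data = [float(x) for x in data]

    def _apply(self, fn, in_place, out):
        if in_place and out is not None:
            raise ValueError("in_place and out are mutually exclusive")
        result = [fn(x) for x in self.data]
        if in_place:
            self.data = result
            return self
        if out is not None:
            if len(out.data) != len(result):
                raise ValueError("out has the wrong size")
            out.data[:] = result  # write into the caller's buffer, no new matrix
            return out
        return MiniMatrix(result)

    # The bare * makes in_place and out keyword-only, so m.negate(True) is a TypeError.
    def negate(self, *, in_place=False, out=None):
        return self._apply(lambda x: -x, in_place, out)

    def sign(self, *, in_place=False, out=None):
        def _sign(x):
            # +/-0.0 and NaN map to 0.0; every other value maps to +/-1.0.
            if math.isnan(x) or x == 0.0:
                return 0.0
            return 1.0 if x > 0 else -1.0

        return self._apply(_sign, in_place, out)
```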
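The per-interpreter RNG relies on splitmix64, a small 64-bit generator whose whole state is one integer, which makes it cheap to keep one instance per module state. Below is the standard splitmix64 step plus a 53-bit conversion to a double in [0, 1); the class name, the `seed` method and that conversion are assumptions for illustration, and how the module derives `normal` draws is not shown.

```python
MASK64 = (1 << 64) - 1


class SplitMix64:
    """Minimal splitmix64 generator; the PR keeps one such state per interpreter."""

    def __init__(self, seed: int):
        self.seed(seed)

    def seed(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        # Advance by the golden-ratio increment, then mix (Vigna's splitmix64 finalizer).
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def uniform(self) -> float:
        # Keep the top 53 bits so the result is an exactly representable double in [0, 1).
        return (self.next_u64() >> 11) * (1.0 / (1 << 53))
```

Because each generator owns its state, two interpreters seeded alike replay the same stream without touching shared memory, which is the property the `rand()` / `srand()` replacement needs.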
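The `repeat_interleave` overflow is easiest to see by emulating 64-bit `size_t` arithmetic in Python. In this sketch (hypothetical `unchecked_count` / `checked_count`; a 64-bit `size_t` and 8-byte elements assumed), one row of four columns repeated `2**62` times would pass a check on the repeated dimension alone, yet the full `rows * columns * repeats` product wraps to 0; bounding each product by division before forming it raises `OverflowError` instead.

```python
SIZE_MAX = (1 << 64) - 1  # assumes a 64-bit size_t


def unchecked_count(rows, cols, repeats):
    """What unguarded size_t arithmetic computes: the product silently wraps mod 2**64."""
    return (rows * cols * repeats) & SIZE_MAX


def checked_count(rows, cols, repeats, elem_size=8):
    """Bound every intermediate product before forming it, so nothing can wrap."""
    if rows and cols > SIZE_MAX // rows:
        raise OverflowError("rows * columns overflows size_t")
    count = rows * cols
    if count and repeats > SIZE_MAX // count:
        raise OverflowError("total element count overflows size_t")
    total = count * repeats
    if total and elem_size > SIZE_MAX // total:
        raise OverflowError("allocation size in bytes overflows size_t")
    return total
```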
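Why ``out=`` must not alias the source of a reordering gather shows up on a three-element reversal. The helpers below are hypothetical list-based models: `take_reference` rejects an aliased `out` by identity only (the real check presumably compares underlying buffers, which this sketch does not model), while `aliased_gather` performs the unsafe in-place write to show the corruption.

```python
def take_reference(source, indices, *, out=None):
    """Gather source[i] for each i into out, refusing an out that aliases source."""
    if out is None:
        out = [0.0] * len(indices)
    elif out is source:
        raise ValueError("out must not alias the source")
    elif len(out) != len(indices):
        raise ValueError("out has the wrong length")
    for j, i in enumerate(indices):
        out[j] = source[i]
    return out


def aliased_gather(buf, indices):
    """Unsafe: gathering into the source itself lets later reads see overwritten cells."""
    for j, i in enumerate(indices):
        buf[j] = buf[i]
    return buf
```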
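The lazy `values()` iterator can be modeled as a generator: the generator frame keeps a strong reference to its source, and the acquisition flag is re-read before each element rather than once up front. `GuardedBuffer`, its `acquired` flag and `values_reference` are hypothetical stand-ins for the cown machinery.

```python
class GuardedBuffer:
    """Hypothetical stand-in for a cown-protected Matrix buffer."""

    def __init__(self, rows):
        self.rows = rows
        self.acquired = True


def values_reference(buf):
    """Lazy row-major iteration; the generator frame holds a strong reference to buf."""
    n_rows = len(buf.rows)
    n_cols = len(buf.rows[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not buf.acquired:  # re-checked before every element, not once up front
                raise RuntimeError("matrix is no longer acquired")
            yield buf.rows[r][c]
```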
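The empty cown-list fix amounts to sizing the argument tuple from the declared parameter slots rather than from the cowns actually present, and validating every slot index. The sketch below is a hypothetical model (`flatten_groups`, `rebuild_args` and the `"single"` / `"group"` layout are not the `BehaviorCapsule` format) showing that an empty group then lands as `[]` in its own slot.

```python
def flatten_groups(specs):
    """Split @when-style arguments into a per-slot layout plus (slot, cown) entries."""
    layout = ["group" if isinstance(s, list) else "single" for s in specs]
    entries = [(slot, c) for slot, s in enumerate(specs)
               for c in (s if isinstance(s, list) else [s])]
    return layout, entries


def rebuild_args(layout, entries):
    """Rebuild the behavior's argument tuple; every slot exists even if its group is empty."""
    args = [[] if kind == "group" else None for kind in layout]  # empty groups start as []
    for slot, cown in entries:
        if not 0 <= slot < len(layout):
            raise ValueError(f"slot {slot} out of range for {len(layout)} parameters")
        if layout[slot] == "group":
            args[slot].append(cown)
        else:
            args[slot] = cown
    # A single-cown slot that never received a cown would be a NULL argument in C.
    missing = [i for i, (kind, a) in enumerate(zip(layout, args))
               if kind == "single" and a is None]
    if missing:
        raise ValueError(f"no cown supplied for parameter slot(s) {missing}")
    return tuple(args)
```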
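The mutator-thread fix and the follow-up under **Internal** describe a resolution order: a thread that ran module init uses its own cached state, while a cold thread reads the published main-interpreter state on every call and never caches it, so a torn-down module surfaces as `RuntimeError`. Below is a Python model of that order, with `threading.local` standing in for the C thread-local; `module_exec`, `module_teardown`, `get_state` and the dict-valued state are hypothetical.

```python
import threading

MAIN_MATH_STATE = None    # published by module exec on the main interpreter
_tls = threading.local()  # only threads that ran module init populate this


def module_exec():
    """Hypothetical module init: build the state, cache it locally, publish it."""
    global MAIN_MATH_STATE
    state = {"rng_state": 0}
    _tls.state = state
    MAIN_MATH_STATE = state
    return state


def module_teardown():
    global MAIN_MATH_STATE
    MAIN_MATH_STATE = None
    if hasattr(_tls, "state"):
        del _tls.state


def get_state():
    """Resolve module state on any thread, including a cold one that never ran init."""
    state = getattr(_tls, "state", None)
    if state is not None:
        return state
    # Cold thread: read the published state on every call and do NOT copy it into
    # _tls, so a later teardown is seen here instead of a stale cached value.
    state = MAIN_MATH_STATE
    if state is None:
        raise RuntimeError("_math module state is not available on this thread")
    return state
```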