Skip to content

[pull] develop from lammps:develop#165

Merged
pull[bot] merged 52 commits into
comphy-lab:developfrom
lammps:develop
Jul 3, 2026
Merged

[pull] develop from lammps:develop#165
pull[bot] merged 52 commits into
comphy-lab:developfrom
lammps:develop

Conversation

@pull

@pull pull Bot commented Jul 3, 2026

Copy link
Copy Markdown

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

akohlmey and others added 30 commits June 11, 2026 18:07
Move he_flag from PairEAMFS to PairEAM and dispatch on it in a shared
embedding_index() inline helper used by compute(), single(), and
compute_atomic_energy(). This makes PairEAMHE::compute() redundant (it
was a verbatim copy of PairEAM::compute() except for ~15 lines in the
embedding loop) and is a prerequisite for accelerated eam/he variants.

Results for eam, eam/alloy, eam/fs, eam/cd, and eam/he are unchanged:
verified bit-identical thermo output for serial and 4-rank MPI runs
with newton on/off, including bench/in.eam and examples/shear.

Exception (bug fix): PairEAM::single() and compute_atomic_energy()
previously evaluated the embedding term for eam/he without the rhomin
table offset and thus returned wrong values (e.g. through compute
group/group); they now use the correct eam/he embedding evaluation
including its two-sided linear extrapolation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Give PairEAM a fileformat selector (FUNCFL/SETFL/FS) and dispatch
coeff(), read_file(), and file2array() on it; the setfl and fs reader
and conversion implementations move verbatim from PairEAMAlloy and
PairEAMFS into the base class (read_setfl/read_fs, file2array_setfl/
file2array_fs), with the two formerly line-identical coeff() bodies
merged into one coeff_mapped(). The repeated setfl/fs cleanup code is
factored into delete_setfl()/delete_fs().

PairEAMAlloy, PairEAMFS, and PairEAMHE become constructor-only classes
that select the file format. This removes the need for virtual
inheritance: eam/alloy/opt and eam/fs/opt now derive from PairEAMOpt
alone instead of forming a diamond, and eam/cd no longer initializes a
virtual base. Class names, style names, and the virtual read_file()/
file2array() hooks are unchanged.

This is a pure restructuring: thermo output verified byte-identical
for eam, eam/alloy, eam/fs, eam/he, eam/cd plus eam/opt, eam/alloy/opt,
eam/fs/opt (serial and 4-rank MPI, newton on/off, bench/in.eam and
examples/shear included).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With the setfl/fs readers in PairEAM behind the fileformat selector,
the verbatim copies of coeff()/read_file()/file2array() in
eam/alloy/omp, eam/fs/omp, eam/alloy/intel, and eam/fs/intel are
redundant: these classes are now constructor-only and inherit the
readers from PairEAM through their accelerated base class. The
virtual inheritance in these headers is dropped as well.

Thermo output verified byte-identical for all EAM gate runs with
-sf omp (2 threads) and -sf intel (mode double) plus plain and
-sf opt, serial and 4-rank MPI, newton on/off.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PairEAMGPU now holds function-pointer bindings to the per-style device
instance functions from the gpu library (init/clear/compute_n/compute/
compute_force/bytes), set in the constructor. eam/alloy/gpu and
eam/fs/gpu become constructor-only subclasses of PairEAMGPU that select
their file format (readers inherited from PairEAM) and re-point the
bindings to their own device instance; their verbatim copies of the
whole GPU wrapper and of the file readers are removed.

Thermo output verified byte-identical for all EAM gate runs with
-sf gpu (OpenCL backend) plus plain CPU runs, serial and 4-rank MPI,
newton on/off; full test suite (incl. GPU suffix pair tests) passes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
eam/alloy/kk and eam/fs/kk were each a near-verbatim ~1500-line copy
of pair_eam_kokkos.cpp (full functor set, comm packs, and pasted file
readers). With the readers in PairEAM behind the fileformat selector,
both collapse to constructor-only template subclasses of
PairEAMKokkos<DeviceType> that select their file format.

The copies had also drifted: the base eam/kk received the
KK_FLOAT/KK_ACC_FLOAT mixed-precision modernization (rdr_kk/rhomax_kk
members, explicit casts) that was never applied to the alloy/fs
copies. They now inherit the maintained implementation; for default
double-precision KOKKOS builds the results are unchanged.

Thermo output verified byte-identical for all EAM gate runs with
-k on t 1 -sf kk (serial backend) plus plain CPU runs, serial and
4-rank MPI, newton on/off. Note: the kk suffix sub-tests of
test_pair_style segfault in a host-only KOKKOS serial build both
before and after this change (also for untouched styles like born/kk
and sw/kk); this pre-existing issue is unrelated to this commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The unit conversion loops in read_setfl() skipped the last tabulated
point of the frho and z2r tables (j < nrho / k < nr instead of <=),
inconsistent with the funcfl and fs readers which convert the full
table. The effect is limited to the table end points (z2r at the
cutoff, frho at the maximum density) and stays within the tolerances
of the eam_alloy_real and eam_cd_real unit tests, which pass
unchanged.
Accumulate the density and force contributions of the outer-loop atom
i in local variables instead of in-place updates, as the /opt and /omp
variants already do. The accumulators are seeded with the current
array values, which keeps the summation order and thus the results
bit-for-bit identical (also for hybrid sub-styles).

The embedding evaluation loop moves into a compute_embedding() method
templated on the table format, and embedding_index() gains a templated
variant, so the eam/he format test is hoisted out of the loop and
resolved at compile time (the non-template overload dispatching on
he_flag remains for single() and compute_atomic_energy()).

Thermo output stays byte-identical for eam, eam/alloy, eam/fs, eam/he,
and eam/cd (serial and 4-rank MPI, newton on/off); interleaved
best-of-3 serial benchmarks (32k atoms, 100 steps) show 2-3% speedup
for all four file formats.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
fp[i] is constant over the inner neighbor loop but was re-loaded every
iteration since the compiler cannot prove the f[j] stores do not alias
the fp array; the numforce[i] per-neighbor increment becomes a local
counter stored once per atom i. Results are bit-for-bit identical.

Benchmarks (32k atoms, interleaved best-of-3, GCC/x86) show no
measurable timing change, i.e. the compiler already coped; the change
removes the dependence on alias analysis and matches the accumulator
pattern of the surrounding code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A non-KOKKOS style running inside a KOKKOS run expects x2lamda()/lamda2x()
to update the coordinates it reads on the host, but the DomainKokkos
overrides transform only the Kokkos views. Whenever those views do not
alias the legacy atom->x array (host builds at single/mixed precision,
and every GPU build), a host kspace style with a triclinic box, e.g.
pppm/cg which has no /kk variant, reads stale Cartesian coordinates
against the lamda-space grid mapping after its in-compute x2lamda()
call and aborts with "Out of range atoms - cannot compute PPPM".
Setup escapes because VerletKokkos::setup() runs under auto_sync;
the timestep loop does not. This is the KSpaceStyle:pppm_cg_tri_slab
failure of the KOKKOS serial single/mixed precision CI jobs.

Add TransformView::need_sync_legacy() (is a Kokkos view newer than the
legacy host view?) and use it in the four DomainKokkos::x2lamda()/
lamda2x() array overloads: when the legacy host copy is current on
entry, sync it again on exit so the caller-visible host state stays
coherent. Device-resident flows (verlet_kokkos, fix_nh_kokkos,
min_kokkos, ...) enter with the host copy already stale and skip the
extra sync, so the per-step device path is unaffected. Double precision
host builds alias legacy and Kokkos views and were never affected; the
added sync folds to a no-op there.

The kokkos_omp (present since the test was added) and kokkos_gpu
(added 2026-06-28) entries in the pppm_cg_tri_slab skip list were this
same staleness in its other guises; remove both so the test locks the
fix in on all KOKKOS backends. Verified: full unittest suite green on
clean serial single/mixed/double builds, KSpace+FixTimestep green on
OpenMP single (kokkos_omp unskipped) and HIP gfx1103 double
(kokkos_gpu unskipped).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
FixPrecessionSpin::init() converted the Zeeman field intensity with
H_field *= gyro in place. Fix::init() runs once per run command, so
every additional run compounded the conversion and scaled the field
by another factor of gyro ~= 0.176 rad.THz/T: any multi-run input
using precession/spin zeeman sampled a different field in the second
and later runs, and a run continued from a restart file diverged from
the same run performed without interruption.

Store the converted intensity in a new member (hfield) instead, like
the anisotropy constants already do (Kah = Ka/hbar), so init() is
idempotent and H_field keeps the user-supplied value in Tesla.

Found by the new fix-timestep-precession_spin.yaml test: its restart
leg could never match the continuous reference trajectory. Verified
by comparing an uninterrupted 8-step run against a 4-step run plus
restart continuation, which are now bit-identical.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
PairSpinNeel::init_one() mirrored all eight coefficient arrays to the
transposed type pair but not cut_spin_neel[j][i], unlike all other
spin pair styles which mirror their cutoff array. With more than one
atom type, compute() and compute_single_pair() then compared rsq
against an uninitialized cutoff for half of the type pairs, producing
nondeterministic forces and energies.

Found by the new spin-pair-neel.yaml test failing intermittently
(9 out of 30 identical serial runs); valgrind --track-origins pointed
at the rsq <= local_cut2 tests. 30/30 stable and valgrind-clean after
the fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
…s.cpp

Doxygen could not match the explicit instantiations of utils::bounds() and
utils::bounds_typelabel() nor the explicit specializations of utils::join<T>()
in utils.cpp to their template declarations in utils.h, emitting 13 "no
matching file member found" warnings. The two anon-namespace forward
declarations of re_match()/re_find() used /** */ doxygen comments without
\param/\return, emitting 4 more "parameters not documented" warnings.

Wrap the instantiation/specialization blocks in \cond ... \endcond so
doxygen skips them (their documentation lives on the utils.h template
declarations, which breathe resolves), and demote the two forward-declaration
comments to plain /* */. doc/doxygen-warn.log goes from 17 warnings to 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
akohlmey and others added 22 commits July 2, 2026 20:42
Replace the targeted fix from commit 237dab0 (keep legacy host
coords current in DomainKokkos::x2lamda/lamda2x) with the general
mechanism suggested by Stan Moore in review: when a style without
KOKKOS support runs inside a KOKKOS run, enable kokkos->auto_sync for
the duration of its compute so that every sync()/modified() it
triggers - for example through the virtual DomainKokkos
x2lamda/lamda2x overrides - writes modifications through to the
legacy host arrays the style reads and writes. This is the same
pattern ModifyKokkos already applies to every fix hook invocation and
it covers any host-visible per-atom data, not just coordinates.

Apply the wrapping to the pair, bond, angle, dihedral, improper, and
kspace compute invocations in VerletKokkos::run(),
MinKokkos::energy_force(), DynamicalMatrixKokkos::update_force(), and
ThirdOrderKokkos::update_force(). The setup paths already run with
auto_sync enabled and need no change. The KSpace base class was
missing the kokkosable member that all other style base classes have;
add it and set it in PPPMKokkos.

Remove the now superseded TransformView::need_sync_legacy() and the
conditional legacy sync from the DomainKokkos coordinate transforms.

Verified: KSpaceStyle:pppm_cg_tri_slab (which runs the non-KOKKOS
pppm/cg in a triclinic KOKKOS run and aborted before the original
fix) passes on KOKKOS serial single and mixed precision builds and on
a HIP GPU build; full unittest suite green on the serial single
precision build; KSpace and FixTimestep suites green on serial mixed
and double precision and HIP builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
…mand

pair_compute_neighlist() enables the neigh_thread setting on GPUs for
systems with up to 16000 atoms when the pair style runs with newton
off, without marking it as user selected. That state persists in the
KokkosLMP class instance across a clear command, so when clear
re-applies the package defaults and command line settings, a package
command with "newton on" that was accepted at startup now aborts with
"Must use 'newton off' with KOKKOS package option 'neigh/thread on'"
even though the user never enabled neigh/thread. To reproduce: run a
small system with newton off on a GPU, issue a clear command, and set
up a newton on run in the same session.

Reset the setting at the top of the package command processing unless
it was selected with the neigh/thread option; the heuristic
re-evaluates it on the next pair style compute anyway.

Found by activating the newton off legs of the kokkos_gpu force-style
tests: all 76 yaml files with KOKKOS-capable pair styles aborted in a
later test stage after the newton off leg had triggered the heuristic.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
Add support for magnetic (atom_style spin) systems to the pair style
and fix timestep test drivers, following the model of the ellipsoid
special case:

- new yaml blocks init_mag_forces/run_mag_forces (per-atom fm) and
  run_spin (per-atom sp incl. norm), captured automatically when the
  atom style provides them and compared wherever forces (pair tester)
  and positions/velocities/torques (fix tester) are compared
- new 'timestep' yaml keyword so the fix tester can use a timestep
  suitable for spin dynamics in metal units instead of the previous
  hard-coded 0.25 (which remains the default)
- a 'spin' tag: such yaml files must define fix nve/spin in their
  post_commands (the spin pair styles compute mechanical forces only
  when the fix is present), so the pair tester run phase does not add
  another time integrator; newton off legs are skipped via the
  established newton_pair pre_commands pin since spin pair styles
  require newton on
- remove the hard-coded MOLECULE package gate in the fix tester;
  the atom style prerequisites in every yaml file already cover it
- new inputs in.spin, data.spin (54-atom perturbed bcc iron cell with
  randomized spins, two atom types), coeffs.spin
- pair style references for spin/exchange, spin/exchange/biquadratic,
  spin/dmi, spin/neel, spin/magelec, spin/dipole/cut, and
  hybrid/overlay eam/fs + spin/exchange; fix references for nve/spin,
  precession/spin, and setforce/spin, registered as SpinPairStyle:*
  and FixTimestep:* ctest cases

These tests found the fix precession/spin zeeman compounding bug and
the pair spin/neel uninitialized cutoff bug fixed in the two previous
commits. No test for fix langevin/spin yet: it does not store the RNG
state in restart files, so its restart trajectory cannot match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
The newton off blocks in the omp leg and the shared kokkos test body
of test_pair_style checked "if (newton_pair == 0)" on the still
newton on instance and only re-initialized with newton off inside
that block, so the comparisons had been dead code since 2021 (the
plain leg re-initializes first and then checks whether the pair style
forced newton back on). Restructure both legs to match the plain leg.
Also correct the comparison of the freshly initialized state inside
the previously dead omp block, which compared the initial forces
against the run_forces reference, and remove a duplicated magnetic
force check that had slipped into the plain and kokkos legs.

With this change the newton off state of every pair style with an
/omp or /kk variant is compared against the reference data. Results
across all 394 pair and kspace yaml files: no failures in the omp and
kokkos_omp legs (OpenMP build), no failures in the kokkos_serial leg
(KOKKOS serial single precision build; the pre-existing pppm_tip4p_ad
failure is unrelated), and no failures in the kokkos_gpu leg (HIP
build) after the preceding fix for the stale neigh/thread state.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
The four std::sort comparators for the per-particle contact lists in
FixSurfaceGlobal::post_force() and PairSurfGranular::compute() took
their ContactSurf arguments by value. The struct holds three
std::vector members besides ~200 bytes of plain data, so every
comparison in these per-atom hot loops copied both arguments including
up to six heap allocations. Take them by const reference.

Found by Coverity Scan (CIDs 1661276, 1661387, 1661110, 1661564,
1661250, 1661984, 1661398, 1661585).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
The loop over the connections of a line's second endpoint iterated
with the connection count of the first endpoint, reading past the end
of neigh_p2[] whenever a line has more connections at endpoint 1 than
at endpoint 2 and crashing FixSurfaceGlobal::init() for 2d global
surfaces with per-molecule motion (e.g. the in.line.gran.global
example). The 3d twin block pairs its counts correctly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
The four sort call sites for per-particle contact lists in
FixSurfaceGlobal::post_force() and PairSurfGranular::compute() carried
duplicated lambda comparators encoding the same 6-level ordering
(overlap, priority, normal alignment, center-of-mass distance, index),
in two variants: before the connection pre-walk the normal signs are
not yet assigned and the alignment is compared in absolute value.
Move them into two static comparators on FixSurface implemented via
one shared function, so the pair style and the fix are guaranteed to
rank multi-contact configurations identically, and return bool instead
of int. Verified bit-identical on the tribox/line global and tri/line
local example runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Setting the fflag/nside/aflag (and ewhich/pwhich) connection flags
between two connected lines or triangles was implemented in ten nearly
identical blocks: one per endpoint (2d) or edge (3d) of the global and
local surface fixes, differing only in which endpoints form the edge
and which per-connection arrays are written. Move the shared
computation into two FixSurface methods, point_connection2d() and
edge_connection3d(); the callers keep their genuinely different
endpoint matching (point indices for global surfaces, coordinate
comparison for distributed local surfaces) and now report inconsistent
surface connectivity instead of leaving flags unset when no shared
point is found. The classification values (FLAT/NONFLAT,
CONCAVE/CONVEX, SAME_SIDE/OPPOSITE_SIDE) move from per-file enum
copies into the FixSurface class so the helpers and both subclasses
share one definition.

Verified bit-identical on the tribox/line global and tri/line local
example runs (only source line numbers in warning messages shift).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Extend the connection-flag consolidation to the third duplicated
family: the six corner blocks (three per file) that set the
cwhich/nside/fflag values for tris sharing only a corner point. The
shared computation (normal sides from the sign of the normal dot
product plus the flatness test) moves into
FixSurface::corner_connection3d(); the callers keep their differing
corner matching and now report inconsistent surface connectivity for
an unmatched corner instead of leaving cwhich unset. Also drop the
local variable declarations that became unused in the connection
functions after the consolidations.

Verified bit-identical on the tribox/line global and tri/line local
example runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Refactor EAM pair style family to eliminate duplicated code
@pull pull Bot locked and limited conversation to collaborators Jul 3, 2026
@pull pull Bot added the ⤵️ pull label Jul 3, 2026
@pull pull Bot merged commit da16090 into comphy-lab:develop Jul 3, 2026
6 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant