[pull] develop from lammps:develop#165
Merged
Merged
Conversation
Move he_flag from PairEAMFS to PairEAM and dispatch on it in a shared embedding_index() inline helper used by compute(), single(), and compute_atomic_energy(). This makes PairEAMHE::compute() redundant (it was a verbatim copy of PairEAM::compute() except for ~15 lines in the embedding loop) and is a prerequisite for accelerated eam/he variants. Results for eam, eam/alloy, eam/fs, eam/cd, and eam/he are unchanged: verified bit-identical thermo output for serial and 4-rank MPI runs with newton on/off, including bench/in.eam and examples/shear. Exception (bug fix): PairEAM::single() and compute_atomic_energy() previously evaluated the embedding term for eam/he without the rhomin table offset and thus returned wrong values (e.g. through compute group/group); they now use the correct eam/he embedding evaluation including its two-sided linear extrapolation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Give PairEAM a fileformat selector (FUNCFL/SETFL/FS) and dispatch coeff(), read_file(), and file2array() on it; the setfl and fs reader and conversion implementations move verbatim from PairEAMAlloy and PairEAMFS into the base class (read_setfl/read_fs, file2array_setfl/ file2array_fs), with the two formerly line-identical coeff() bodies merged into one coeff_mapped(). The repeated setfl/fs cleanup code is factored into delete_setfl()/delete_fs(). PairEAMAlloy, PairEAMFS, and PairEAMHE become constructor-only classes that select the file format. This removes the need for virtual inheritance: eam/alloy/opt and eam/fs/opt now derive from PairEAMOpt alone instead of forming a diamond, and eam/cd no longer initializes a virtual base. Class names, style names, and the virtual read_file()/ file2array() hooks are unchanged. This is a pure restructuring: thermo output verified byte-identical for eam, eam/alloy, eam/fs, eam/he, eam/cd plus eam/opt, eam/alloy/opt, eam/fs/opt (serial and 4-rank MPI, newton on/off, bench/in.eam and examples/shear included). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With the setfl/fs readers in PairEAM behind the fileformat selector, the verbatim copies of coeff()/read_file()/file2array() in eam/alloy/omp, eam/fs/omp, eam/alloy/intel, and eam/fs/intel are redundant: these classes are now constructor-only and inherit the readers from PairEAM through their accelerated base class. The virtual inheritance in these headers is dropped as well. Thermo output verified byte-identical for all EAM gate runs with -sf omp (2 threads) and -sf intel (mode double) plus plain and -sf opt, serial and 4-rank MPI, newton on/off. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
PairEAMGPU now holds function-pointer bindings to the per-style device instance functions from the gpu library (init/clear/compute_n/compute/ compute_force/bytes), set in the constructor. eam/alloy/gpu and eam/fs/gpu become constructor-only subclasses of PairEAMGPU that select their file format (readers inherited from PairEAM) and re-point the bindings to their own device instance; their verbatim copies of the whole GPU wrapper and of the file readers are removed. Thermo output verified byte-identical for all EAM gate runs with -sf gpu (OpenCL backend) plus plain CPU runs, serial and 4-rank MPI, newton on/off; full test suite (incl. GPU suffix pair tests) passes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
eam/alloy/kk and eam/fs/kk were each a near-verbatim ~1500-line copy of pair_eam_kokkos.cpp (full functor set, comm packs, and pasted file readers). With the readers in PairEAM behind the fileformat selector, both collapse to constructor-only template subclasses of PairEAMKokkos<DeviceType> that select their file format. The copies had also drifted: the base eam/kk received the KK_FLOAT/KK_ACC_FLOAT mixed-precision modernization (rdr_kk/rhomax_kk members, explicit casts) that was never applied to the alloy/fs copies. They now inherit the maintained implementation; for default double-precision KOKKOS builds the results are unchanged. Thermo output verified byte-identical for all EAM gate runs with -k on t 1 -sf kk (serial backend) plus plain CPU runs, serial and 4-rank MPI, newton on/off. Note: the kk suffix sub-tests of test_pair_style segfault in a host-only KOKKOS serial build both before and after this change (also for untouched styles like born/kk and sw/kk); this pre-existing issue is unrelated to this commit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The unit conversion loops in read_setfl() skipped the last tabulated point of the frho and z2r tables (j < nrho / k < nr instead of <=), inconsistent with the funcfl and fs readers which convert the full table. The effect is limited to the table end points (z2r at the cutoff, frho at the maximum density) and stays within the tolerances of the eam_alloy_real and eam_cd_real unit tests, which pass unchanged.
Accumulate the density and force contributions of the outer-loop atom i in local variables instead of in-place updates, as the /opt and /omp variants already do. The accumulators are seeded with the current array values, which keeps the summation order and thus the results bit-for-bit identical (also for hybrid sub-styles). The embedding evaluation loop moves into a compute_embedding() method templated on the table format, and embedding_index() gains a templated variant, so the eam/he format test is hoisted out of the loop and resolved at compile time (the non-template overload dispatching on he_flag remains for single() and compute_atomic_energy()). Thermo output stays byte-identical for eam, eam/alloy, eam/fs, eam/he, and eam/cd (serial and 4-rank MPI, newton on/off); interleaved best-of-3 serial benchmarks (32k atoms, 100 steps) show 2-3% speedup for all four file formats. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
fp[i] is constant over the inner neighbor loop but was re-loaded every iteration since the compiler cannot prove the f[j] stores do not alias the fp array; the numforce[i] per-neighbor increment becomes a local counter stored once per atom i. Results are bit-for-bit identical. Benchmarks (32k atoms, interleaved best-of-3, GCC/x86) show no measurable timing change, i.e. the compiler already coped; the change removes the dependence on alias analysis and matches the accumulator pattern of the surrounding code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A non-KOKKOS style running inside a KOKKOS run expects x2lamda()/lamda2x() to update the coordinates it reads on the host, but the DomainKokkos overrides transform only the Kokkos views. Whenever those views do not alias the legacy atom->x array (host builds at single/mixed precision, and every GPU build), a host kspace style with a triclinic box, e.g. pppm/cg which has no /kk variant, reads stale Cartesian coordinates against the lamda-space grid mapping after its in-compute x2lamda() call and aborts with "Out of range atoms - cannot compute PPPM". Setup escapes because VerletKokkos::setup() runs under auto_sync; the timestep loop does not. This is the KSpaceStyle:pppm_cg_tri_slab failure of the KOKKOS serial single/mixed precision CI jobs. Add TransformView::need_sync_legacy() (is a Kokkos view newer than the legacy host view?) and use it in the four DomainKokkos::x2lamda()/ lamda2x() array overloads: when the legacy host copy is current on entry, sync it again on exit so the caller-visible host state stays coherent. Device-resident flows (verlet_kokkos, fix_nh_kokkos, min_kokkos, ...) enter with the host copy already stale and skip the extra sync, so the per-step device path is unaffected. Double precision host builds alias legacy and Kokkos views and were never affected; the added sync folds to a no-op there. The kokkos_omp (present since the test was added) and kokkos_gpu (added 2026-06-28) entries in the pppm_cg_tri_slab skip list were this same staleness in its other guises; remove both so the test locks the fix in on all KOKKOS backends. Verified: full unittest suite green on clean serial single/mixed/double builds, KSpace+FixTimestep green on OpenMP single (kokkos_omp unskipped) and HIP gfx1103 double (kokkos_gpu unskipped). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
FixPrecessionSpin::init() converted the Zeeman field intensity with H_field *= gyro in place. Fix::init() runs once per run command, so every additional run compounded the conversion and scaled the field by another factor of gyro ~= 0.176 rad.THz/T: any multi-run input using precession/spin zeeman sampled a different field in the second and later runs, and a run continued from a restart file diverged from the same run performed without interruption. Store the converted intensity in a new member (hfield) instead, like the anisotropy constants already do (Kah = Ka/hbar), so init() is idempotent and H_field keeps the user-supplied value in Tesla. Found by the new fix-timestep-precession_spin.yaml test: its restart leg could never match the continuous reference trajectory. Verified by comparing an uninterrupted 8-step run against a 4-step run plus restart continuation, which are now bit-identical. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
PairSpinNeel::init_one() mirrored all eight coefficient arrays to the transposed type pair but not cut_spin_neel[j][i], unlike all other spin pair styles which mirror their cutoff array. With more than one atom type, compute() and compute_single_pair() then compared rsq against an uninitialized cutoff for half of the type pairs, producing nondeterministic forces and energies. Found by the new spin-pair-neel.yaml test failing intermittently (9 out of 30 identical serial runs); valgrind --track-origins pointed at the rsq <= local_cut2 tests. 30/30 stable and valgrind-clean after the fix. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
…s.cpp Doxygen could not match the explicit instantiations of utils::bounds() and utils::bounds_typelabel() nor the explicit specializations of utils::join<T>() in utils.cpp to their template declarations in utils.h, emitting 13 "no matching file member found" warnings. The two anon-namespace forward declarations of re_match()/re_find() used /** */ doxygen comments without \param/\return, emitting 4 more "parameters not documented" warnings. Wrap the instantiation/specialization blocks in \cond ... \endcond so doxygen skips them (their documentation lives on the utils.h template declarations, which breathe resolves), and demote the two forward-declaration comments to plain /* */. doc/doxygen-warn.log goes from 17 warnings to 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Replace the targeted fix from commit 237dab0 (keep legacy host coords current in DomainKokkos::x2lamda/lamda2x) with the general mechanism suggested by Stan Moore in review: when a style without KOKKOS support runs inside a KOKKOS run, enable kokkos->auto_sync for the duration of its compute so that every sync()/modified() it triggers - for example through the virtual DomainKokkos x2lamda/lamda2x overrides - writes modifications through to the legacy host arrays the style reads and writes. This is the same pattern ModifyKokkos already applies to every fix hook invocation and it covers any host-visible per-atom data, not just coordinates. Apply the wrapping to the pair, bond, angle, dihedral, improper, and kspace compute invocations in VerletKokkos::run(), MinKokkos::energy_force(), DynamicalMatrixKokkos::update_force(), and ThirdOrderKokkos::update_force(). The setup paths already run with auto_sync enabled and need no change. The KSpace base class was missing the kokkosable member that all other style base classes have; add it and set it in PPPMKokkos. Remove the now superseded TransformView::need_sync_legacy() and the conditional legacy sync from the DomainKokkos coordinate transforms. Verified: KSpaceStyle:pppm_cg_tri_slab (which runs the non-KOKKOS pppm/cg in a triclinic KOKKOS run and aborted before the original fix) passes on KOKKOS serial single and mixed precision builds and on a HIP GPU build; full unittest suite green on the serial single precision build; KSpace and FixTimestep suites green on serial mixed and double precision and HIP builds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
…mand pair_compute_neighlist() enables the neigh_thread setting on GPUs for systems with up to 16000 atoms when the pair style runs with newton off, without marking it as user selected. That state persists in the KokkosLMP class instance across a clear command, so when clear re-applies the package defaults and command line settings, a package command with "newton on" that was accepted at startup now aborts with "Must use 'newton off' with KOKKOS package option 'neigh/thread on'" even though the user never enabled neigh/thread. To reproduce: run a small system with newton off on a GPU, issue a clear command, and set up a newton on run in the same session. Reset the setting at the top of the package command processing unless it was selected with the neigh/thread option; the heuristic re-evaluates it on the next pair style compute anyway. Found by activating the newton off legs of the kokkos_gpu force-style tests: all 76 yaml files with KOKKOS-capable pair styles aborted in a later test stage after the newton off leg had triggered the heuristic. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
Add support for magnetic (atom_style spin) systems to the pair style and fix timestep test drivers, following the model of the ellipsoid special case: - new yaml blocks init_mag_forces/run_mag_forces (per-atom fm) and run_spin (per-atom sp incl. norm), captured automatically when the atom style provides them and compared wherever forces (pair tester) and positions/velocities/torques (fix tester) are compared - new 'timestep' yaml keyword so the fix tester can use a timestep suitable for spin dynamics in metal units instead of the previous hard-coded 0.25 (which remains the default) - a 'spin' tag: such yaml files must define fix nve/spin in their post_commands (the spin pair styles compute mechanical forces only when the fix is present), so the pair tester run phase does not add another time integrator; newton off legs are skipped via the established newton_pair pre_commands pin since spin pair styles require newton on - remove the hard-coded MOLECULE package gate in the fix tester; the atom style prerequisites in every yaml file already cover it - new inputs in.spin, data.spin (54-atom perturbed bcc iron cell with randomized spins, two atom types), coeffs.spin - pair style references for spin/exchange, spin/exchange/biquadratic, spin/dmi, spin/neel, spin/magelec, spin/dipole/cut, and hybrid/overlay eam/fs + spin/exchange; fix references for nve/spin, precession/spin, and setforce/spin, registered as SpinPairStyle:* and FixTimestep:* ctest cases These tests found the fix precession/spin zeeman compounding bug and the pair spin/neel uninitialized cutoff bug fixed in the two previous commits. No test for fix langevin/spin yet: it does not store the RNG state in restart files, so its restart trajectory cannot match. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
The newton off blocks in the omp leg and the shared kokkos test body of test_pair_style checked "if (newton_pair == 0)" on the still newton on instance and only re-initialized with newton off inside that block, so the comparisons had been dead code since 2021 (the plain leg re-initializes first and then checks whether the pair style forced newton back on). Restructure both legs to match the plain leg. Also correct the comparison of the freshly initialized state inside the previously dead omp block, which compared the initial forces against the run_forces reference, and remove a duplicated magnetic force check that had slipped into the plain and kokkos legs. With this change the newton off state of every pair style with an /omp or /kk variant is compared against the reference data. Results across all 394 pair and kspace yaml files: no failures in the omp and kokkos_omp legs (OpenMP build), no failures in the kokkos_serial leg (KOKKOS serial single precision build; the pre-existing pppm_tip4p_ad failure is unrelated), and no failures in the kokkos_gpu leg (HIP build) after the preceding fix for the stale neigh/thread state. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RZmFEmpx4SmV2vaUWBF4Yk
The four std::sort comparators for the per-particle contact lists in FixSurfaceGlobal::post_force() and PairSurfGranular::compute() took their ContactSurf arguments by value. The struct holds three std::vector members besides ~200 bytes of plain data, so every comparison in these per-atom hot loops copied both arguments including up to six heap allocations. Take them by const reference. Found by Coverity Scan (CIDs 1661276, 1661387, 1661110, 1661564, 1661250, 1661984, 1661398, 1661585). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
The loop over the connections of a line's second endpoint iterated with the connection count of the first endpoint, reading past the end of neigh_p2[] whenever a line has more connections at endpoint 1 than at endpoint 2 and crashing FixSurfaceGlobal::init() for 2d global surfaces with per-molecule motion (e.g. the in.line.gran.global example). The 3d twin block pairs its counts correctly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
The four sort call sites for per-particle contact lists in FixSurfaceGlobal::post_force() and PairSurfGranular::compute() carried duplicated lambda comparators encoding the same 6-level ordering (overlap, priority, normal alignment, center-of-mass distance, index), in two variants: before the connection pre-walk the normal signs are not yet assigned and the alignment is compared in absolute value. Move them into two static comparators on FixSurface implemented via one shared function, so the pair style and the fix are guaranteed to rank multi-contact configurations identically, and return bool instead of int. Verified bit-identical on the tribox/line global and tri/line local example runs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Setting the fflag/nside/aflag (and ewhich/pwhich) connection flags between two connected lines or triangles was implemented in ten nearly identical blocks: one per endpoint (2d) or edge (3d) of the global and local surface fixes, differing only in which endpoints form the edge and which per-connection arrays are written. Move the shared computation into two FixSurface methods, point_connection2d() and edge_connection3d(); the callers keep their genuinely different endpoint matching (point indices for global surfaces, coordinate comparison for distributed local surfaces) and now report inconsistent surface connectivity instead of leaving flags unset when no shared point is found. The classification values (FLAT/NONFLAT, CONCAVE/CONVEX, SAME_SIDE/OPPOSITE_SIDE) move from per-file enum copies into the FixSurface class so the helpers and both subclasses share one definition. Verified bit-identical on the tribox/line global and tri/line local example runs (only source line numbers in warning messages shift). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Extend the connection-flag consolidation to the third duplicated family: the six corner blocks (three per file) that set the cwhich/nside/fflag values for tris sharing only a corner point. The shared computation (normal sides from the sign of the normal dot product plus the flatness test) moves into FixSurface::corner_connection3d(); the callers keep their differing corner matching and now report inconsistent surface connectivity for an unmatched corner instead of leaving cwhich unset. Also drop the local variable declarations that became unused in the connection functions after the consolidations. Verified bit-identical on the tribox/line global and tri/line local example runs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YNvuaNpJNkbCKaAE8ngPQv
Collected small changes and fixes
Refactor EAM pair style family to eliminate duplicated code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )