scripts: harden rmg_kinetics & rmg_thermo#889
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens ARC’s standalone RMG helper scripts (thermo + kinetics) based on prior review feedback, improving correctness across repeated database queries, safer handling of unsupported kinetics forms, and more robust CLI invocation/testing.
Changes:
- Added
--output/-osupport (and preserved default overwrite behavior) so scripts can write results to a separate YAML file. - Hardened
rmg_kinetics.pyby deep-copying database kinetics before in-place mutation and guardingchange_rate()to Arrhenius-like kinetics. - Improved script portability by bootstrapping
sys.pathsofrom common import ...works regardless of current working directory; added/expanded unit tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
arc/scripts/rmg_thermo.py |
Adds sys.path bootstrap and supports writing results to an optional output YAML path. |
arc/scripts/rmg_kinetics.py |
Deep-copies kinetics before mutation, guards degeneracy scaling, improves debug printing, adds --output, and updates documentation. |
arc/scripts/common.py |
Extends the shared argparse parser with an optional --output/-o flag. |
arc/scripts_test.py |
Adds unit tests for argparse, helper behavior, and --output non-overwrite behavior via conda run. |
Comments suppressed due to low confidence (1)
arc/scripts/rmg_kinetics.py:117
- A_units is only set for bimolecular vs 'everything else'. For termolecular reactions (len(reactants)==3), this will incorrectly treat A as s^-1 and skip the needed m^6→cm^6 scaling, even though loop_families supports 3 reactants/products. Consider adding an explicit 3-reactant branch (and corresponding scaling) or rejecting/flagging unsupported reaction orders.
# Families:
A_units = "cm^3/(mol*s)" if len(reaction.reactants) == 2 else "s^-1"
fam_list = loop_families(rmgdb, reaction)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - ``A``: cm^3/(mol*s) for bimolecular reactions, s^-1 for unimolecular | ||
| (3-body: cm^6/(mol^2*s)). Reported in the units stored on the | ||
| Arrhenius object after the SI->cm conversion below. |
There was a problem hiding this comment.
Good catch — fixed in 342f518. Replaced the binary unimolecular/bimolecular flag with get_a_factor_si_to_cm(num_reactants) ({1: 1.0, 2: 1e6, 3: 1e12}), so termolecular reactions now get the correct m^6→cm^6 (×1e12, cm^6/(mol^2*s)) scaling instead of being mislabeled s^-1. The code now matches the documented 3-body units.
| """The new isinstance gate must skip ``change_rate`` for non-Arrhenius kinetics | ||
| (e.g. Chebyshev) rather than blindly mutating them.""" | ||
| snippet = textwrap.dedent(f""" | ||
| import sys | ||
| sys.path.insert(0, {self.SCRIPT_DIR!r}) | ||
| from rmgpy.kinetics import Arrhenius, ArrheniusEP | ||
| # Sanity: the script imports the same classes we test against. | ||
| import rmg_kinetics as rk | ||
| assert rk.Arrhenius is Arrhenius | ||
| assert rk.ArrheniusEP is ArrheniusEP | ||
| # The guard logic itself is a one-line isinstance check; replicate it here | ||
| # so a regression that drops the guard would fail the test. | ||
| from rmgpy.kinetics import Chebyshev | ||
| cheb = Chebyshev(coeffs=[[1.0, 0.0], [0.0, 0.0]], | ||
| kunits='cm^3/(mol*s)', | ||
| Tmin=(300.0, 'K'), Tmax=(2000.0, 'K'), | ||
| Pmin=(0.01, 'bar'), Pmax=(100.0, 'bar')) | ||
| assert not isinstance(cheb, (rk.Arrhenius, rk.ArrheniusEP)) | ||
| arr = Arrhenius(A=(1.0, 's^-1'), n=0.0, Ea=(0.0, 'J/mol')) | ||
| assert isinstance(arr, (rk.Arrhenius, rk.ArrheniusEP)) |
There was a problem hiding this comment.
Agreed, the old test was self-referential. Fixed in 342f518: the guard is now factored into a real, called helper scale_kinetics_by_degeneracy() (invoked by determine_rmg_kinetics), and the test exercises that function directly — Arrhenius A must double while Chebyshev coeffs must stay put. Since change_rate provably shifts Chebyshev coeffs (-5.0 → -4.699), dropping the guard now corrupts them and fails the test.
| ) | ||
| self.assertEqual(result.returncode, 0, f'thermo script failed: {result.stderr}') | ||
|
|
||
| # Input must be byte-identical (no overwrite). |
There was a problem hiding this comment.
Fixed in 342f518 — the test now captures the raw input bytes before running the script and asserts the bytes are unchanged afterward, so the "byte-identical" claim is genuinely verified (rather than a parsed-YAML comparison).
0188187 to
4c80cb6
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #889 +/- ##
==========================================
- Coverage 63.02% 63.00% -0.02%
==========================================
Files 113 113
Lines 37958 37958
Branches 9956 9956
==========================================
- Hits 23922 23915 -7
- Misses 11155 11162 +7
Partials 2881 2881
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
- deep-copy RMG-db kinetics before scaling so a second query can't read a doubly-scaled A - guard degeneracy scaling to Arrhenius/EP so non-Arrhenius forms (Chebyshev/PLOG/etc.) are skipped safely - add sys.path bootstrap so `from common import ...` works regardless of CWD - add an --output flag so the input YAML isn't overwritten - document the training-entry filter and output units - route the helper's debug print to stderr so it can't pollute parsed stdout, and harden it against reactions without reactants. - extend the SI->cm A-factor conversion to termolecular reactions (1e12) instead of mislabeling them s^-1 - factor the degeneracy guard into scale_kinetics_by_degeneracy() and test it against the real code path (a dropped guard now corrupts Chebyshev coeffs and fails the test) - tighten test docs/assertions (JSON not YAML; true byte-level non-overwrite check). Adds argparse + subprocess unit tests.
Deep-copy DB kinetics before scaling so a second query can't read a doubly-scaled A, gate change_rate on Arrhenius/EP so non-Arrhenius forms (Chebyshev/PLOG/etc.) are skipped safely, add sys.path bootstrap so
from common import ...works regardless of CWD, add an --output flag so the input YAML isn't overwritten, document the training-entry filter and output units, and harden the helper's debug print against reactions without reactants. Adds argparse + subprocess unit tests.