diff --git a/docs/advanced_installation/advanced_installation.rst b/docs/advanced_installation/advanced_installation.rst index fc2e8546a..bffad0768 100644 --- a/docs/advanced_installation/advanced_installation.rst +++ b/docs/advanced_installation/advanced_installation.rst @@ -1,7 +1,7 @@ Advanced Installation ===================== -`pip `__ \|\| `uv `__ \|\| `pixi `__ \|\| `conda `__ \|\| `Spack `__ +`pip `__ || `uv `__ || `pixi `__ || `conda `__ || `Spack `__ libEnsemble can be installed from ``pip``, ``uv``, ``pixi``, ``Conda``, or ``Spack``. @@ -31,7 +31,18 @@ Further recommendations for selected HPC systems are given in the Globus Compute -------------- -`Globus Compute`_ may be installed optionally to submit simulation function instances to remote Globus Compute endpoints. +`Globus Compute`_ may be installed optionally to submit simulation function +instances to remote Globus Compute endpoints:: + + pip install globus-compute-sdk + +This is an optional dependency; libEnsemble operates normally without it. +If Globus Compute is not installed and a ``globus_compute_endpoint`` is +configured, libEnsemble will warn and fall back to local execution. + +See :ref:`Globus Compute - Remote User Functions` for +usage, and the :doc:`GlobusComputeExecutor API reference` +for the full executor interface. .. _Globus Compute: https://www.globus.org/compute .. _Python: http://www.python.org diff --git a/docs/conf.py b/docs/conf.py index 8647bbf4f..2d227780e 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -31,7 +31,7 @@ def __getattr__(cls, name): return MagicMock() -autodoc_mock_imports = ["ax", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"] +autodoc_mock_imports = ["ax", "globus_compute_sdk", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"] MOCK_MODULES = [ "argparse", @@ -135,7 +135,7 @@ class AxParameterWarning(Warning): # Ensure it's a real warning subclass # The suffix(es) of source filenames. 
# You can specify multiple suffix as a list of string: # -# source_suffix = ['.rst', '.md'] +# source_suffix = ['.md', '.rst'] source_suffix = ".rst" # The master toctree document. @@ -205,7 +205,7 @@ class AxParameterWarning(Warning): # Ensure it's a real warning subclass html_favicon = "./images/libE_logo_circle.png" html_title = "libEnsemble" -# Theme options are theme-specific and customize the look and feel of a theme +# Theme options are theme-specific and customize the look and feel of the theme # further. For a list of options available for each theme, see the # documentation. # diff --git a/docs/executor/ex_globus_compute.rst b/docs/executor/ex_globus_compute.rst new file mode 100644 index 000000000..56047cf26 --- /dev/null +++ b/docs/executor/ex_globus_compute.rst @@ -0,0 +1,57 @@ +Globus Compute Executor +======================= + +`Overview `__ || `Base Executor `__ || `MPI Executor `__ || **Globus Compute Executor** + +The :class:`GlobusComputeExecutor` +submits Python callables to a remote `Globus Compute`_ endpoint instead of +launching local subprocesses. It can be used inside simulator functions in the +same way as the :doc:`MPI Executor`, retrieving it from +``libE_info["executor"]``. + +See :ref:`Globus Compute - Remote User Functions` for an +overview of the two GC integration modes (manager-side GC-only and user-facing +executor). + +.. note:: + + ``globus-compute-sdk`` must be installed to use this executor:: + + pip install globus-compute-sdk + + Users must also authenticate via Globus_ and have an active + `Globus Compute endpoint`_ running on the target system. + +GlobusComputeExecutor +--------------------- + +.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeExecutor + :members: register_app, submit, set_workerID, set_worker_info + :show-inheritance: + + .. automethod:: __init__ + +GlobusComputeTask +----------------- + +Tasks are created and returned by +:meth:`GlobusComputeExecutor.submit()`. 
+Each task wraps a ``concurrent.futures.Future`` from the Globus Compute SDK +and exposes the same polling interface as other libEnsemble tasks. + +.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeTask + :members: poll, wait, kill, result, running, done, cancelled + +**Task states**: ``RUNNING`` | ``FINISHED`` | ``FAILED`` | ``USER_KILLED`` + +**Key attributes**: + +:task.state: (string) Current task state - one of the values above. +:task.finished: (bool) True once the task has completed (successfully or not). +:task.success: (bool) True if the remote callable returned without raising. +:task.runtime: (float) Elapsed wall-clock seconds since submission. +:task.submit_time: (float) Time since epoch at submission. + +.. _Globus Compute: https://www.globus.org/compute +.. _Globus: https://www.globus.org/ +.. _Globus Compute endpoint: https://globus-compute.readthedocs.io/en/latest/endpoints.html diff --git a/docs/executor/ex_index.rst b/docs/executor/ex_index.rst index a4f33cb39..f3c141927 100644 --- a/docs/executor/ex_index.rst +++ b/docs/executor/ex_index.rst @@ -1,6 +1,6 @@ .. _executor_index: -**Overview** \|\| `Base Executor `__ \|\| `MPI Executor `__ +**Overview** || `Base Executor `__ || `MPI Executor `__ || `Globus Compute Executor `__ Executors ========= @@ -14,8 +14,12 @@ portable interface for running and managing user applications. ex_overview ex_base ex_mpi + ex_globus_compute The **Executor** provides a portable interface for running applications on any system and -any number of compute resources. +any number of compute resources. The :doc:`MPI Executor` launches MPI +applications on local resources; the +:doc:`Globus Compute Executor` submits Python callables to +remote Globus Compute endpoints. Please select from the sections above or the sidebar navigation to read more. 
diff --git a/docs/executor/ex_overview.rst b/docs/executor/ex_overview.rst index f53510b73..a1af81ef6 100644 --- a/docs/executor/ex_overview.rst +++ b/docs/executor/ex_overview.rst @@ -1,7 +1,7 @@ Overview ======== -**Overview** \|\| `Base Executor `__ \|\| `MPI Executor `__ +**Overview** || `Base Executor `__ || `MPI Executor `__ The **Executor** provides a portable interface for running applications on any system and any number of compute resources. @@ -156,4 +156,19 @@ which partitions resources among workers, ensuring that runs utilize different resources (e.g., nodes). Furthermore, the ``MPIExecutor`` offers resilience via the feature of re-launching tasks that fail to start because of system factors. +Remote Execution with Globus Compute +------------------------------------- + +The :doc:`GlobusComputeExecutor` submits Python callables +to remote `Globus Compute`_ endpoints instead of launching local subprocesses. +It exposes the same ``submit()`` / ``poll()`` / ``kill()`` interface as other +libEnsemble executors and can be retrieved from ``libE_info["executor"]`` +inside simulator functions. + +See :ref:`Globus Compute - Remote User Functions` for an +overview of all Globus Compute integration modes and the +:doc:`GlobusComputeExecutor API reference` for the full +interface. + .. _concurrent futures: https://docs.python.org/library/concurrent.futures.html +.. _Globus Compute: https://www.globus.org/compute diff --git a/docs/platforms/globus_compute.rst b/docs/platforms/globus_compute.rst new file mode 100644 index 000000000..0679f9559 --- /dev/null +++ b/docs/platforms/globus_compute.rst @@ -0,0 +1,145 @@ +.. _globus_compute_ref: + +====================================== +Globus Compute - Remote User Functions +====================================== + +`Globus Compute`_ (formerly funcX) is a distributed, high-performance +function-as-a-service platform. 
When libEnsemble is running on a resource with +internet access (laptops, login nodes, other servers, etc.), it can offload +simulator calls to remote Globus Compute endpoints: + + .. image:: ../images/funcxmodel.png + :alt: running_with_globus_compute + :scale: 50 + :align: center + +This is useful for running ensembles across machines and heterogeneous resources. +There are **two approaches**, described below. + +.. dropdown:: **Caveats** + + The following caveats apply to all Globus Compute modes: + + 1. Simulator functions submitted to Globus Compute must be non-persistent, + since manager-worker communicators cannot be serialized or used by a + remote resource. + + 2. ``Executor.manager_poll()`` is not available inside remotely executed + functions. Control over remote work is limited to inspecting return + values and exceptions when tasks complete. + + 3. Globus Compute imposes a `handful of task-rate and data limits`_ on + submitted functions. + + 4. Users are responsible for authenticating via Globus_ and maintaining their + `Globus Compute endpoints`_ on their target systems. + +.. _gc_only_mode: + +Manager-side GC (GC-only mode) +------------------------------- + +This is the recommended approach for most use cases. When +``globus_compute_endpoint`` is set in :class:`SimSpecs` +and ``gen_on_worker`` is not set (the default), libEnsemble enters +**GC-only mode**: no local worker processes are launched. The manager +submits simulation work directly to Globus Compute and polls futures for +results. The generator still runs as a local thread on the manager. + +``nworkers`` controls the maximum number of simultaneously in-flight +Globus Compute tasks (virtual concurrency). The default is 1. + +This mode supports both the :ref:`gest-api simulator format` +(``SimSpecs.simulator``) and the legacy ``sim_f`` format. + +.. 
code-block:: python + + from libensemble import Ensemble + from libensemble.specs import ExitCriteria, GenSpecs, LibeSpecs, SimSpecs + + + def my_sim(input_dict: dict, **kwargs) -> dict: + """gest-api simulator - runs remotely on the GC endpoint.""" + return {"f": input_dict["x"] ** 2} + + + sim_specs = SimSpecs( + simulator=my_sim, + vocs=vocs, + globus_compute_endpoint="3af6dc24-3f27-4c49-8d11-e301ade15353", + ) + + libE_specs = LibeSpecs(nworkers=4) # up to 4 concurrent GC tasks + + workflow = Ensemble( + sim_specs=sim_specs, + gen_specs=gen_specs, + libE_specs=libE_specs, + exit_criteria=ExitCriteria(sim_max=20), + ) + H, _, _ = workflow.run() + +Users can also define ``Executor`` instances within their remote simulator +functions and submit MPI applications normally, as long as libEnsemble and +the target application are accessible on the remote system:: + + # Within the remote simulator function + from libensemble.executors import MPIExecutor + exctr = MPIExecutor() + exctr.register_app(full_path="/home/user/forces.x", app_name="forces") + task = exctr.submit(app_name="forces", num_procs=64) + +.. note:: + + Both the simulator callable and any VOCS object must be picklable, + as they are serialized and shipped to the remote Globus Compute endpoint. + +.. _gc_executor_approach: + +GlobusComputeExecutor (user-facing) +------------------------------------ + +For workflows where the simulation function itself orchestrates remote +calls, such as fanning out to multiple endpoints or mixing local +and remote work, use the +:class:`GlobusComputeExecutor` +directly inside the simulator. + +Create and register the executor in the top-level script: + +.. code-block:: python + + from libensemble.executors import GlobusComputeExecutor + + exctr = GlobusComputeExecutor(endpoint_id="3af6dc24-3f27-4c49-8d11-e301ade15353") + +Then use it inside the simulator function: + +.. 
code-block:: python + + import time + + + def my_sim(H, persis_info, sim_specs, libE_info): + exctr = libE_info["executor"] + + task = exctr.submit(func=my_remote_func, app_args=H["x"][0]) + + while not task.finished: + task.poll() + if exctr.manager_kill_received(): + task.kill() + break + time.sleep(0.1) + + return H_o, persis_info + +See the :doc:`GlobusComputeExecutor API reference<../executor/ex_globus_compute>` for +the full interface including ``register_app``, ``submit``, and +:class:`GlobusComputeTask` methods. + +.. _Globus Compute: https://www.globus.org/compute +.. _Globus Compute endpoints: https://globus-compute.readthedocs.io/en/latest/endpoints.html +.. _Globus: https://www.globus.org/ +.. _handful of task-rate and data limits: https://globus-compute.readthedocs.io/en/latest/limits.html diff --git a/docs/platforms/platforms_index.rst b/docs/platforms/platforms_index.rst index d1de30c45..3a16a5eed 100644 --- a/docs/platforms/platforms_index.rst +++ b/docs/platforms/platforms_index.rst @@ -159,60 +159,9 @@ will better manage simulation and generation functions that contain considerable computational work or I/O. Therefore the second option is to use Globus Compute to isolate this work from the workers. -.. _globus_compute_ref: - -Globus Compute - Remote User Functions --------------------------------------- - -If libEnsemble is running on some resource with -internet access (laptops, login nodes, other servers, etc.), workers can be instructed to -launch generator or simulator user function instances to separate resources from -themselves via `Globus Compute`_ (formerly funcX), a distributed, high-performance function-as-a-service platform: - - .. image:: ../images/funcxmodel.png - :alt: running_with_globus_compute - :scale: 50 - :align: center - -This is useful for running ensembles across machines and heterogeneous resources, but -comes with several caveats: - - 1. 
User functions registered with Globus Compute must be *non-persistent*, since - manager-worker communicators can't be serialized or used by a remote resource. - - 2. Likewise, the ``Executor.manager_poll()`` capability is disabled. The only - available control over remote functions by workers is processing return values - or exceptions when they complete. - - 3. Globus Compute imposes a `handful of task-rate and data limits`_ on submitted functions. - - 4. Users are responsible for authenticating via Globus_ and maintaining their - `Globus Compute endpoints`_ on their target systems. - -Users can still define Executor instances within their user functions and submit -MPI applications normally, as long as libEnsemble and the target application are -accessible on the remote system:: - - # Within remote user function - from libensemble.executors import MPIExecutor - exctr = MPIExecutor() - exctr.register_app(full_path="/home/user/forces.x", app_name="forces") - task = exctr.submit(app_name="forces", num_procs=64) - -Specify a Globus Compute endpoint in :class:`sim_specs` via the ``globus_compute_endpoint`` -argument. For example:: - - from libensemble.specs import SimSpecs - - sim_specs = SimSpecs( - sim_f = sim_f, - inputs = ["x"], - out = [("f", float)], - globus_compute_endpoint = "3af6dc24-3f27-4c49-8d11-e301ade15353", - ) - -See the ``libensemble/tests/scaling_tests/globus_compute_forces`` directory for a complete -remote-simulation example. +See :doc:`Globus Compute - Remote User Functions` for the two +integration approaches (manager-side GC-only mode and the user-facing +``GlobusComputeExecutor``). Instructions for Specific Platforms ----------------------------------- @@ -231,9 +180,5 @@ libEnsemble on specific HPC systems. perlmutter polaris srun + globus_compute example_scripts - -.. _Globus Compute: https://www.globus.org/compute -.. _Globus Compute endpoints: https://globus-compute.readthedocs.io/en/latest/endpoints.html -.. 
_Globus: https://www.globus.org/ -.. _handful of task-rate and data limits: https://globus-compute.readthedocs.io/en/latest/limits.html diff --git a/docs/running_libE.rst b/docs/running_libE.rst index 6329e13e2..2677eb8d2 100644 --- a/docs/running_libE.rst +++ b/docs/running_libE.rst @@ -83,9 +83,9 @@ if using an :class:`Ensemble` object with **Reverse-ssh interface** Set ``comms`` to ``ssh`` to launch workers on remote ssh-accessible systems. This -co-locates workers, functions, and any applications. User -functions can also be persistent, unlike when launching remote functions via -:ref:`Globus Compute`. +co-locates workers, functions, and any applications. Simulator functions can be +persistent, unlike those submitted to :ref:`Globus Compute`, +which must be non-persistent. The remote working directory and Python need to be specified. This may resemble:: diff --git a/libensemble/tests/regression_tests/test_persistent_aposmm_nlopt.py b/libensemble/tests/regression_tests/test_persistent_aposmm_nlopt.py index 28da42d53..1c3e4cca4 100644 --- a/libensemble/tests/regression_tests/test_persistent_aposmm_nlopt.py +++ b/libensemble/tests/regression_tests/test_persistent_aposmm_nlopt.py @@ -73,7 +73,7 @@ "xtol_abs": 1e-6, "ftol_abs": 1e-6, "dist_to_bound_multiple": 0.5, - "max_active_runs": 6, + "max_active_runs": nworkers - 1, "lb": np.array([-3, -2]), "ub": np.array([3, 2]), },
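
The polling loop shown in the ``GlobusComputeExecutor`` simulator example (``docs/platforms/globus_compute.rst``) can be tried out without a Globus Compute endpoint. The sketch below is only an illustration of that pattern, not libEnsemble's implementation: it uses the standard library's ``ThreadPoolExecutor`` as a stand-in for a remote endpoint, and ``remote_square`` and ``poll_until_done`` are hypothetical names introduced here. The state strings are borrowed from the task states listed in the new executor page.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def remote_square(x: float) -> float:
    """Hypothetical stand-in for a callable shipped to a remote endpoint."""
    time.sleep(0.05)  # simulate remote latency
    return x ** 2


def poll_until_done(future: Future, kill_requested=lambda: False, interval: float = 0.01) -> str:
    """Poll a future until it completes or a kill is requested; return a state string."""
    while not future.done():
        if kill_requested():
            # cancel() only succeeds if the call has not started running yet
            future.cancel()
            return "USER_KILLED"
        time.sleep(interval)
    if future.cancelled():
        return "USER_KILLED"
    # A completed future either raised or returned normally
    return "FAILED" if future.exception() is not None else "FINISHED"


# A local thread pool plays the role of the remote endpoint in this sketch
with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(remote_square, 3.0)
    state = poll_until_done(fut)
    result = fut.result() if state == "FINISHED" else None
```

Passing a ``kill_requested`` callable mirrors the ``exctr.manager_kill_received()`` check in the documented loop; in this stand-in, a call that is already running cannot be interrupted, only abandoned.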