Feature/sklearnex and remove faiss by NetZissou · Pull Request #33 · Imageomics/emb-explorer

NetZissou · 2026-06-02T16:01:16Z

Closes issue #32

Patch sklearn with sklearnex.
Remove faiss entirely from backend.

About sklearnex

Unlike other libraries in the Python ecosystem, classes and functions in the Extension for Scikit-learn* are not just scikit-learn-compatible, but rather are built atop of scikit-learn itself by inheriting from their classes directly, defining the same attributes that the stock version of scikit-learn would do for each estimator, and reusing most of scikit-learn’s estimator methods where appropriate.

The Extension for Scikit-learn* is regularly tested for API compatibility and for correctness against scikit-learn’s own test suite (see Scikit-learn’s test suite for more information), and can be easily swapped in place of the stock scikit-learn library by patching it.

sklearnex is powered by the oneDAL library, which provides acceleration on x86_64 Linux and Windows machines and silently falls back to vanilla sklearn on unsupported architectures like Apple Silicon and ARM Linux. The package is under the UXL Foundation (a Linux Foundation project), so cross-vendor support is a stated goal.

The FAISS KMeans backend added meaningful installation weight and startup import noise for a marginal benefit. Removing it simplifies the backend selection logic to two cases:

- cuML if GPU available
- else sklearn

Changes

- Drop `faiss-cpu` & `faiss-gpu-cu12` from main deps and `gpu-*` extras
- Remove FAISS from backend scripts: `resolve_backend()`, `run_kmeans()` dispatch
- Remove "faiss" from clustering backend dropdowns in the web UI
- Update README & BACKEND_PIPELINE doc to reflect the changes
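For illustration, the two-case selection described above could look roughly like the sketch below. It is not the PR's actual code: the helper name `pick_backend`, its parameters, and the way GPU availability is passed in are assumptions made for the example. The 500-sample default mirrors the ">500 samples" rule quoted in the docs diff further down.

```python
def pick_backend(requested: str = "auto", gpu_available: bool = False,
                 n_samples: int = 0, min_gpu_samples: int = 500) -> str:
    """Hypothetical helper: choose between the two remaining backends."""
    if requested not in {"auto", "cuml", "sklearn"}:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested == "sklearn":
        return "sklearn"
    if requested == "cuml":
        # An explicit GPU request still needs a GPU; otherwise use the CPU path.
        return "cuml" if gpu_available else "sklearn"
    # "auto": take the GPU path only when a GPU exists and the dataset is large enough.
    if gpu_available and n_samples > min_gpu_samples:
        return "cuml"
    return "sklearn"
```

With FAISS gone there is no third branch to order relative to the other two, which is the simplification the description refers to.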

Add `scikit-learn-intelex` as a default dependency and patch sklearn at import time in `shared/utils/clustering.py`. This accelerates the existing `sklearn` PCA / TSNE / KMeans calls on CPU. UMAP is unaffected because `umap-learn` is not part of `sklearn`. Set `EMB_EXPLORER_DISABLE_SKLEARNEX=1` to opt out when debugging vanilla sklearn behavior.
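A minimal sketch of what import-time patching with an opt-out might look like is shown below. The helper name `maybe_patch_sklearn` and its `env` parameter are hypothetical and are not taken from `shared/utils/clustering.py`. The only library call used is `sklearnex.patch_sklearn()`, and the sketch assumes the patch should run before any sklearn estimators are imported.

```python
import os


def maybe_patch_sklearn(env=None) -> bool:
    """Hypothetical helper: return True if sklearnex patched sklearn, else False."""
    env = os.environ if env is None else env
    # Opt-out switch named in the PR description.
    if env.get("EMB_EXPLORER_DISABLE_SKLEARNEX") == "1":
        return False
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # scikit-learn-intelex is not installed: keep stock sklearn.
        return False
    # Call this before importing sklearn estimators so they resolve to the patched classes.
    patch_sklearn()
    return True
```

Calling `maybe_patch_sklearn()` once at the top of the module, ahead of lines such as `from sklearn.cluster import KMeans`, would leave the rest of the code unchanged.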

Co-authored-by: Net Zhang <48858129+NetZissou@users.noreply.github.com>

egrace479

Footnote formatting fix

egrace479 · 2026-06-03T17:10:18Z

+  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex` [^1])
  │
  ├─► Step 2: Dimensionality Reduction to 2D
  │     Method:  PCA / t-SNE / UMAP
-  │     Backend: cuML → sklearn
+  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex` for PCA/TSNE [^1])


Suggested change

-  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex` [^1])
-  │
-  ├─► Step 2: Dimensionality Reduction to 2D
-  │     Method:  PCA / t-SNE / UMAP
-  │     Backend: cuML → sklearn
-  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex` for PCA/TSNE [^1])
+  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex`)
+  │
+  ├─► Step 2: Dimensionality Reduction to 2D
+  │     Method:  PCA / t-SNE / UMAP
+  │     Backend: cuML (GPU) → sklearn (CPU, auto-accelerated by `sklearn-intelex` for PCA/TSNE)

Wrong spot for footnotes (they won't function in the codeblock)

egrace479 · 2026-06-03T17:11:42Z

  └─► Scatter Plot (Altair)
        Color = cluster, position = 2D projection
 ```



Suggested change

Note that `sklearn-intelex` acceleration is used for CPU operations where available[^1].

egrace479 · 2026-06-03T17:12:15Z

 | **cuML** | GPU available + >500 samples | GPU-accelerated KMeans via RAPIDS. Runs on CuPy arrays. Falls back to sklearn on any error. |
-| **FAISS** | No GPU + >500 samples | Facebook's optimized CPU KMeans using L2 index. Fast for medium datasets. Falls back to sklearn on error. |
-| **sklearn** | Small datasets or fallback | Standard scikit-learn KMeans. Always works, no special dependencies. |
+| **sklearn** | CPU path (default on machines without a GPU) | Standard scikit-learn KMeans, auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex) (Intel oneDAL) when installed — typically 10–17× faster than vanilla sklearn on CPU. Disable with `EMB_EXPLORER_DISABLE_SKLEARNEX=1`. |


Suggested change

-| **sklearn** | CPU path (default on machines without a GPU) | Standard scikit-learn KMeans, auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex) (Intel oneDAL) when installed — typically 10–17× faster than vanilla sklearn on CPU. Disable with `EMB_EXPLORER_DISABLE_SKLEARNEX=1`. |
+| **sklearn** | CPU path (default on machines without a GPU) | Standard scikit-learn KMeans, auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex) (Intel oneDAL) when installed[^1] — typically 10–17× faster than vanilla sklearn on CPU. Disable with `EMB_EXPLORER_DISABLE_SKLEARNEX=1`. |

egrace479 · 2026-06-03T17:13:11Z

 |-----------|-----------|
-| KMeans | cuML if GPU + >500 samples, else FAISS if available + >500 samples, else sklearn |
-| Dim. Reduction | cuML if GPU + >5000 samples, else sklearn |
+| KMeans | cuML if GPU + >500 samples, else sklearn (auto-accelerated by `sklearn-intelex` when installed [^1]) |


Suggested change

-| KMeans | cuML if GPU + >500 samples, else sklearn (auto-accelerated by `sklearn-intelex` when installed [^1]) |
+| KMeans | cuML if GPU + >500 samples, else sklearn (auto-accelerated by `sklearn-intelex` when installed[^1]) |

egrace479 · 2026-06-03T17:13:27Z

 ```

-The app auto-detects GPU availability at runtime and falls back to CPU if anything goes wrong — no configuration needed. You can also manually select backends (cuML, FAISS, sklearn) in the sidebar.
+The app auto-detects GPU availability at runtime and falls back to CPU if anything goes wrong — no configuration needed. The CPU sklearn path is auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex) [^1]. You can also manually select backends (`cuML`, `sklearn`) in the sidebar.


Suggested change

-The app auto-detects GPU availability at runtime and falls back to CPU if anything goes wrong — no configuration needed. The CPU sklearn path is auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex) [^1]. You can also manually select backends (`cuML`, `sklearn`) in the sidebar.
+The app auto-detects GPU availability at runtime and falls back to CPU if anything goes wrong — no configuration needed. The CPU sklearn path is auto-accelerated by [scikit-learn-intelex](https://github.com/uxlfoundation/scikit-learn-intelex)[^1]. You can also manually select backends (`cuML`, `sklearn`) in the sidebar.

NetZissou added 2 commits June 2, 2026 10:58

NetZissou self-assigned this Jun 2, 2026

NetZissou added the enhancement New feature or request label Jun 2, 2026

NetZissou linked an issue Jun 2, 2026 that may be closed by this pull request

Backend Acceleration on CPU Cores #32

Open


Added footnote for sklearn-intelex in README & BACKEND_PIPELINE

788ec60

Co-authored-by: Net Zhang <48858129+NetZissou@users.noreply.github.com>

NetZissou requested a review from egrace479 June 3, 2026 17:05

egrace479 reviewed Jun 3, 2026

NetZissou wants to merge 3 commits into main from feature/sklearnex-and-remove-faiss




