Update multimodal docs with new models by hengtaoguo · Pull Request #4148 · AI-Hypercomputer/maxtext

hengtaoguo · 2026-06-11T21:05:01Z

Description

Update multimodal.md to include new models supported over the past few months.
Add a Qwen3-Omni announcement to the qwen_moe doc, including E2E test scripts.

Tests

# Build:
cd docs && sphinx-build -b html . _build/html

# Serve:
python -m http.server -d _build/html

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Rohan-Bierneni

LGTM! Can we also add a link in maxtext/tests/end_to_end/tpu/qwen/moe/run_qwen_moe.md, at line 4 in be7748b:

    Qwen3 is a family of open-source large language models from the Qwen team at Alibaba. This documentation covers the integration of the following Qwen Mixture-of-Experts (MoE) models into MaxText:

that points to the Qwen3.5 section of this multimodal doc?

codecov · 2026-06-11T23:35:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Rohan-Bierneni

Thank you for the change. I have left some comments.

Rohan-Bierneni

Overall, the scripts are well formatted and nicely split into part 1 and part 2. Mainly, script 2 needs some changes to make it compatible with our XLML tests.

Once this is merged, we also need another PR in XLML to add the tests to our DAG.

Thank you for making these changes!

Rohan-Bierneni · 2026-06-12T23:11:24Z

+fi
+
+# ---
+# Step 1: Checkpoint Conversion


Can you also add the command to convert to a scanned checkpoint below?

If a scanned checkpoint won't be used or isn't supported in the checkpoint util, then no worries, we can skip it.

Due to the deepstack features in the vision encoder, Omni doesn't have full scanned support. We can skip it and use the unscanned checkpoint for now.

Rohan-Bierneni · 2026-06-12T23:14:44Z

+export TOKENIZER_PATH="${TOKENIZER_PATH:-Qwen/Qwen3-Omni-30B-A3B-Instruct}"
+
+# Base output path where the MaxText checkpoint from Step 1 was written.
+export BASE_OUTPUT_PATH="${BASE_OUTPUT_PATH:-gs://your-gcs-bucket/qwen3-omni-30b-a3b_maxtext_ckpt}"


Since XLML will use this script for tests, the default value should write to gs://runner-maxtext-logs/$(date +%Y-%m-%d-%H-%M), similar to how it is set here:

maxtext/tests/end_to_end/tpu/qwen/moe/qwen3.5-35b-a3b/2_test_qwen3.5_35b_a3b.sh

Line 22 in 3190805

if [ -z "${BASE_OUTPUT_PATH}" ]; then

Good call, done.
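A minimal sketch of that default, built around the `if [ -z "${BASE_OUTPUT_PATH}" ]` guard quoted above. The body of the guard is an assumption; the referenced Qwen3.5 script may word its message differently.

```shell
# Fall back to a timestamped run directory in the XLML log bucket
# when the caller has not exported BASE_OUTPUT_PATH.
if [ -z "${BASE_OUTPUT_PATH}" ]; then
  export BASE_OUTPUT_PATH="gs://runner-maxtext-logs/$(date +%Y-%m-%d-%H-%M)"
  echo "BASE_OUTPUT_PATH is not set, using ${BASE_OUTPUT_PATH}"
fi
echo "Using BASE_OUTPUT_PATH = ${BASE_OUTPUT_PATH}"
```

Because the timestamp has minute resolution, two runs started within the same minute would share an output directory.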

Rohan-Bierneni · 2026-06-12T23:16:22Z

+BASE_OUTPUT_PATH=${BASE_OUTPUT_PATH%/}
+echo "Using BASE_OUTPUT_PATH = ${BASE_OUTPUT_PATH}"
+
+UNSCANNED_CKPT_PATH=${BASE_OUTPUT_PATH}/unscanned/0/items


XLML will pull an already converted checkpoint from gs://maxtext-model-checkpoints.

Can you upload a converted checkpoint to this GCS bucket and set the value here to point to that checkpoint?

Uploaded a new checkpoint to gs://maxtext-model-checkpoints/qwen3-omni-30b-a3b/unscanned/0/items.
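A sketch of how the script could point at the uploaded checkpoint. The path comes from the reply above; wrapping it in `${UNSCANNED_CKPT_PATH:-...}` so local runs can substitute their own conversion is an assumption, not something the thread specifies.

```shell
# Pre-converted unscanned Qwen3-Omni checkpoint read by XLML.
# The ${VAR:-default} form (an assumption) lets a caller override it.
UNSCANNED_CKPT_PATH="${UNSCANNED_CKPT_PATH:-gs://maxtext-model-checkpoints/qwen3-omni-30b-a3b/unscanned/0/items}"
echo "Using UNSCANNED_CKPT_PATH = ${UNSCANNED_CKPT_PATH}"
```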

Rohan-Bierneni · 2026-06-12T23:16:49Z

+  exit 1
+fi
+
+# Strip trailing slash from base path to avoid malformed URIs


We should be able to remove this logic if we use the hardcoded GCS bucket gs://runner-maxtext-logs.

Removed this redundant block.
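For reference, the removed block used POSIX suffix removal, `${BASE_OUTPUT_PATH%/}`, so that derived paths such as `${BASE_OUTPUT_PATH}/unscanned/0/items` would not contain `//`. A generated default of the form gs://runner-maxtext-logs/<timestamp> never ends in a slash, which is why the guard becomes redundant. The bucket below is the placeholder from the earlier diff.

```shell
# Illustrative value with a trailing slash (placeholder bucket).
example_path="gs://your-gcs-bucket/qwen3-omni-30b-a3b_maxtext_ckpt/"
# '%/' deletes one trailing '/' if present.
stripped="${example_path%/}"
echo "${stripped}/unscanned/0/items"
# prints gs://your-gcs-bucket/qwen3-omni-30b-a3b_maxtext_ckpt/unscanned/0/items
```

A value exported by hand could still end in `/`, so the guard only matters when callers supply their own path.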

Rohan-Bierneni · 2026-06-12T23:19:07Z

+UNSCANNED_CKPT_PATH=${BASE_OUTPUT_PATH}/unscanned/0/items
+
+# ---
+# Step 2a: Multimodal Decode — text + image


Before the decode tests, do you think we should add a forward_pass_logit_check test as well? It would compare against HF golden logits, which would need to be stored in a specific GCS bucket.

For example:

maxtext/tests/end_to_end/tpu/qwen/moe/qwen3.5-35b-a3b/2_test_qwen3.5_35b_a3b.sh

Line 43 in 3190805

GOLDEN_LOGITS_DISK_LOCATION="/deps/tests/assets/golden_logits/golden_data_${MODEL_NAME}.jsonl"

That would be great, but the vision branch may have some precision issues: we translated the image processing steps from PyTorch to JAX, and some interpolation operations cannot be matched closely. For now, we mostly check whether the outputs make sense. I will add the logit check as a follow-up.
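One possible shape for that follow-up, as a hedged sketch: derive the golden-logits location with the pattern from the Qwen3.5 script quoted above, and skip the check while no golden file exists. The `MODEL_NAME` value is a hypothetical guess, and the actual logit-check invocation is left out because its flags are not shown in this thread.

```shell
# Hypothetical model name for the Qwen3-Omni script; adjust to the real value.
MODEL_NAME="qwen3-omni-30b-a3b"
# Same location pattern as the Qwen3.5 test script (line 43 in 3190805).
GOLDEN_LOGITS_DISK_LOCATION="/deps/tests/assets/golden_logits/golden_data_${MODEL_NAME}.jsonl"

if [ -f "${GOLDEN_LOGITS_DISK_LOCATION}" ]; then
  echo "Golden logits found at ${GOLDEN_LOGITS_DISK_LOCATION}; the forward-pass logit check can run."
else
  # Until golden logits exist, rely on the qualitative decode outputs.
  echo "No golden logits at ${GOLDEN_LOGITS_DISK_LOCATION}; skipping the forward-pass logit check."
fi
```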

Rohan-Bierneni · 2026-06-12T23:20:41Z

+# Uses a test image from the repo assets.
+# max_prefill_predict_length accounts for image tokens (~256) + text prompt tokens.
+# ---
+python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \


The other scripts currently point to the base.yml file like this:

${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}/base.yml

Can we change the other instances in this script to match, for consistency?

Thanks for the catch, all base.yml paths have been updated.
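A sketch of how that expression resolves. The review asks for the expression itself at each call site; hoisting it into a `BASE_CONFIG` variable is a hypothetical convenience, not something the PR necessarily does.

```shell
# Resolve base.yml the same way as the other end-to-end scripts:
#   1. $MAXTEXT_CONFIGS_DIR, if set;
#   2. otherwise $MAXTEXT_REPO_ROOT/src/maxtext/configs;
#   3. otherwise $PWD/src/maxtext/configs.
BASE_CONFIG="${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}/base.yml"
echo "Using base config ${BASE_CONFIG}"
# A decode step would then begin with (remaining arguments elided):
#   python3 -m maxtext.inference.decode "${BASE_CONFIG}" ...
```

With neither variable set, the script resolves configs relative to the directory it is launched from, so it should be run from the repo root.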

hengtaoguo marked this pull request as ready for review June 11, 2026 21:05

hengtaoguo requested review from A9isha, RissyRan, SurbhiJainUSC, bvandermoon, darisoy, gagika, gobbleturk, jacoguzo, jiangjy1982, richjames0, shralex and vipannalla as code owners June 11, 2026 21:05

hengtaoguo force-pushed the hengtaoguo-doc branch from 0ca2ca6 to f60719b June 11, 2026 21:11

aireenmei approved these changes Jun 11, 2026

Rohan-Bierneni approved these changes Jun 11, 2026

hengtaoguo requested review from NicoGrande, NuojCheng, abhinavclemson, dipannita08, igorts-git, jesselu-google, khatwanimohit and suexu1025 as code owners June 11, 2026 23:28

hengtaoguo force-pushed the hengtaoguo-doc branch 2 times, most recently from c64be68 to eb71c23 June 11, 2026 23:30

Rohan-Bierneni reviewed Jun 11, 2026

Comment thread tests/end_to_end/tpu/qwen/moe/run_qwen_moe.md

Rohan-Bierneni reviewed Jun 11, 2026

Comment thread tests/end_to_end/tpu/qwen/moe/qwen3-omni-30b-a3b/1_test_qwen3_omni_30b_a3b.sh Outdated

Rohan-Bierneni reviewed Jun 12, 2026

hengtaoguo force-pushed the hengtaoguo-doc branch from 21e47dd to 193f52d June 12, 2026 20:22

Rohan-Bierneni requested changes Jun 12, 2026

Update multimodal docs with new models

b93eda4

hengtaoguo force-pushed the hengtaoguo-doc branch from c8af8f2 to b93eda4 June 13, 2026 00:15
