Update multimodal docs with new models #4148
0ca2ca6 to f60719b
Rohan-Bierneni left a comment
LGTM! Can we also add a link in … to this multimodal doc for the Qwen3.5 section?

c64be68 to eb71c23
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Rohan-Bierneni left a comment
Thank you for the change. I have left some comments.
21e47dd to 193f52d
Rohan-Bierneni left a comment
The overall scripts are well formatted, with a clean split between part 1 and part 2. The main changes needed are in script 2, to make it compatible with our XLML tests.
Once this is merged, we also need another PR in XLML to add the tests to our DAG.
Thank you for making these changes!
```shell
fi

# ---
# Step 1: Checkpoint Conversion
```
Can you also add the command to convert to a scanned checkpoint below?
If scanned checkpoints won't be used or aren't supported in the checkpoint utility, no worries, we can skip it.

Due to the deepstack features in the vision encoder, Omni doesn't have full scanned support. We can skip it and use unscanned for now.
```shell
export TOKENIZER_PATH="${TOKENIZER_PATH:-Qwen/Qwen3-Omni-30B-A3B-Instruct}"

# Base output path where the MaxText checkpoint from Step 1 was written.
export BASE_OUTPUT_PATH="${BASE_OUTPUT_PATH:-gs://your-gcs-bucket/qwen3-omni-30b-a3b_maxtext_ckpt}"
```
Since this will be used by XLML for tests, the default value should write to gs://runner-maxtext-logs/$(date +%Y-%m-%d-%H-%M).
This is similar to how it is set here:

Good call, done.
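A minimal sketch of the default the reviewer describes, assuming the script keeps the same `${VAR:-default}` override pattern already used for `TOKENIZER_PATH`; the merged script may differ in detail:

```shell
# Default BASE_OUTPUT_PATH to a timestamped folder in the XLML logs bucket.
# ${VAR:-default} keeps any value the caller already exported, so local runs
# can still redirect output; `date` only runs when the default is used.
export BASE_OUTPUT_PATH="${BASE_OUTPUT_PATH:-gs://runner-maxtext-logs/$(date +%Y-%m-%d-%H-%M)}"
echo "Using BASE_OUTPUT_PATH = ${BASE_OUTPUT_PATH}"
```

Each XLML run then lands in its own minute-stamped folder instead of a shared placeholder bucket.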
```shell
BASE_OUTPUT_PATH=${BASE_OUTPUT_PATH%/}
echo "Using BASE_OUTPUT_PATH = ${BASE_OUTPUT_PATH}"

UNSCANNED_CKPT_PATH=${BASE_OUTPUT_PATH}/unscanned/0/items
```
XLML will pull an already converted checkpoint from gs://maxtext-model-checkpoints.
Can you upload a converted checkpoint to this GCS bucket and set the value here to point to that checkpoint?

Uploaded a new checkpoint to gs://maxtext-model-checkpoints/qwen3-omni-30b-a3b/unscanned/0/items.
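With the checkpoint uploaded, the path no longer needs to be derived from `BASE_OUTPUT_PATH`. A sketch of pointing the script at it, assuming the script keeps an environment override for local runs (that override is an illustration, not something the thread specifies):

```shell
# XLML loads a pre-converted, unscanned checkpoint instead of converting one
# during the test. Default to the copy uploaded for this PR; exporting
# UNSCANNED_CKPT_PATH beforehand (assumed override) selects a different copy.
UNSCANNED_CKPT_PATH="${UNSCANNED_CKPT_PATH:-gs://maxtext-model-checkpoints/qwen3-omni-30b-a3b/unscanned/0/items}"
echo "Loading unscanned checkpoint from ${UNSCANNED_CKPT_PATH}"
```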
```shell
  exit 1
fi

# Strip trailing slash from base path to avoid malformed URIs
```
We should be able to remove this logic if we use the hardcoded GCS bucket gs://runner-maxtext-logs.

Removed this redundant block.
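For context on what the removed block guarded against: joining a user-supplied base path that ends in `/` with a subpath produces a double slash. A short illustration using the placeholder bucket from the original default; a hardcoded, timestamped gs://runner-maxtext-logs path has no trailing slash, so the step becomes unnecessary:

```shell
# Placeholder path with a trailing slash, as a user might pass it.
base="gs://your-gcs-bucket/qwen3-omni-30b-a3b_maxtext_ckpt/"

# Naive join keeps both slashes: ...ckpt//unscanned/0/items
naive="${base}/unscanned/0/items"

# ${base%/} removes a single trailing "/" (shortest suffix match), if present.
stripped="${base%/}/unscanned/0/items"

echo "naive:    ${naive}"
echo "stripped: ${stripped}"
```

Note that `%/` strips only one slash; a path ending in `//` would still leave one behind.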
```shell
UNSCANNED_CKPT_PATH=${BASE_OUTPUT_PATH}/unscanned/0/items

# ---
# Step 2a: Multimodal Decode — text + image
```
Before the decode tests, do you think we should add a forward_pass_logit_check test as well? It would compare against HF golden logits, which would need to be stored in a specific GCS bucket.
Ex:

That would be great, but the vision branch may have some precision issues: we translated the image-processing steps from Torch to JAX, and some interpolation operations cannot be matched closely. So for now we mostly check whether the outputs make sense. I will add the logit check as a follow-up.
```shell
# Uses a test image from the repo assets.
# max_prefill_predict_length accounts for image tokens (~256) + text prompt tokens.
# ---
python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
```
In other scripts, we currently point to the base.yml file like this:
${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}/base.yml
Can we change the other instances in this script to match, for consistency?

Thanks for the catch, all base.yml paths have been updated.
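The requested expression resolves in three steps: an explicit `MAXTEXT_CONFIGS_DIR` wins; otherwise the configs directory is derived from `MAXTEXT_REPO_ROOT`, falling back to the current directory. The sketch below wraps it in a helper so the fallbacks can be checked; `resolve_base_yml` is a hypothetical name used only here, since the scripts inline the expression directly:

```shell
# Hypothetical helper: prints the base.yml path the shared expression selects.
resolve_base_yml() {
  echo "${MAXTEXT_CONFIGS_DIR:-${MAXTEXT_REPO_ROOT:-$PWD}/src/maxtext/configs}/base.yml"
}

# With neither variable set, this prints $PWD/src/maxtext/configs/base.yml,
# which matches the old relative path when run from the repo root.
resolve_base_yml
```

Unlike the hardcoded `src/maxtext/configs/base.yml`, this form keeps working when the script is launched from outside the repository root, as long as one of the two variables is set.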
c8af8f2 to b93eda4
Description
- multimodal.md: include new models supported over the past few months.
- Qwen3-Omni: announce in the qwen_moe doc, including E2E test scripts.

Tests
Build docs for readthedocs:
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.