fix Gemma 4 multimodal chat-template markers in processor_gemma4 #4158
Merged
The Gemma 4 multimodal SFT path was emitting Gemma 3 chat-template markers
("<start_of_turn>", "<end_of_turn>"), which are NOT special tokens in the
Gemma 4 tokenizer. Each marker BPE-tokenizes into a 7-token noise sequence, so a
training label like "A<end_of_turn>" became an 8-token sequence
([236776 'A', 236820 '<', 643 'end', 236779 '_', 1340 'of', 236779 '_',
887 'turn', 236813 '>']).
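The split described above can be reproduced with a toy sketch. This is not the real Gemma 4 tokenizer: `toy_encode` is a hypothetical greedy longest-match encoder, and its vocabulary contains only the pieces and IDs quoted in this description. It is meant only to show why a marker that is not a registered special token falls apart into sub-pieces, while a registered one maps to a single ID.

```python
# Toy vocabulary: only the pieces and IDs quoted in this PR description.
TOY_VOCAB = {
    "A": 236776, "<": 236820, "end": 643, "_": 236779,
    "of": 1340, "turn": 887, ">": 236813,
}
# Special tokens named in this PR (IDs as listed in the description).
SPECIAL_TOKENS = {"<bos>": 2, "<|turn>": 105, "<turn|>": 106}


def toy_encode(text):
    """Greedy longest-match encoding (an illustration, not real BPE)."""
    table = {**TOY_VOCAB, **SPECIAL_TOKENS}
    # Try longer pieces first so whole special tokens win over fragments.
    pieces = sorted(table, key=len, reverse=True)
    ids, i = [], 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                ids.append(table[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"no piece matches at offset {i}: {text[i:]!r}")
    return ids


old_label = toy_encode("A<end_of_turn>")  # Gemma 3 marker: 1 answer token + 7 noise tokens
new_label = toy_encode("A<turn|>")        # Gemma 4 marker: 1 answer token + 1 special token
print(len(old_label), len(new_label))
```

In the toy setup the old label is eight IDs long and the new one is two, which matches the token counts reported in this description.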
With sft_train_on_completion_only=true, the model learned to reproduce this
noise sequence after every answer, producing severe response-format collapse
post-SFT (e.g. "A<B<C<D<...").
The Gemma 4 chat template uses different special tokens:
<bos> (id 2)
<|turn> (id 105)
<turn|> (id 106)
This CL switches the prompt and response formatters to use them.
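A minimal sketch of what the switched formatters might look like. The function names `format_prompt` and `format_response` are hypothetical, and the turn layout (role names, newline placement) is an assumption that mirrors the Gemma 3 layout with only the markers swapped; the actual code in processor_gemma4 may differ.

```python
# Marker strings and IDs as listed in this PR description.
BOS = "<bos>"            # id 2
TURN_OPEN = "<|turn>"    # id 105
TURN_CLOSE = "<turn|>"   # id 106


def format_prompt(user_text: str) -> str:
    """Hypothetical prompt formatter; role names and newlines are assumed."""
    return f"{BOS}{TURN_OPEN}user\n{user_text}{TURN_CLOSE}\n{TURN_OPEN}model\n"


def format_response(answer: str) -> str:
    """Hypothetical response formatter: the answer plus the end-of-turn marker."""
    return f"{answer}{TURN_CLOSE}"


print(repr(format_prompt("Which option is correct?")))
print(repr(format_response("A")))
```

With completion-only training, the label is the output of the response formatter, so ending it with a single special token rather than a multi-token string keeps the learned end-of-answer signal to one ID.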
PiperOrigin-RevId: 931396545