fix: keep trailing underscore in expression before close tag #809

Open
spokodev wants to merge 1 commit into mde:main from spokodev:fix/slurp-close-identifier-truncation

@spokodev

What

An expression whose last token is an identifier ending in _ is truncated when the close tag immediately follows it, because _%> is matched as a substring with no left boundary.

ejs.render('<%= foo_%>   END', { foo: 'WRONG', foo_: 'RIGHT' })
// before: "WRONGEND"  -> read `foo`, then slurped the trailing spaces as `_%>`
// after:  "RIGHT   END"

The spaced form already behaves correctly, which shows the truncation is unintended:

ejs.render('<%= foo_ %>   END', { foo: 'WRONG', foo_: 'RIGHT' })
// "RIGHT   END"  (both before and after this change)

The slurp rule

_%> is the documented whitespace-slurping close tag ("removes all whitespace after it"). It is only a close tag when its leading _ acts as a delimiter prefix. When the _ continues a JavaScript identifier (foo_), it belongs to the expression, not to the close tag. The -%> newline-slurp close is not affected by this because - is not a valid identifier character, so foo-%> is already unambiguous; _ is the one slurp-prefix that is also an identifier character.
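
The boundary rule above can be sketched as a small predicate. `isSlurpClose` is a hypothetical helper written only for illustration; it is not part of EJS, and it assumes the default `%` and `>` delimiters.

```javascript
// Hypothetical helper (not an EJS API): does the "_%>" starting at index i
// act as a slurp close, or does its "_" belong to a preceding identifier?
function isSlurpClose(text, i) {
  if (text.slice(i, i + 3) !== '_%>') return false;
  const prev = i > 0 ? text[i - 1] : '';
  // "_" continues an identifier when the previous character is "$" or a word character.
  return !/[$\w]/.test(prev);
}

console.log(isSlurpClose('<%= 1 _%>', 6));  // true: "_" follows a space
console.log(isSlurpClose('<%= foo_%>', 7)); // false: "_" continues "foo"

// "-" is never an identifier character, so "-%>" needs no such check.
console.log(/[$\w]/.test('-')); // false
console.log(/[$\w]/.test('_')); // true
```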

Root cause

Two places treat _%> as a close with no left boundary:

  1. The tokenizer alternation in createRegex (...|%>|-%>|_%>). During parseTemplateText, pat.exec finds the leftmost match. Once the opening <%= has been consumed, the remaining text is " foo_%>   END", so the _%> alternative matches at the _ (index 4) and beats %> (index 5). The expression content becomes foo and the close becomes _%>.
  2. The whitespace pre-pass, which textually strips spaces and tabs after a _%> close. It also matched foo_%> and stripped the trailing whitespace.
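
Both effects can be reproduced outside EJS with plain regular expressions. This is a minimal sketch: `unguardedToken` is a shortened stand-in for the tokenizer alternation, and `unguardedStrip` is an assumed shape for the whitespace pre-pass. Neither is copied from the EJS source.

```javascript
// Shortened stand-in for the unguarded tokenizer alternation (assumption).
const unguardedToken = /(<%=|<%_|<%|%>|-%>|_%>)/;

// Text left over once the opening "<%=" has been consumed.
const rest = ' foo_%>   END';
const hit = unguardedToken.exec(rest);

// The leftmost match wins: "_%>" starts at index 4, one position before "%>".
console.log(hit[0], hit.index);                        // _%> 4
console.log(JSON.stringify(rest.slice(0, hit.index))); // " foo"

// Assumed shape of the pre-pass: drop spaces and tabs after any "_%>".
const unguardedStrip = /_%>[ \t]*/g;
const stripped = '<%= foo_%>   END'.replace(unguardedStrip, '_%>');
console.log(stripped); // <%= foo_%>END
```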

Fix

Guard the _%> close in both places with a (?<![$\w]) lookbehind so it is only treated as a close when the _ is not preceded by a JS identifier character. The lookbehind is zero-width, so parseTemplateText's index/slice logic is unchanged. In createRegex the guard is added after the delimiter substitution so the </> replacements do not rewrite the lookbehind syntax, which keeps custom delimiters working.
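
A minimal sketch of the guarded patterns follows. `escapeRegExp` and `buildTokenRegex` are hypothetical names, and the alternation is shortened; the actual createRegex may be structured differently. The guard is concatenated only after the delimiters have been escaped, so the `<` inside `(?<!` is never treated as an open delimiter.

```javascript
// Hypothetical helpers for illustration; not the EJS implementation.
function escapeRegExp(s) {
  return s.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&');
}

function buildTokenRegex(open = '<', delim = '%', close = '>') {
  const o = escapeRegExp(open);
  const d = escapeRegExp(delim);
  const c = escapeRegExp(close);
  // Zero-width guard: the "_" must not follow "$" or a word character.
  const guard = '(?<![$\\w])';
  const alternatives = [
    o + d + '=', o + d + '_', o + d, // open tags
    d + c, '-' + d + c,              // plain and newline-slurp closes
    guard + '_' + d + c,             // whitespace-slurp close, now guarded
  ];
  return new RegExp('(' + alternatives.join('|') + ')');
}

const token = buildTokenRegex();
const ident = token.exec(' foo_%>   END');
console.log(ident[0], ident.index); // %> 5
const slurp = token.exec(' 1 _%>');
console.log(slurp[0], slurp.index); // _%> 3

// The same guard applied to an assumed form of the whitespace pre-pass.
const guardedStrip = /(?<![$\w])_%>[ \t]*/g;
console.log('<%= foo_%>   END'.replace(guardedStrip, '_%>')); // <%= foo_%>   END
console.log('<%= 1 _%>   END'.replace(guardedStrip, '_%>'));  // <%= 1 _%>END
```

Because the lookbehind consumes no characters, the matched text and its index are the same as for an unguarded `_%>` whenever the guard passes.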

Tests

Added to the existing <%_ and _%> suite in test/ejs.js:

  • <%= foo_%> END now reads foo_ and keeps its whitespace, matching <%= foo_ %> END.
  • A real _%> still slurps: <%_ var x = 1; _%> and <%= 1 _%> are unchanged.

Reverting only the source change makes the new truncation test fail (red) while the slurp-intact test stays green. With the fix, the full suite passes (168 tests), including the custom open/close delimiter test.

An expression ending in an identifier whose last character is `_`
(for example `<%= foo_%>`) was parsed as `foo` plus a `_%>`
whitespace-slurping close, so it read the wrong variable and stripped
the following whitespace. The spaced form `<%= foo_ %>` already keeps
`foo_`, so the no-space form diverged from it.

`_%>` is only a close tag when its leading `_` is a delimiter prefix,
not when it continues a JS identifier. Guard both places that treat
`_%>` as a close (the tokenizer regex and the whitespace pre-pass) with
a `(?<![$\w])` lookbehind, so `foo_%>` now matches `foo_ %>` while the
documented `<%_ ... _%>` and `-%>` slurp behaviors are unchanged.