fix: Do not escape regex in d2:validatePattern [DHIS2-21359]#100

Merged: enricocolasante merged 4 commits into main from DHIS2-21359 on Apr 29, 2026

enricocolasante (Collaborator) commented Apr 23, 2026

Problem

Two bugs combined to break d2:validatePattern, one affecting all platforms and one affecting
Kotlin/JS only.

Bug 1 — pattern was string-decoded before reaching the regex engine (all platforms)

The regex pattern argument was evaluated with evalToString, which applies full expression-string
decoding: every backslash escape is resolved as if it were a string literal. This silently corrupted
the pattern passed to the regex engine:

| Pattern as written | After evalToString | Effect |
|---|---|---|
| \d | d | digit class lost, matches literal d only |
| \w | w | word class lost |
| \- inside [...] | - | hyphen becomes a range operator, creating unintended ranges |

The fix is to use evalToRawString, which returns the node's raw value without applying
string-escape decoding, so the regex engine receives the pattern as the author intended.
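
To make the corruption concrete, here is a minimal TypeScript sketch. `decodeEscapes` is a hypothetical stand-in for the decoding that evalToString applies (it simply resolves every backslash escape to the escaped character); it is an assumption for illustration, not the library's actual decoder. `fullMatch` is likewise a helper invented for this example.

```typescript
// Hypothetical stand-in for the expression-string decoding done by evalToString:
// every "\x" pair is resolved to "x". The real decoder is not shown in this PR.
function decodeEscapes(raw: string): string {
  return raw.replace(/\\(.)/g, "$1");
}

// Full-string match helper (illustrative only).
function fullMatch(pattern: string, input: string): boolean {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

const rawDigits = "\\d+";                        // pattern as written: \d+
const decodedDigits = decodeEscapes(rawDigits);  // becomes: d+

const rawClass = "[a\\-c]+";                     // a, hyphen, or c
const decodedClass = decodeEscapes(rawClass);    // becomes: [a-c]+ (a range)

console.log(fullMatch(rawDigits, "123"));      // raw pattern still matches digits
console.log(fullMatch(decodedDigits, "123"));  // decoded pattern only matches the letter d
console.log(fullMatch(rawClass, "b"));         // raw class does not contain b
console.log(fullMatch(decodedClass, "b"));     // decoded class accepts b through the range
```

The second pair is the more dangerous case: the decoded pattern still compiles, so the only symptom is that it accepts input the author meant to reject.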

Bug 2 — raw pattern rejected by the JS regex engine (Kotlin/JS only)

After fixing Bug 1, the raw expression-string value reaches the JS regex engine intact. The raw
value preserves expression-language escapes that have no equivalent in the regex spec, such as \',
\`, and a backslash followed by a space. The Kotlin stdlib Regex wrapper applies the JavaScript
Unicode mode (u flag) unconditionally. Under Unicode mode, ECMAScript withdraws the Annex B
leniency rules and permits only a strict subset of backslash escapes, so any escape outside that
subset throws a SyntaxError at construction time.

The relevant test case illustrates this — the raw pattern string reaching the engine is:

[a-zA-Z0-9À-ȕ\'\-\'\`\'\ ]+

| Escape | Origin | JVM (java.util.regex) | JS, no u flag | JS, u flag (Kotlin stdlib) |
|---|---|---|---|---|
| \' | Expression-language quote escape | ✅ identity escape → ' | ✅ ECMAScript Annex B → ' | SyntaxError (not a syntax character) |
| \- | Literal hyphen in character class | ✅ literal - | ✅ literal - | ✅ explicit ClassEscape in Unicode mode |
| \` | Expression-language backtick escape | ✅ identity escape → ` | ✅ ECMAScript Annex B → ` | SyntaxError (not a syntax character) |
| \ (backslash-space) | Expression-language space escape | ✅ identity escape → space | ✅ ECMAScript Annex B → space | SyntaxError (not a syntax character) |

There is no API knob on the Kotlin stdlib Regex to suppress the u flag on the JS target.
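
The difference between the two JS modes can be reproduced directly against the engine. The TypeScript sketch below compiles the raw pattern from the test case with and without the u flag; `tryCompile` is a helper invented for this example.

```typescript
// The raw pattern from the test case, as it reaches the regex engine.
// Backslashes are doubled here only because this is a string literal.
const rawPattern = "[a-zA-Z0-9À-ȕ\\'\\-\\'\\`\\'\\ ]+";

// Returns "ok" when the pattern compiles, otherwise the name of the thrown error.
function tryCompile(pattern: string, flags: string): string {
  try {
    new RegExp(pattern, flags);
    return "ok";
  } catch (e) {
    return e instanceof Error ? e.name : "unknown";
  }
}

console.log(tryCompile(rawPattern, ""));   // legacy mode: Annex B identity escapes are accepted
console.log(tryCompile(rawPattern, "u"));  // Unicode mode: \' is rejected at construction
console.log(tryCompile("[a\\-z]+", "u"));  // \- on its own remains valid in Unicode mode
```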

Solution

Bug 1 is fixed by introducing evalToRawString in Calculator and routing the pattern
argument of d2:validatePattern through it instead of evalToString.

Bug 2 is fixed by introducing a matchesPattern expect/actual function:

  • commonMain — declares the expect
  • jvmMain / nativeMain — delegates to input.matches(pattern.toRegex()) unchanged
  • jsMain — constructs the RegExp directly via js(...), bypassing the stdlib wrapper and
    therefore the u flag; wraps the pattern in ^(?:...)$ to replicate Kotlin's full-string
    matching semantics

d2_validatePattern now calls matchesPattern instead of input.matches(regex.toRegex()).
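
As a rough illustration of what the jsMain actual does on the JavaScript side, the TypeScript sketch below builds the RegExp with no flags and anchors it. It is not the Kotlin code from this PR: the parameter order is an assumption, and `naiveAnchored` exists only to show why the non-capturing group is needed.

```typescript
// Sketch of the jsMain strategy: construct the RegExp directly, without the u flag,
// and wrap the pattern in ^(?:...)$ so that only a full-string match counts.
function matchesPattern(input: string, pattern: string): boolean {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

// Anchoring without the group: with alternation, ^ and $ each bind to one branch only.
function naiveAnchored(input: string, pattern: string): boolean {
  return new RegExp("^" + pattern + "$").test(input);
}

const rawPattern = "[a-zA-Z0-9À-ȕ\\'\\-\\'\\`\\'\\ ]+";

console.log(matchesPattern("O'Brien-Smith", rawPattern)); // letters, quote and hyphen are all in the class
console.log(matchesPattern("abc!", rawPattern));          // "!" is not in the class
console.log(matchesPattern("ab", "a|b"));                 // full-string semantics: neither branch is "ab"
console.log(naiveAnchored("ab", "a|b"));                  // ^a matches the prefix, so this wrongly succeeds
```

The last two calls show the design choice: without `(?:...)`, a pattern containing `|` would silently fall back to partial matching.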

Trade-off

Running without the u flag means Unicode property escapes (\p{Lu} etc.) are not interpreted
by the JS engine — the pattern is accepted without error but \p{Lu} behaves as the literal
string p{Lu}. For DHIS2's validation patterns (digit ranges, character sets, simple anchors)
this is not a concern in practice, and the JVM target continues to interpret them correctly via
Java's Pattern.
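
The trade-off is easy to observe in isolation. The TypeScript sketch below compiles the same \p{Lu} pattern with and without the u flag; the variable names are illustrative only.

```typescript
const upper = "\\p{Lu}"; // Unicode property escape: uppercase letter

const legacy = new RegExp("^(?:" + upper + ")$");       // no u flag, as in the jsMain actual
const unicode = new RegExp("^(?:" + upper + ")$", "u"); // u flag, as in the Kotlin stdlib wrapper

console.log(legacy.test("A"));       // not interpreted as a property escape
console.log(legacy.test("p{Lu}"));   // matches the literal text p{Lu}
console.log(unicode.test("A"));      // interpreted: A is an uppercase letter
console.log(unicode.test("p{Lu}"));  // the literal text is not a single uppercase letter
```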

enricocolasante force-pushed the DHIS2-21359 branch 2 times, most recently from 11ea7dc to aafe0d0, April 23, 2026 17:48
enricocolasante marked this pull request as ready for review April 24, 2026 06:59
enricocolasante requested a review from jbee April 24, 2026 06:59

jbee (Collaborator) left a comment

I had a quick look, and this looks to me as if we are addressing the issue on the wrong level. The semantics of the string should be transparent until the point where you use it, e.g. as a pattern for a regex. But that does not seem to be the case with what the adjustment does. It also seems strange to have a set of escapes that are kept while others are stripped. To me this says that the handling (character decoding?) on a lower level is off and needs correcting, so that the output behaves correctly when used later as a regex. If we applied a fix like this, we would move this to a place where by definition it has semantics that support some escapes but not others that would generally be allowed in a standard regex. At least that is what I understand from looking at it. I think this needs more discussion.

enricocolasante (Collaborator, Author) commented:

That solution was a bit hacky and did not clearly address the issue.
The description should now make clearer which issues we are trying to solve.
The main problem is that Kotlin/JVM and Kotlin/JS are not perfectly aligned on some implementation details (regex pattern matching uses different default modes on the JVM and in JS).

@enricocolasante enricocolasante requested a review from jbee April 27, 2026 08:17
sonarqubecloud (Bot) commented Apr 27, 2026

enricocolasante merged commit 181056d into main on Apr 29, 2026
4 checks passed