
fix(parser): strip blockquote continuation prefix from table rows #515

Closed
ImmanuelHaffner wants to merge 2 commits into OXY2DEV:main from ImmanuelHaffner:fix-blockquote-table-parser


@ImmanuelHaffner

Contributor

get_node_text() only applies a tree-sitter node's col_start offset to its first line. For a table nested inside a blockquote, lines 2+ therefore retain the > block-continuation prefix, and the lpeg row parser in parsers/markdown.lua fails silently on them, leaving the separator and data rows unparsed. Symptoms: no alignment is detected, and the separator row leaks into the data-row count, so blockquoted tables render broken or empty.

The fix strips range.col_start characters from lines after the first, and skips empty/blank lines (e.g. trailing > markers).
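The stripping step described above can be sketched in plain Lua. This is a minimal illustration, not the code from the PR: the helper name `strip_continuation_prefix` and its arguments are hypothetical, and it assumes the continuation prefix on every later line is exactly `col_start` characters wide.

```lua
-- Hypothetical helper (not the plugin's actual code): drop the blockquote
-- continuation prefix from every line after the first, then skip blank lines.
local function strip_continuation_prefix(lines, col_start)
  local out = {}
  for i, line in ipairs(lines) do
    if i > 1 then
      -- col_start is a 0-indexed column, Lua strings are 1-indexed.
      line = line:sub(col_start + 1)
    end
    -- Keep only lines with at least one non-whitespace character,
    -- so a trailing ">" marker (now empty) is dropped.
    if line:match("%S") then
      out[#out + 1] = line
    end
  end
  return out
end

-- The first line already has the offset applied; the rest still carry "> ".
local rows = strip_continuation_prefix({
  "| a | b |",
  "> | - | - |",
  "> | 1 | 2 |",
  ">",
}, 2)

for _, row in ipairs(rows) do
  print(row)
end
```

A fixed-width cut like this would remove the wrong characters if a later line used a shorter prefix (for example `>` with no following space), which is one reason to treat it as a sketch only.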

A fixture (test/blockquote_table.md) is included covering a plain control table, a simple blockquote table, a nested blockquote table, and a blockquote table with alignment markers.

get_node_text() only applies col_start offset to the first line of a
tree-sitter node. For tables inside blockquotes, lines 2+ retain the
'> ' prefix, causing the lpeg row parser to fail silently and produce
empty results. This left separator and data rows unrendered.

Strip the prefix via line:sub(col_start + 1) for lines after the first,
and skip empty/blank lines (e.g. trailing '>' markers).

Repro for the block-continuation prefix fix. With tables nested in
blockquotes, get_node_text() leaves the '> ' prefix on lines 2+, so the
lpeg row parser fails on the separator and data rows. Observable via
markdown.parse: blockquoted tables come back with alignments=0 (separator
unparsed) and the separator leaking into the row count, while the plain
control table parses correctly. This fixture covers a plain control table,
a simple blockquote table, a nested blockquote table, and a blockquote
table with alignment markers.
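
The alignments=0 symptom can be reproduced with a toy row matcher. The sketch below is an assumption-laden stand-in: it uses Lua patterns rather than lpeg, the function names are hypothetical, and it only models a separator matcher that expects each row to start with a pipe.

```lua
-- Toy stand-in for a row parser (Lua patterns, not the plugin's lpeg grammar).
-- A separator row starts with a pipe and contains only pipes, dashes,
-- colons and whitespace.
local function is_separator(line)
  return line:match("^|[%s|:%-]+$") ~= nil and line:find("-", 1, true) ~= nil
end

-- Count the alignment cells of the first separator row found, or 0 if no
-- line is recognised as a separator.
local function count_alignments(lines)
  for _, line in ipairs(lines) do
    if is_separator(line) then
      local n = 0
      for _ in line:gmatch("[:%-]+") do
        n = n + 1
      end
      return n
    end
  end
  return 0
end

local plain = { "| a | b |", "| :-- | --: |", "| 1 | 2 |" }
-- Lines 2+ still carry the "> " prefix, as described in the commit message.
local quoted = { "| a | b |", "> | :-- | --: |", "> | 1 | 2 |" }

print(count_alignments(plain))
print(count_alignments(quoted))
```

With the prefix left in place the separator is never recognised, so the toy matcher reports no alignments for the blockquoted table while the plain control table is handled normally.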
@OXY2DEV

OXY2DEV commented Jul 3, 2026

Owner

This is not a bug: the text field of the parsed object is meant to represent the output of get_node_text(), unless the syntax isn't supported by the parser. Other fields are provided to get information on the individual table cells.

@OXY2DEV OXY2DEV closed this Jul 3, 2026
@ImmanuelHaffner

Contributor Author

I don't understand that, please elaborate. Is get_node_text() supposed to carry the blockquote's quotation markers and whitespace? Why? Who relies on that?

@OXY2DEV

OXY2DEV commented Jul 3, 2026

Owner

> Who relies on that?

Basically every markdown parser under the sun. Parsers don't understand what a "block" is. So, a table is just a long line separated by > & newlines.

The parsed table information is already provided so there shouldn't be any need to use the raw text. It's mostly meant to be used internally.
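
To make the raw text concrete, here is a small simulation of slicing a node's range out of buffer lines, where the start column only affects the first row. It is a sketch in plain Lua with a hypothetical `slice_range` helper, not a call into Neovim's API, and the range values are made up for the example.

```lua
-- Hypothetical simulation: extract a (row, col) range from buffer lines.
-- Rows and columns are 0-indexed; end_col is exclusive.
local function slice_range(buf, start_row, start_col, end_row, end_col)
  local out = {}
  for row = start_row, end_row do
    local line = buf[row + 1] -- Lua tables are 1-indexed
    -- Only the first row is cut at start_col; later rows start at column 0.
    local from = (row == start_row) and (start_col + 1) or 1
    -- Only the last row is cut at end_col.
    local to = (row == end_row) and end_col or #line
    out[#out + 1] = line:sub(from, to)
  end
  return out
end

local buf = {
  "> | a | b |",
  "> | - | - |",
  "> | 1 | 2 |",
}

-- A table node starting after the "> " prefix on row 0.
local text = slice_range(buf, 0, 2, 2, 11)

for _, line in ipairs(text) do
  print(line)
end
```

The first line comes back without the marker while the later lines keep it, which is the shape of raw text both sides of this discussion are describing.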

@OXY2DEV

OXY2DEV commented Jul 3, 2026

Owner

@ImmanuelHaffner Also, > markers are carried over for other block types such as nested block quotes, list items, headings, etc.

I don't think there's a need to change its value only for tables.
