Skip to content

file_converter: emit a character-array initialiser#9041

Open
tautschnig wants to merge 1 commit into
diffblue:developfrom
tautschnig:file_converter
Open

file_converter: emit a character-array initialiser#9041
tautschnig wants to merge 1 commit into
diffblue:developfrom
tautschnig:file_converter

Conversation

@tautschnig

Copy link
Copy Markdown
Collaborator

Embed builtin headers as a C character-array initialiser ({ 'a', 'b', ..., 0 }) rather than a string literal. String literals are limited to 65536 characters after concatenation -- a limit clang enforces under -pedantic -- which forced large headers (e.g. the ia32 ones) to be split across several files; a character array has no such limit. The #line directive that the use sites used to prepend as a separate string literal is now baked into the array by file_converter, so each use site is just const char x[] = #include "x.inc";.

This unblocks embedding arbitrarily large generated headers and lets the artificially split headers be recombined (done separately).

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Jun 16, 2026
Copilot AI review requested due to automatic review settings June 16, 2026 20:15

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the builtin-header embedding pipeline to avoid hitting string-literal concatenation limits by generating C/C++ character-array initialisers, and simplifies use sites by baking #line directives into the generated .inc output.

Changes:

  • Update file_converter to emit a { ... , 0 } byte initializer (including a baked-in #line) instead of string literals.
  • Simplify builtin header arrays in ansi_c_internal_additions.cpp by removing the separately-prepended #line string literals.
  • Remove the #line 1 "<built-in-additions>" prefix from the hand-built additions string.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/ansi-c/file_converter.cpp Generates byte-array initializer output and bakes #line into the generated stream.
src/ansi-c/ansi_c_internal_additions.cpp Adjusts builtin header definitions to rely on the new .inc initializer output and removes a #line from the additions string.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to 29
for(const char c : s)
{
const char ch = line[i];
if(ch == '\\')
std::cout << "\\\\";
else if(ch == '"')
std::cout << "\\\"";
else if(ch == '\r' || ch == '\n')
{
}
else if((ch & 0x80) != 0)
{
std::cout << "\\x" << std::hex << (unsigned(ch) & 0xff) << std::dec;
}
const unsigned char ch = static_cast<unsigned char>(c);
if(ch >= 0x80)
std::cout << "(char)" << unsigned(ch) << ',';
else
std::cout << ch;
std::cout << unsigned(ch) << ',';
}
// literals are limited to 65536 characters after concatenation (a limit that
// clang enforces under -pedantic), whereas a character array has no such
// limit. The enclosing `{ ... }` is supplied here.
std::cout << "{";
}

// null terminator and closing brace
std::cout << "0}";
Comment on lines +16 to +20
/// Emit the bytes of \p s as comma-separated character initialisers. Values
/// outside the 7-bit range are cast so the result is valid regardless of
/// whether `char` is signed or unsigned (a brace initialiser would otherwise
/// reject the narrowing conversion).
static void emit_bytes(const std::string &s)
Comment on lines 129 to 132
// clang-format off
// do the built-in types and variables
code+=
"#line 1 \"<built-in-additions>\"\n"
"typedef __typeof__(sizeof(int)) " CPROVER_PREFIX "size_t;\n"
Comment on lines +71 to +72
// null terminator and closing brace
std::cout << "0}";
Embed builtin headers as a C character-array initialiser ({ 'a', 'b', ..., 0 })
rather than a string literal. String literals are limited to 65536 characters
after concatenation -- a limit clang enforces under -pedantic -- which forced
large headers (e.g. the ia32 ones) to be split across several files; a
character array has no such limit. This unblocks embedding arbitrarily large
generated headers and lets the artificially split headers be recombined (done
separately).

The #line directive that the compiler-header use sites used to prepend as a
separate string literal can no longer be concatenated with a brace initialiser,
so file_converter bakes it into the array instead, enabled via a --line flag;
each such use site is then just `const char x[] = #include "x.inc";`. The flag
is opt-in because the memory-analyzer unit test streams its .inc into a
temporary file (of << #include ...) and needs the verbatim contents without a
#line prefix -- that use site now assigns the array to a const char[] before
streaming it.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@tautschnig tautschnig requested a review from a team as a code owner June 17, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants