file_converter: emit a character-array initialiser#9041
Open
tautschnig wants to merge 1 commit into
Open
Conversation
7 tasks
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the builtin-header embedding pipeline to avoid hitting string-literal concatenation limits by generating C/C++ character-array initialisers, and simplifies use sites by baking #line directives into the generated .inc output.
Changes:
- Update
file_converterto emit a{ ... , 0 }byte initializer (including a baked-in#line) instead of string literals. - Simplify builtin header arrays in
ansi_c_internal_additions.cppby removing the separately-prepended#linestring literals. - Remove the
#line 1 "<built-in-additions>"prefix from the hand-built additions string.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/ansi-c/file_converter.cpp | Generates byte-array initializer output and bakes #line into the generated stream. |
| src/ansi-c/ansi_c_internal_additions.cpp | Adjusts builtin header definitions to rely on the new .inc initializer output and removes a #line from the additions string. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+22
to
29
| for(const char c : s) | ||
| { | ||
| const char ch = line[i]; | ||
| if(ch == '\\') | ||
| std::cout << "\\\\"; | ||
| else if(ch == '"') | ||
| std::cout << "\\\""; | ||
| else if(ch == '\r' || ch == '\n') | ||
| { | ||
| } | ||
| else if((ch & 0x80) != 0) | ||
| { | ||
| std::cout << "\\x" << std::hex << (unsigned(ch) & 0xff) << std::dec; | ||
| } | ||
| const unsigned char ch = static_cast<unsigned char>(c); | ||
| if(ch >= 0x80) | ||
| std::cout << "(char)" << unsigned(ch) << ','; | ||
| else | ||
| std::cout << ch; | ||
| std::cout << unsigned(ch) << ','; | ||
| } |
| // literals are limited to 65536 characters after concatenation (a limit that | ||
| // clang enforces under -pedantic), whereas a character array has no such | ||
| // limit. The enclosing `{ ... }` is supplied here. | ||
| std::cout << "{"; |
| } | ||
|
|
||
| // null terminator and closing brace | ||
| std::cout << "0}"; |
Comment on lines
+16
to
+20
| /// Emit the bytes of \p s as comma-separated character initialisers. Values | ||
| /// outside the 7-bit range are cast so the result is valid regardless of | ||
| /// whether `char` is signed or unsigned (a brace initialiser would otherwise | ||
| /// reject the narrowing conversion). | ||
| static void emit_bytes(const std::string &s) |
Comment on lines
129
to
132
| // clang-format off | ||
| // do the built-in types and variables | ||
| code+= | ||
| "#line 1 \"<built-in-additions>\"\n" | ||
| "typedef __typeof__(sizeof(int)) " CPROVER_PREFIX "size_t;\n" |
Comment on lines
+71
to
+72
| // null terminator and closing brace | ||
| std::cout << "0}"; |
Embed builtin headers as a C character-array initialiser ({ 'a', 'b', ..., 0 })
rather than a string literal. String literals are limited to 65536 characters
after concatenation -- a limit clang enforces under -pedantic -- which forced
large headers (e.g. the ia32 ones) to be split across several files; a
character array has no such limit. This unblocks embedding arbitrarily large
generated headers and lets the artificially split headers be recombined (done
separately).
The #line directive that the compiler-header use sites used to prepend as a
separate string literal can no longer be concatenated with a brace initialiser,
so file_converter bakes it into the array instead, enabled via a --line flag;
each such use site is then just `const char x[] = #include "x.inc";`. The flag
is opt-in because the memory-analyzer unit test streams its .inc into a
temporary file (of << #include ...) and needs the verbatim contents without a
#line prefix -- that use site now assigns the array to a const char[] before
streaming it.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
8e9ce7a to
fe5d3d0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Embed builtin headers as a C character-array initialiser ({ 'a', 'b', ..., 0 }) rather than a string literal. String literals are limited to 65536 characters after concatenation -- a limit clang enforces under -pedantic -- which forced large headers (e.g. the ia32 ones) to be split across several files; a character array has no such limit. The #line directive that the use sites used to prepend as a separate string literal is now baked into the array by file_converter, so each use site is just
const char x[] = #include "x.inc";.This unblocks embedding arbitrarily large generated headers and lets the artificially split headers be recombined (done separately).