refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation#13196
Conversation
- Replace C-style file I/O (fopen, fprintf, fclose) with std::ofstream for RAII-based file handling - Use fixed-width integer types (uint32_t, uint8_t) instead of unsigned int and char for portability and clarity - Refactor uint_to_binary function to use std::string instead of static buffer for thread safety - Add comprehensive Doxygen documentation for the file, functions, and arrays - Remove @section license License from file header to prevent Doxygen warnings about multiple use of section label, as @section is intended for major structured documentation sections, not repetitive boilerplate - Remove obsolete COMPILE_PARSE_RULES macro and ink_string.h dependency - Improve output formatting using std::setw, std::setfill, and std::hex for consistent alignment - Replace int loop variables with uint16_t for better type safety - Add static_cast for explicit type conversions
282d691 to
fbd054e
Compare
- Replace `1` with `1U` in bit shift operation to prevent undefined behaviour when shifting into sign bit of signed integer - Shifting a signed integer (e.g., `1 << 31`) into its sign bit invokes undefined behaviour per the C++ standard - Using an unsigned literal (`1U`) ensures well-defined behaviour for all shift amounts
…d behaviour - Change `char cc` to `unsigned char cc` to ensure consistent handling of byte values (0-255) - `char` can be signed or unsigned depending on the platform, causing implementation-defined behaviour for values > 127 - Using `unsigned char` guarantees correct interpretation of all byte values
- Rename functions to PascalCase (uint_to_binary → UintToBinary) - Add `g` prefix to global variables (example: parseRulesCType → gParseRulesCType) - Rename ParseRules classification functions and constants to PascalCase (is_* → Is*, *_BIT → IS_*) - Add file existence checks before writing with error messages to stderr - Add `#include <iostream>` for std::cerr - Rename `fp` to `outputFile` for improved clarity - Change `char` to `unsigned char` for byte value handling - Update all Doxygen comments to reflect new naming conventions
- Replace `index` with `i` to match loop variable - Replace `currentChar` with `cc` to match variable declaration - Use PascalCase for ParseRules functions (ink_tolower → InkTolower, ink_toupper → InkToupper) - Use PascalCase for UintToBinary function - Replace `fp` with `outputFile` in last file writing block for consistency
fbd054e to
7a47a4f
Compare
- Split file header into separate C-style comment block for license and Doxygen block for documentation - Required to pass Apache Release Audit Tool (Rat) - Fix formatting: indent Apache license URL for consistency - Fix formatting: remove extra space before "Fixed-width integer types"
7a47a4f to
2ce2079
Compare
|
Hi, I noticed this PR is failing the Jenkins AuTest stages. I investigated the failure, and it is not related to my code changes. The The tarball hosted at Could a maintainer please update line 44 in Thank you! |
Incorporate latest upstream changes from master to resolve potential merge conflicts and ensure feature branch stays current with the project's master branch.
|
[approve ci] |
There was a problem hiding this comment.
Pull request overview
This PR refactors the src/tscore/CompileParseRules.cc build-time generator that produces the ParseRulesCType* lookup table include files used by src/tscore/ParseRules.cc, aiming to improve type-safety and modernize implementation (C++ I/O, fixed-width types, thread-safety).
Changes:
- Reworked table generation logic and types (e.g.,
uint32_t/uint8_t) and updated the binary formatting helper to returnstd::string. - Switched file output from
FILE*/fprintftostd::ofstreamwith iostream formatting. - Added extensive Doxygen documentation and reorganized includes.
| for (uint16_t i = 0; i < 256; i++) { | ||
| gTparseRulesCType[i] = 0; | ||
| gTparseRulesCTypeToLower[i] = static_cast<uint8_t>(ParseRules::InkTolower(i)); | ||
| gTparseRulesCTypeToUpper[i] = static_cast<uint8_t>(ParseRules::InkToupper(i)); | ||
|
|
||
| if (ParseRules::is_char(c)) { | ||
| tparseRulesCType[c] |= is_char_BIT; | ||
| if (ParseRules::IsChar(i)) { | ||
| gTparseRulesCType[i] |= IS_CHAR_BIT; | ||
| } |
| unsigned char cc = static_cast<unsigned char>(i); | ||
|
|
||
| if (ParseRules::IsPchar(&cc)) { | ||
| gTparseRulesCType[i] |= IS_PCHAR_BIT; | ||
| } |
| for (uint16_t i = 0; i < 256; ++i) { | ||
| outputFile << "/* " << std::setw(3) << i << " (" << (isprint(i) ? static_cast<char>(i) : '?') << ") */\t"; | ||
| outputFile << "0x" << std::hex << std::setw(8) << std::setfill('0') << gTparseRulesCType[i] << (i != 255 ? ",\t\t" : "\t\t"); | ||
| outputFile << "/* [" << UintToBinary(gTparseRulesCType[i]) << "] */\n"; |
| #include <cstdio> | ||
| #include <cctype> |
| #include <string> | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
| #include "tscore/ParseRules.h" |
| return 1; | ||
| } | ||
| for (uint16_t i = 0; i < 256; ++i) { | ||
| outputFile << "(uint8_t)" << static_cast<unsigned>(gTparseRulesCTypeToUpper[i]) << (i != 255 ? ',' : ' ') << '\n'; |
| return 1; | ||
| } | ||
| for (uint16_t i = 0; i < 256; ++i) { | ||
| outputFile << "(uint8_t)" << static_cast<unsigned>(gTparseRulesCTypeToLower[i]) << (i != 255 ? ',' : ' ') << '\n'; |
|
[approve ci autest 2] |
|
Hi, The ../tscore/include/ParseRules.h and ../tscore/ParseRules.cc are currently in my in tray to bring up the code quality to meet modern C++ standards and Mozilla style guide if followed. I've noticed a mix of C style code and C++ code, so I changed that to be idiomatic C++ code. Regards Graham |
📌 Summary
This PR refactors
src/tscore/CompileParseRules.ccto address compilation errors, type safety issues, modern C++ best practices, and documentation gaps. The changes ensure compatibility across 32-bit and 64-bit targets, fix narrowing conversion warnings, and improve code clarity and maintainability.✅ Problems Addressed
charis signed on x86, causing negative values (e.g.,-128) for ASCII >127, which cannot be narrowed tounsigned charon 32-bit targets.uint8_tfor character arrays and explicitly cast tounsignedwhen writing to files.#include "tscore/ink_string.h"is included but never used.uint_to_binarystatic char buf[33], making it unsafe for concurrent use.std::string(RAII-managed, thread-safe).FILE*andfprintf, which are less type-safe and require manual resource management.std::ofstream(RAII-based, type-safe).@section licensecauses warnings due to duplicate labels across files.@sectionand kept the license text directly in the file header.#define COMPILE_PARSE_RULESis defined but never used.unsigned intandcharinstead of fixed-width types (uint8_t,uint32_t).<cstdint>types for portability and clarity.🔧 Key Changes
1. File Header & Includes
#define COMPILE_PARSE_RULES,#include "tscore/ink_string.h".<cstdint>,<fstream>,<iomanip>,<string>.2. Data Types
unsigned inttouint32_tfor explicit size and future-proofing.chartouint8_tto guarantee an unsigned range (0-255) and avoid sign-extension issues.inttouint16_toruint8_tfor semantic clarity.3.
uint_to_binaryFunctionstatic char buf[33](non-thread-safe).std::string(thread-safe, RAII-managed).unsigned inttouint32_tfor explicit size.4. File I/O
FILE*andfprintf(C-style, manual resource management).std::ofstream(RAII-based, type-safe, modern C++).%dwithstatic_cast<unsigned>to avoid sign-extension issues when printinguint8_tvalues.5. Doxygen Documentation
parseRulesCType,tparseRulesCType, etc.).uint_to_binaryfunction.mainfunction (including its steps and classification functions).🧪 Testing
char) and ARM (unsignedchar) targets.ParseRulesCType,ParseRulesCTypeToUpper,ParseRulesCTypeToLower) contain correct values (no negative numbers for ASCII >127).uint_to_binaryis now thread-safe.