feat: log record attribute value length limit & attribute count limit#4132
feat: log record attribute value length limit & attribute count limit#4132proost wants to merge 25 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4132 +/- ##
==========================================
- Coverage 82.82% 77.73% -5.09%
==========================================
Files 406 433 +27
Lines 16913 18201 +1288
==========================================
+ Hits 14007 14146 +139
- Misses 2906 4055 +1149
🚀 New features to boost your workflow:
|
| warning_limit: 347 | ||
| - cmake_options: all-options-abiv2-preview | ||
| warning_limit: 350 | ||
| warning_limit: 351 |
There was a problem hiding this comment.
Could you please resolve these warnings but not just ignore them?
And could you please also implement maximum allowed log record attribute count in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#logrecord-limits ?
There was a problem hiding this comment.
Because of SetAttribute. SetAttribute itself is noexcept. problem is It allocate memory. so bad_alloc can happen. current otel doesn't standardize how to handle it. Is ok to swallow it?
There was a problem hiding this comment.
I agree with @owent - we should not just increase the warning limit. SetAttribute() is noexcept, so if an allocation throws here, it would terminate the process.
If allocation fails, I think it is better to drop this attribute/value than crash. I would also avoid logging from this catch block, because the internal log path formats strings and may allocate again.
There was a problem hiding this comment.
When i read spec, duplication of attribute may allowed. Do i have to include the feature in this PR?
| return value; | ||
| } | ||
|
|
||
| return opentelemetry::nostd::string_view(value, limit_); |
There was a problem hiding this comment.
Truncation uses a raw byte count (string_view(value.data(), limit_)), which can split a multi-byte UTF-8 sequence and produce invalid UTF-8. OTLP serializes attributes into protobuf string fields, which require valid UTF-8; strict collectors/parsers may reject or warn.
There was a problem hiding this comment.
Thank you i didn't considered utf-8 string. When i read spec, string value can be invalid utf-8 string. Who is resposibility to utf-8 validation? sdk or user?
There was a problem hiding this comment.
I think what @ThomsonTan means is: when users set their attributes as valid UTF-8 strings, we should not truncate them into invalid UTF-8 strings. For OTLP, invalid UTF-8 strings will be exported as bytes, which changes the field's type and name.
There was a problem hiding this comment.
Thank you, I covered utf8 string
…roost/opentelemetry-cpp into feat-logrecord-attr-value-length-limit
…y-cpp into feat-logrecord-attr-value-length-limit
…ub.com/proost/opentelemetry-cpp into feat-logrecord-attr-value-length-limit
…roost/opentelemetry-cpp into feat-logrecord-attr-value-length-limit
…to feat-logrecord-attr-value-length-limit
…to feat-logrecord-attr-value-length-limit
| // Byte length of the longest prefix of `value` holding at most `limit` UTF-8 code points, | ||
| // never splitting a multi-byte sequence. Malformed lead/continuation bytes are counted as | ||
| // one byte each (we do not validate or repair user-supplied invalid UTF-8). | ||
| static std::size_t Utf8PrefixLength(opentelemetry::nostd::string_view value, |
There was a problem hiding this comment.
In my understanding, if the original input is not a valid UTF-8 string (raw bytes set by user), there is no need to truncate it as UTF-8 string.
What do you think about it?
There was a problem hiding this comment.
Agree.
But if you mean more stricter check, I think we need third party like simdutf or utfcpp to check valid utf8 string. Do you think we need strict validation?
Why i didn't is spec discourage to include third party library.
| try | ||
| { | ||
| #endif | ||
| if (attribute_keys_.size() < limits_.attribute_count_limit) |
There was a problem hiding this comment.
I am worried about putting this limit/count tracking in the base Recordable hot path. With the default count limit, every log record now maintains attribute count state, and every attribute copies the key into a std::string and stores/checks it in an unordered_set before the exporter recordable gets it.
Since this is in Recordable::SetAttribute(), all exporters pay this cost, not only ReadWriteLogRecord or OTLP. I am okay with adding this where it is needed, but putting it in the base Recordable means every exporter pays extra heap allocation and key-tracking cost even if that exporter already has its own storage model or can enforce the limit more cheaply.
Can we move this out of the base Recordable path and let each recordable/exporter enforce the limit in the way that fits its representation?
There was a problem hiding this comment.
Thanks for review.
I designed like this due to enforcement.
I agree your concern. I redesign spec-compatible and more performance aware way And i request review again, when i finish.
…to feat-logrecord-attr-value-length-limit
…runcate) Address three review comments from @lalitb on PR open-telemetry#4157: 1. Store LogRecordLimits by value inside ReadWriteLogRecord and OtlpLogRecordable instead of a raw pointer to a LoggerContext-owned object. A LogRecord is handed out as a unique_ptr, so a record outliving the context that produced it would otherwise dereference a dangling pointer when the user calls SetAttribute() later. The default-constructed value already carries the spec defaults (count=128, length=unlimited), so a fresh recordable enforces the spec count cap from construction, matching the PR's "enforcement" contract. 2. Drop the now-redundant `limits_ != nullptr` short-circuit at every enforcement site (4 in total). This also closes the Codecov-reported uncovered branch in otlp_log_recordable.cc. 3. Truncate UTF-8 string attributes at a code-point boundary instead of a raw byte boundary, so an OTLP protobuf string_value produced by truncation stays valid UTF-8 when the input was. Malformed UTF-8 and trailing lead bytes degrade to plain byte truncation. Logic adapted from open-telemetry#4132 with attribution. While in the same truncation paths, also apply the byte-length cap to raw bytes attributes (`vector<uint8_t>` on the SDK side, AnyValue `bytes_value` on the OTLP side). Both were previously passing through any size, even though the spec applies `attribute_value_length_limit` to bytes attributes as well. Test changes: - Rename DefaultsPassThroughWithoutLimitsObject to DefaultRecordEnforcesSpecCountCap (200 attrs in, 128 stored, 72 dropped) to reflect the new spec-correct default behavior. - Add 3 UTF-8 regression tests on the SDK side (split prevention, exact fit at sequence boundary, malformed input falls back) plus a bytes-truncation test. - Add 3 mirror tests on the OTLP side, a default-cap test, and a bytes-truncation test. Refs: open-telemetry#4126 Co-authored-by: Hyeonho Kim <proost@apache.org> Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
…are truncate) Address three review comments from @lalitb on PR open-telemetry#4157, plus a follow-up from @owent (r3434178971). 1. Store LogRecordLimits by value inside ReadWriteLogRecord and OtlpLogRecordable instead of a raw pointer to a LoggerContext-owned object. A LogRecord is handed out as a unique_ptr, so a record outliving the context that produced it would otherwise dereference a dangling pointer when the user calls SetAttribute() later. The default-constructed value already carries the spec defaults (count=128, length=unlimited), so a fresh recordable enforces the spec count cap from construction, matching the PR's "enforcement" contract. 2. Drop the now-redundant `limits_ != nullptr` short-circuit at every enforcement site (4 in total). This also closes the Codecov-reported uncovered branch in otlp_log_recordable.cc. 3. Truncate OTLP string attributes at a UTF-8 code-point boundary instead of a raw byte boundary, so the protobuf string_value produced by truncation stays valid UTF-8 when the input was. Malformed UTF-8 and trailing lead bytes degrade to plain byte truncation. Logic adapted from open-telemetry#4132 with attribution. The SDK-side ReadWriteLogRecord truncation stays as plain byte cut. The in-memory `OwnedAttributeValue::std::string` variant may legitimately carry raw bytes when constructed from a non-UTF-8 source, so forcing UTF-8 boundary semantics there would over-truncate that case (per @owent's r3434178971, echoing the same point on open-telemetry#4132 r3409677314). Each recordable's truncation strategy now matches its own consumer's wire-format requirement: SDK in-memory has no wire requirement, OTLP protobuf requires valid UTF-8. While in the same truncation paths, also apply the byte-length cap to raw bytes attributes (`vector<uint8_t>` on the SDK side, AnyValue `bytes_value` on the OTLP side). Both were previously passing through any size, even though the spec applies `attribute_value_length_limit` to bytes attributes as well. Test changes: - Rename DefaultsPassThroughWithoutLimitsObject to DefaultRecordEnforcesSpecCountCap (200 attrs in, 128 stored, 72 dropped) to reflect the new spec-correct default behavior. - Add bytes-truncation tests on both SDK and OTLP sides. - Add 3 UTF-8 regression tests on the OTLP side only (split prevention, exact fit at sequence boundary; malformed-fallback omitted since the algorithm's seq=1 fallback for invalid continuations is implementation detail rather than wire contract). - Add a default-cap test on the OTLP side. Refs: open-telemetry#4126 Co-authored-by: Hyeonho Kim <proost@apache.org> Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Fix: #4126
Changes
Adding new feature
OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMITfollowing spec.I referred below:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#logrecord-limits
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#logrecord-limits
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute-limits
Following spec, I changed SDK only.
I plan to apply decorator way. Before setter called, I tried to enforce this feature to log record. Problem is this is inevitable to current SDK. If have any suggestions, please give me some feedback.
value length limit benchmark:
count limit benchmark:
baseline:
comparision:
CHANGELOG.mdupdated for non-trivial changes