Skip to content

feat: log record attribute value length limit & attribute count limit#4132

Closed
proost wants to merge 25 commits into
open-telemetry:mainfrom
proost:feat-logrecord-attr-value-length-limit
Closed

feat: log record attribute value length limit & attribute count limit#4132
proost wants to merge 25 commits into
open-telemetry:mainfrom
proost:feat-logrecord-attr-value-length-limit

Conversation

@proost

@proost proost commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Fix: #4126

Changes

Adding new feature OTEL_LOGRECORD_ATTRIBUTE_VALUE_LENGTH_LIMIT following spec.

I referred below:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#logrecord-limits
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#logrecord-limits
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute-limits

Following spec, I changed SDK only.

I plan to apply decorator way. Before setter called, I tried to enforce this feature to log record. Problem is this is inevitable to current SDK. If have any suggestions, please give me some feedback.

value length limit benchmark:

-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_SetStringAttributeNoLimit                        129 ns          125 ns      5517767 bytes_per_second=15.2404Gi/s
BM_SetStringAttributeLimitNotHit                    123 ns          122 ns      5487501 bytes_per_second=15.6172Gi/s
BM_SetStringAttributeSmallLimitHit                 64.3 ns         64.3 ns     10916371 bytes_per_second=29.6844Gi/s
BM_SetStringAttributeLargeLimitHit                 93.3 ns         93.3 ns      7411721 bytes_per_second=20.4478Gi/s
BM_SetStringArrayAttributeShortArrayLimitHit        130 ns          130 ns      5384464 items_per_second=30.7945M/s
BM_SetStringArrayAttributeLongArrayLimitHit         909 ns          909 ns       774217 items_per_second=70.4199M/s

count limit benchmark:

baseline:

-------------------------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------
BM_SetDistinctAttributesUnderLimit/8          439 ns          439 ns      1592935 items_per_second=18.2267M/s
BM_SetDistinctAttributesUnderLimit/16         909 ns          909 ns       759657 items_per_second=17.6055M/s
BM_SetDistinctAttributesUnderLimit/32        2483 ns         2483 ns       266050 items_per_second=12.8874M/s
BM_SetDistinctAttributesUnderLimit/64        5378 ns         5378 ns       130136 items_per_second=11.9003M/s
BM_SetDistinctAttributesUnderLimit/128      11627 ns        11626 ns        60203 items_per_second=11.0098M/s

comparision:

BM_SetDistinctAttributesUnderLimit/8          761 ns          761 ns       917233 items_per_second=10.519M/s
BM_SetDistinctAttributesUnderLimit/16        1721 ns         1721 ns       413115 items_per_second=9.29961M/s
BM_SetDistinctAttributesUnderLimit/32        4365 ns         4364 ns       161415 items_per_second=7.33292M/s
BM_SetDistinctAttributesUnderLimit/64        9522 ns         9519 ns        73502 items_per_second=6.72351M/s
BM_SetDistinctAttributesUnderLimit/128      28387 ns        28386 ns        24260 items_per_second=4.50921M/s
BM_SetDistinctAttributesOverLimit/8           828 ns          828 ns       882998 items_per_second=9.65847M/s
BM_SetDistinctAttributesOverLimit/16          908 ns          908 ns       755289 items_per_second=17.6222M/s
BM_SetDistinctAttributesOverLimit/32         1548 ns         1543 ns       505904 items_per_second=20.7436M/s
BM_SetDistinctAttributesOverLimit/64         2292 ns         2292 ns       304527 items_per_second=27.9197M/s
BM_SetDistinctAttributesOverLimit/128        4194 ns         4193 ns       166171 items_per_second=30.5246M/s
  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@proost proost requested a review from a team as a code owner June 6, 2026 08:40
@codecov

codecov Bot commented Jun 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 63.81910% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.73%. Comparing base (17c4723) to head (c1619cf).

Files with missing lines Patch % Lines
sdk/src/configuration/sdk_builder.cc 16.67% 60 Missing ⚠️
sdk/include/opentelemetry/sdk/logs/recordable.h 84.73% 11 Missing ⚠️
sdk/src/logs/logger.cc 80.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4132      +/-   ##
==========================================
- Coverage   82.82%   77.73%   -5.09%     
==========================================
  Files         406      433      +27     
  Lines       16913    18201    +1288     
==========================================
+ Hits        14007    14146     +139     
- Misses       2906     4055    +1149     
Files with missing lines Coverage Δ
...emetry/exporters/elasticsearch/es_log_recordable.h 0.00% <ø> (ø)
exporters/elasticsearch/src/es_log_recordable.cc 65.39% <100.00%> (ø)
...opentelemetry/exporters/otlp/otlp_log_recordable.h 100.00% <ø> (ø)
exporters/otlp/src/otlp_log_recordable.cc 36.06% <100.00%> (ø)
exporters/otlp/src/otlp_recordable_utils.cc 98.47% <100.00%> (+0.08%) ⬆️
...lude/opentelemetry/sdk/configuration/sdk_builder.h 100.00% <ø> (ø)
sdk/include/opentelemetry/sdk/logs/logger_config.h 100.00% <100.00%> (ø)
sdk/src/configuration/configuration_parser.cc 77.58% <100.00%> (+0.02%) ⬆️
sdk/src/logs/log_record_limits.cc 100.00% <100.00%> (ø)
sdk/src/logs/logger_config.cc 100.00% <100.00%> (ø)
... and 5 more

... and 26 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread .github/workflows/clang-tidy.yaml Outdated
warning_limit: 347
- cmake_options: all-options-abiv2-preview
warning_limit: 350
warning_limit: 351

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please resolve these warnings but not just ignore them?
And could you please also implement maximum allowed log record attribute count in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#logrecord-limits ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of SetAttribute. SetAttribute itself is noexcept. problem is It allocate memory. so bad_alloc can happen. current otel doesn't standardize how to handle it. Is ok to swallow it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @owent - we should not just increase the warning limit. SetAttribute() is noexcept, so if an allocation throws here, it would terminate the process.
If allocation fails, I think it is better to drop this attribute/value than crash. I would also avoid logging from this catch block, because the internal log path formats strings and may allocate again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped attribute: 9ce93b3

And also adding count limit in this PR too: 81746d5

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When i read spec, duplication of attribute may allowed. Do i have to include the feature in this PR?

Comment thread sdk/include/opentelemetry/sdk/logs/log_record_limits.h Outdated
Comment thread sdk/src/configuration/sdk_builder.cc
return value;
}

return opentelemetry::nostd::string_view(value, limit_);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truncation uses a raw byte count (string_view(value.data(), limit_)), which can split a multi-byte UTF-8 sequence and produce invalid UTF-8. OTLP serializes attributes into protobuf string fields, which require valid UTF-8; strict collectors/parsers may reject or warn.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you i didn't considered utf-8 string. When i read spec, string value can be invalid utf-8 string. Who is resposibility to utf-8 validation? sdk or user?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @ThomsonTan means is: when users set their attributes as valid UTF-8 strings, we should not truncate them into invalid UTF-8 strings. For OTLP, invalid UTF-8 strings will be exported as bytes, which changes the field's type and name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0b38823

Thank you, I covered utf8 string

@proost proost changed the title feat: log record attribute value length limit feat: log record attribute value length limit & attribute count limit Jun 14, 2026
@proost proost requested a review from owent June 14, 2026 13:42
@proost proost requested review from ThomsonTan and lalitb June 14, 2026 13:42
// Byte length of the longest prefix of `value` holding at most `limit` UTF-8 code points,
// never splitting a multi-byte sequence. Malformed lead/continuation bytes are counted as
// one byte each (we do not validate or repair user-supplied invalid UTF-8).
static std::size_t Utf8PrefixLength(opentelemetry::nostd::string_view value,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, if the original input is not a valid UTF-8 string (raw bytes set by user), there is no need to truncate it as UTF-8 string.
What do you think about it?

@proost proost Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3d68878

Agree.

But if you mean more stricter check, I think we need third party like simdutf or utfcpp to check valid utf8 string. Do you think we need strict validation?
Why i didn't is spec discourage to include third party library.

try
{
#endif
if (attribute_keys_.size() < limits_.attribute_count_limit)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about putting this limit/count tracking in the base Recordable hot path. With the default count limit, every log record now maintains attribute count state, and every attribute copies the key into a std::string and stores/checks it in an unordered_set before the exporter recordable gets it.

Since this is in Recordable::SetAttribute(), all exporters pay this cost, not only ReadWriteLogRecord or OTLP. I am okay with adding this where it is needed, but putting it in the base Recordable means every exporter pays extra heap allocation and key-tracking cost even if that exporter already has its own storage model or can enforce the limit more cheaply.

Can we move this out of the base Recordable path and let each recordable/exporter enforce the limit in the way that fits its representation?

@proost proost Jun 14, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review.
I designed like this due to enforcement.
I agree your concern. I redesign spec-compatible and more performance aware way And i request review again, when i finish.

@lalitb lalitb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned here.

@proost

proost commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@thc1006 will continue the feature in here:
#4157

Thanks for all the feedbacks!

@proost proost closed this Jun 15, 2026
@proost proost deleted the feat-logrecord-attr-value-length-limit branch June 15, 2026 10:49
thc1006 added a commit to thc1006/opentelemetry-cpp that referenced this pull request Jun 18, 2026
…runcate)

Address three review comments from @lalitb on PR open-telemetry#4157:

1. Store LogRecordLimits by value inside ReadWriteLogRecord and
   OtlpLogRecordable instead of a raw pointer to a LoggerContext-owned
   object. A LogRecord is handed out as a unique_ptr, so a record
   outliving the context that produced it would otherwise dereference a
   dangling pointer when the user calls SetAttribute() later. The
   default-constructed value already carries the spec defaults
   (count=128, length=unlimited), so a fresh recordable enforces the
   spec count cap from construction, matching the PR's "enforcement"
   contract.

2. Drop the now-redundant `limits_ != nullptr` short-circuit at every
   enforcement site (4 in total). This also closes the Codecov-reported
   uncovered branch in otlp_log_recordable.cc.

3. Truncate UTF-8 string attributes at a code-point boundary instead of
   a raw byte boundary, so an OTLP protobuf string_value produced by
   truncation stays valid UTF-8 when the input was. Malformed UTF-8 and
   trailing lead bytes degrade to plain byte truncation. Logic adapted
   from open-telemetry#4132 with attribution.

While in the same truncation paths, also apply the byte-length cap to
raw bytes attributes (`vector<uint8_t>` on the SDK side, AnyValue
`bytes_value` on the OTLP side). Both were previously passing through
any size, even though the spec applies `attribute_value_length_limit`
to bytes attributes as well.

Test changes:
- Rename DefaultsPassThroughWithoutLimitsObject to
  DefaultRecordEnforcesSpecCountCap (200 attrs in, 128 stored, 72
  dropped) to reflect the new spec-correct default behavior.
- Add 3 UTF-8 regression tests on the SDK side (split prevention,
  exact fit at sequence boundary, malformed input falls back) plus a
  bytes-truncation test.
- Add 3 mirror tests on the OTLP side, a default-cap test, and a
  bytes-truncation test.

Refs: open-telemetry#4126

Co-authored-by: Hyeonho Kim <proost@apache.org>
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
thc1006 added a commit to thc1006/opentelemetry-cpp that referenced this pull request Jun 18, 2026
…are truncate)

Address three review comments from @lalitb on PR open-telemetry#4157, plus a
follow-up from @owent (r3434178971).

1. Store LogRecordLimits by value inside ReadWriteLogRecord and
   OtlpLogRecordable instead of a raw pointer to a LoggerContext-owned
   object. A LogRecord is handed out as a unique_ptr, so a record
   outliving the context that produced it would otherwise dereference a
   dangling pointer when the user calls SetAttribute() later. The
   default-constructed value already carries the spec defaults
   (count=128, length=unlimited), so a fresh recordable enforces the
   spec count cap from construction, matching the PR's "enforcement"
   contract.

2. Drop the now-redundant `limits_ != nullptr` short-circuit at every
   enforcement site (4 in total). This also closes the Codecov-reported
   uncovered branch in otlp_log_recordable.cc.

3. Truncate OTLP string attributes at a UTF-8 code-point boundary
   instead of a raw byte boundary, so the protobuf string_value produced
   by truncation stays valid UTF-8 when the input was. Malformed UTF-8
   and trailing lead bytes degrade to plain byte truncation. Logic
   adapted from open-telemetry#4132 with attribution.

   The SDK-side ReadWriteLogRecord truncation stays as plain byte cut.
   The in-memory `OwnedAttributeValue::std::string` variant may
   legitimately carry raw bytes when constructed from a non-UTF-8
   source, so forcing UTF-8 boundary semantics there would over-truncate
   that case (per @owent's r3434178971, echoing the same point on open-telemetry#4132
   r3409677314). Each recordable's truncation strategy now matches its
   own consumer's wire-format requirement: SDK in-memory has no wire
   requirement, OTLP protobuf requires valid UTF-8.

While in the same truncation paths, also apply the byte-length cap to
raw bytes attributes (`vector<uint8_t>` on the SDK side, AnyValue
`bytes_value` on the OTLP side). Both were previously passing through
any size, even though the spec applies `attribute_value_length_limit`
to bytes attributes as well.

Test changes:
- Rename DefaultsPassThroughWithoutLimitsObject to
  DefaultRecordEnforcesSpecCountCap (200 attrs in, 128 stored, 72
  dropped) to reflect the new spec-correct default behavior.
- Add bytes-truncation tests on both SDK and OTLP sides.
- Add 3 UTF-8 regression tests on the OTLP side only (split prevention,
  exact fit at sequence boundary; malformed-fallback omitted since the
  algorithm's seq=1 fallback for invalid continuations is implementation
  detail rather than wire contract).
- Add a default-cap test on the OTLP side.

Refs: open-telemetry#4126

Co-authored-by: Hyeonho Kim <proost@apache.org>
Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement LogRecord limits

4 participants