Skip to content

Implement OTLP metrics bridge#6465

Open
shuheiktgw wants to merge 2 commits into
quickwit-oss:mainfrom
shuheiktgw:codex/inline-otlp-exporter
Open

Implement OTLP metrics bridge#6465
shuheiktgw wants to merge 2 commits into
quickwit-oss:mainfrom
shuheiktgw:codex/inline-otlp-exporter

Conversation

@shuheiktgw
Copy link
Copy Markdown
Collaborator

@shuheiktgw shuheiktgw commented May 24, 2026

Summary

In the previous PR, I used metrics-exporter-otel to bridge the OTel exporter and metrics-rs. However, the library has a few issues, described below, so I would like to implement the bridge ourselves.

Problems (human-written)

1. Delta Temporality for Counters Is Incorrectly Exported as Cumulative

Counter metrics exported by metrics-exporter-otel can be incorrect when using delta temporality: they appear to be exported as cumulative values instead of delta values. This is not necessarily a bug in metrics-exporter-otel itself, but rather appears to be an issue in the OTel SDK.

metrics-exporter-otel uses ObservableCounter, which registers a callback that is invoked whenever metrics are exported. However, there seems to be a bug in the OTel SDK when using ObservableCounter with delta temporality.

The issue occurs when metrics are recorded, or exported in the case of ObservableCounter. The ValueMap#measure method is called, and the same tracker is inserted into the trackers hashmap twice: once using the original attributes as the key, and once using the sorted attributes as the key.

// Insert tracker with the attributes in the provided and sorted orders
trackers.insert(attributes.to_vec(), new_tracker.clone());
trackers.insert(sorted_attrs, new_tracker);

For example, consider the quickwit_indexing_processed_bytes metric. Its labels look like this:

let labels = labels!(
    "index" => ...,
    "docs_processed_status" => outcome
);

The original attribute order is:

[index, docs_processed_status]

Since docs_processed_status comes before index alphabetically, the sorted order is:

[docs_processed_status, index]

As a result, the trackers hashmap ends up looking like this:

[index, docs_processed_status] -> tracker X
[docs_processed_status, index] -> tracker X

This becomes problematic when computing deltas for ObservableCounter.

Note that the OTel SDK has to compute the delta itself by remembering previously exported values, because ObservableCounter only provides the current cumulative value. Here, it computes the delta by subtracting the previously reported value from the current value:

self.value_map
    .collect_and_reset(&mut s_data.data_points, |attributes, aggr| {
        let value = aggr.value.get_value();
        new_reported.insert(attributes.clone(), value);
        let delta = value - *reported.get(&attributes).unwrap_or(&T::default());
        SumDataPoint {
            attributes,
            value: delta,
            exemplars: vec![],
        }
    });

However, the problem occurs in collect_and_reset. It drains all entries from trackers, then uses the attributes from the first occurrence of a given tracker as the key to look up the previously exported value. This can be incorrect because hashmap drain order is not guaranteed.

for (attrs, tracker) in trackers_collect.drain() {
    if seen.insert(Arc::as_ptr(&tracker)) {
        dest.push(map_fn(attrs, tracker.clone_and_reset(&self.config)));
    }
}

In the example above, we do not know whether the attributes will be returned as:

[index, docs_processed_status]

or:

[docs_processed_status, index]

What makes this worse is that collect_and_reset swaps the hashmaps between self.trackers_for_collect and self.trackers every time it is called. I believe this is what causes these two attributes to be used alternately, although I am not completely sure about this part and would appreciate additional input.

As a result, the previously reported value may not be found, so it falls back to 0. The computed delta therefore becomes the full cumulative value, rather than the actual delta.

This actually explains why the same problem does not occur with other counter metrics, such as quickwit_ingest_docs_bytes_total. It only has one attribute, validity, so the attribute order does not change and the SDK always receives the same attribute set.

2. Supply Chain Risk

I used metrics-exporter-otel, but it doesn’t seem to be very widely used, so I’m concerned that relying on it could introduce supply chain risk. There is another similar crate, but that also doesn’t seem to be widely adopted.

Proposed Solution

Implement the bridge ourselves. This new bridge simply uses self.counter.add instead of ObservableCounter, so it does not have the same issue described in Problem 1. Since the SDK does not need to store previously exported values internally, it can just export the current value and reset it each time it exports metrics. I have already tested this and confirmed that the delta works correctly.

An alternative would be to use another library such as DoumanAsh/metrics-opentelemetry, which uses the add method. However, it does not solve the second problem. (+ DoumanAsh/metrics-opentelemetry misses some interfaces like configuring bucket size so we have to update it)

@guilload
Copy link
Copy Markdown
Member

What's the rationale for doing this?

@shuheiktgw
Copy link
Copy Markdown
Collaborator Author

@guilload Right now, metrics-rs doesn’t support OTLP exporters by default. There is an open PR for it, but I’m not sure when it will be merged.

In the previous PR, I used metrics-exporter-otel instead, but it doesn’t seem to be very widely used, so I’m concerned that relying on it could introduce supply chain risk. There is another similar crate, but that also doesn’t seem to be widely adopted.

The exporter implementation is relatively simple, so I was thinking it might be better for us to just write it ourselves. What do you think?

@julian45
Copy link
Copy Markdown

julian45 commented Jun 3, 2026

Apologies for butting in as a (relative) outsider, but why not use opentelemetry-otlp for metrics export instead? It's already in Cargo.toml, is maintained under official OTel auspices, and already supports metrics export (with a quick demo on its docs.rs page and more examples here).

@shuheiktgw shuheiktgw force-pushed the codex/inline-otlp-exporter branch from fe128d0 to 3e3ccdd Compare June 4, 2026 11:26
@guilload
Copy link
Copy Markdown
Member

guilload commented Jun 4, 2026

because we need a StatsD exporter for our fork.

@shuheiktgw shuheiktgw changed the title Inline OTLP metrics exporter Implement OTLP metrics bridge Jun 5, 2026
@shuheiktgw shuheiktgw marked this pull request as ready for review June 5, 2026 08:11
@shuheiktgw shuheiktgw requested review from a team as code owners June 5, 2026 08:11
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e3ccdd130

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1 to +3
// Copyright 2021-Present Datadog, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Retain the vendored MIT attribution

When distributing builds or source from this commit, the in-tree OTLP recorder/storage code appears to vendor substantial portions of metrics-exporter-otel (same recorder, metadata, storage, and instrument structure) while the third-party license entry was removed and these new files carry only the Datadog Apache header. The upstream crate is MIT-licensed, which requires preserving its copyright and permission notice in copies/substantial portions, so please retain the upstream attribution or keep it listed as vendored third-party code.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shuheiktgw let's just fork the repo under quickwit-oss, it's cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants