Skip to content

feat(chore): declarative configuration for trace_provider#2161

Draft
xuan-cao-swi wants to merge 4 commits into
open-telemetry:mainfrom
xuan-cao-swi:otelconfig
Draft

feat(chore): declarative configuration for trace_provider#2161
xuan-cao-swi wants to merge 4 commits into
open-telemetry:mainfrom
xuan-cao-swi:otelconfig

Conversation

@xuan-cao-swi
Copy link
Copy Markdown
Contributor

Description

This PR introduces an initial declarative configuration implementation for OpenTelemetry Ruby under otelconfig, intentionally scoped to tracing only.

I followed the Go otelconfig convention for config structure and behavior where it maps well to Ruby.

Minimalist approach: no hard dependency on optional components, which optional propagators and resource detectors are resolved only if the corresponding gem has been required by the user. If an optional component is not available, configuration continues safely with warnings instead of failing startup.

Scope in this PR

  • Trace configuration only:
    • tracer provider setup
    • samplers
    • span processors
    • trace exporters
    • span limits
  • Declarative resource and propagation wiring needed for trace setup
  • Instrumentation mapping/installation flow for declarative config
  • Tests and example configuration for the implemented behavior

Try /example

Copy link
Copy Markdown
Contributor

@thompson-tomo thompson-tomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing, how are we ensuring the loading of the the env variables are disabled when covered by declarative config?

Will the current approach support dynamic reloading of the config which is also a part of the spec?

Comment thread otelconfig/Rakefile
Comment on lines +20 to +22
YARD::Rake::YardocTask.new do |t|
t.stats_options = ['--list-undoc']
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
YARD::Rake::YardocTask.new do |t|
t.stats_options = ['--list-undoc']
end
YARD::Rake::YardocTask.new

Comment on lines +69 to +89
def build_span_exporter(exp_cfg)
raise ArgumentError, 'no exporter config' unless exp_cfg

configured = 0
exporter = nil

if exp_cfg.key?('console')
configured += 1
exporter = OpenTelemetry::SDK::Trace::Export::ConsoleSpanExporter.new
end

if exp_cfg['otlp_http']
configured += 1
exporter = build_otlp_http_span_exporter(exp_cfg['otlp_http'])
end

raise ArgumentError, 'must not specify multiple exporters' if configured > 1
raise ArgumentError, 'no valid span exporter' if exporter.nil?

exporter
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this scale as exporter is an extension point meaning additional exporters can be added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't. Let's keep this simple and comply with what the current Ruby trace SDK can do.

Comment on lines +37 to +47
def build_span_processor(proc_cfg)
raise ArgumentError, 'must not specify multiple span processor type' if proc_cfg['batch'] && proc_cfg['simple']

if proc_cfg['batch']
build_batch_span_processor(proc_cfg['batch'])
elsif proc_cfg['simple']
build_simple_span_processor(proc_cfg['simple'])
else
raise ArgumentError, 'unsupported span processor type, must be one of simple or batch'
end
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this as scale to support additional processors the as argument error doesn't match spec.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to include a link or reference to the spec so people can check it quickly. If you are referring to SDK extension components, then it's an SDK-level feature. The Ruby trace SDK doesn't have this feature.

To answer your questions about custom components:

  1. It's an SDK-level feature (e.g., ComponentProvider.java, PHP ComponentProvider). It also requires language-level support for the Service Provider Interface, which Ruby (and Go) do not provide.

Given the inherent differences across languages, the details of extension
component mechanisms are likely to vary to a greater degree than is the case
with other APIs defined by OpenTelemetry. This is to be expected and is
acceptable so long as the implementation results in the defined behaviors.

  1. Even if someone found a way to support it, to keep this PR minimal (already 27 files), it's better to keep it simple.

Copy link
Copy Markdown
Contributor

@thompson-tomo thompson-tomo May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all for keeping it simple, small and manageable while also ensuring that we are doing something which scales and will serve us long term.

For instance could we not load declarative config into the global options after parsing/validating the file, inject an options node into the providers which uses them accordingly. The process of inject & then using them is repeated.

In effect each component is responsible for getting their own options, hence it avoids all of the switch options and decouples config from implementations. It also means we can follow the same path regardless of how it is configured ie env or file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Could we not load declarative config into the global options after parsing/validating the file, inject an options node into the providers which uses them accordingly."

I think what you're describing is something like:

# 1. Parse into global store
GlobalOptions.config = parse_config_file(path)

# 2. Each provider pulls its own slice
TracerProvider.configure(GlobalOptions.config['tracer_provider'])
MeterProvider.configure(GlobalOptions.config['meter_provider'])

This isn't actually less coupling — it just moves it. Each provider still needs to know its own key name and schema. That knowledge gets scattered across multiple files instead of being in one place (apply). If the YAML schema changes (e.g. a key is renamed), you'd have to track down every component that reads it rather than fixing one place.

Java actually tried a global config store and then deleted it. The comment in AutoConfiguredOpenTelemetrySdk.java reflects the resulting separation:

"Returns the ConfigProperties used for auto-configuration, or null if declarative configuration was used."

It also contradicts the Configuration SDK spec, which defines exactly two operations — Parse (file → in-memory model) and Create (model → all SDK components at once) — with Create explicitly centralized:

OpenTelemetry openTelemetry = OpenTelemetry.noop();
try {
    // Parse configuration file to configuration model
    OpenTelemetryConfiguration configurationModel = parse(new File("/app/sdk-config.yaml"));
    // Create SDK components from configuration model
    openTelemetry = create(configurationModel);
} catch (Throwable e) {
    log.error("Error initializing SDK from configuration file", e);
}

// Access SDK components and install instrumentation
TracerProvider tracerProvider = openTelemetry.getTracerProvider();
MeterProvider meterProvider = openTelemetry.getMeterProvider();
LoggerProvider loggerProvider = openTelemetry.getLogsBridge();
ContextPropagators propagators = openTelemetry.getPropagators();
ConfigProvider configProvider = openTelemetry.getConfigProvider();

Both Java and Go follow the same centralized parse → create pattern for this reason.

"In effect each component is responsible for getting their own options, hence it avoids all of the switch options and decouples config from implementations. It also means we can follow the same path regardless of how it is configured ie env or file."

There are no switch statements in the current apply — it's sequential method calls. The branching logic lives inside each component's builder (e.g. Trace.build_tracer_provider choosing which exporter to create). That logic has to live somewhere; moving it doesn't remove it.

Decouples config from implementations

It inverts the coupling rather than removing it. Currently OtelConfig is the single place that knows about component names and config keys. Under the proposed pattern, that knowledge spreads into every component. Centralised is easier to maintain.

Follow the same path regardless of env or file

The spec explicitly rejects this. From the env var spec:

"When OTEL_CONFIG_FILE is set, all other environment variables MUST be ignored … there is no intuitive way to merge the flat environment variable scheme with the structured file configuration scheme."

The two paths are intentionally separate. Env vars are flat strings; file config is a typed, structured, nested object. A provider trying to read from "wherever config lives" would need to handle both formats — that's more complexity inside the component, not less.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed reply I however am still very worried that the current approach will not scale and makes it harder for us to be fully compliant with the spec without refactoring the newly implemented functionality.

I have started sketching my thoughts with an emphasis on spec & progressively adding declarative config. See thompson-tomo#44 as a starting point.

Key thing is the configure method calls parse with the intention of this returning a typed in-memory configuration model as per the spec.

The configuration model is then passed to the create call as per the spec. Within the create, the providers are then created using the scoped config section see propagators as an example. If no config then load environment variables. The create method then returns the created providers.

Further looking at https://opentelemetry.io/docs/specs/otel/configuration/sdk/#register-plugincomponentprovider the providers should then create components ie processors passing in the config https://opentelemetry.io/docs/specs/otel/configuration/sdk/#create-component

Comment on lines +104 to +122
def build_sampler(sampler_cfg)
s = OpenTelemetry::SDK::Trace::Samplers

# Default: parent-based with always_on root
return s.parent_based(root: s::ALWAYS_ON) unless sampler_cfg

if sampler_cfg['parent_based']
build_parent_based_sampler(sampler_cfg['parent_based'])
elsif sampler_cfg.key?('always_on')
s::ALWAYS_ON
elsif sampler_cfg.key?('always_off')
s::ALWAYS_OFF
elsif sampler_cfg['trace_id_ratio_based']
ratio = sampler_cfg['trace_id_ratio_based']['ratio'] || 1.0
s.trace_id_ratio_based(ratio.to_f)
else
s.parent_based(root: s::ALWAYS_ON)
end
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this scale to support custom/additional samplers?

return {} unless instrumentation_cfg.is_a?(Hash)

general = instrumentation_cfg['general']
return {} unless general.is_a?(Hash)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be returning the default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenTelemetry::Instrumentation.registry.install_all({}) is the default

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that installs all instrumentation, I am more talking about the default options as defined in declarative config aka within https://github.com/open-telemetry/opentelemetry-configuration/blob/main/schema-docs.md#experimentalgeneralinstrumentation-.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't worry about it until the contrib semantic conventions is stable

Comment on lines +1 to +180
# frozen_string_literal: true

# Copyright The OpenTelemetry Authors
# SPDX-License-Identifier: Apache-2.0

module OpenTelemetry
module OtelConfig
# Trace component builder for configuring TracerProvider from declarative config.
module Trace
module_function

# Builds a TracerProvider from the parsed YAML tracer_provider config.
# Returns a noop-configured provider if config is nil.
def build_tracer_provider(config, resource)
return OpenTelemetry::Trace::TracerProvider.new unless config

sampler = build_sampler(config['sampler'])
span_limits = build_span_limits(config['limits'])

tp = OpenTelemetry::SDK::Trace::TracerProvider.new(
resource: resource,
sampler: sampler,
span_limits: span_limits
)

Array(config['processors']).each do |proc_cfg|
processor = build_span_processor(proc_cfg)
tp.add_span_processor(processor) if processor
rescue StandardError => e
OpenTelemetry.logger.warn("Failed to build span processor: #{e.message}")
end

tp
end

# Builds a span processor (simple or batch) from config hash.
def build_span_processor(proc_cfg)
raise ArgumentError, 'must not specify multiple span processor type' if proc_cfg['batch'] && proc_cfg['simple']

if proc_cfg['batch']
build_batch_span_processor(proc_cfg['batch'])
elsif proc_cfg['simple']
build_simple_span_processor(proc_cfg['simple'])
else
raise ArgumentError, 'unsupported span processor type, must be one of simple or batch'
end
end

# Builds a BatchSpanProcessor with exporter and optional tuning options.
def build_batch_span_processor(cfg)
exporter = build_span_exporter(cfg['exporter'])
opts = {
schedule_delay: cfg['schedule_delay']&.to_f,
exporter_timeout: cfg['export_timeout']&.to_f,
max_queue_size: cfg['max_queue_size']&.to_i,
max_export_batch_size: cfg['max_export_batch_size']&.to_i
}.compact

OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(exporter, **opts)
end

# Builds a SimpleSpanProcessor wrapping the configured exporter.
def build_simple_span_processor(cfg)
exporter = build_span_exporter(cfg['exporter'])
OpenTelemetry::SDK::Trace::Export::SimpleSpanProcessor.new(exporter)
end

# Builds a span exporter from config; supports console and otlp_http.
def build_span_exporter(exp_cfg)
raise ArgumentError, 'no exporter config' unless exp_cfg

configured = 0
exporter = nil

if exp_cfg.key?('console')
configured += 1
exporter = OpenTelemetry::SDK::Trace::Export::ConsoleSpanExporter.new
end

if exp_cfg['otlp_http']
configured += 1
exporter = build_otlp_http_span_exporter(exp_cfg['otlp_http'])
end

raise ArgumentError, 'must not specify multiple exporters' if configured > 1
raise ArgumentError, 'no valid span exporter' if exporter.nil?

exporter
end

# Builds an OTLP HTTP span exporter from the given endpoint/headers config.
def build_otlp_http_span_exporter(cfg)
opts = {
endpoint: cfg['endpoint'],
headers: cfg['headers'] || cfg['headers_list'] ? parse_headers(cfg) : nil,
compression: cfg['compression'],
timeout: cfg['timeout'] && cfg['timeout'] / 1000.0 # YAML ms → Ruby seconds
}.compact

OpenTelemetry::Exporter::OTLP::Exporter.new(**opts)
end

# Builds a sampler from config; defaults to ParentBased(ALWAYS_ON).
def build_sampler(sampler_cfg)
s = OpenTelemetry::SDK::Trace::Samplers

# Default: parent-based with always_on root
return s.parent_based(root: s::ALWAYS_ON) unless sampler_cfg

if sampler_cfg['parent_based']
build_parent_based_sampler(sampler_cfg['parent_based'])
elsif sampler_cfg.key?('always_on')
s::ALWAYS_ON
elsif sampler_cfg.key?('always_off')
s::ALWAYS_OFF
elsif sampler_cfg['trace_id_ratio_based']
ratio = sampler_cfg['trace_id_ratio_based']['ratio'] || 1.0
s.trace_id_ratio_based(ratio.to_f)
else
s.parent_based(root: s::ALWAYS_ON)
end
end

# Builds a ParentBased sampler with configurable root and remote/local delegates.
def build_parent_based_sampler(cfg)
s = OpenTelemetry::SDK::Trace::Samplers

root = cfg['root'] ? build_sampler(cfg['root']) : s::ALWAYS_ON

opts = {
root: root,
remote_parent_sampled: cfg['remote_parent_sampled'] && build_sampler(cfg['remote_parent_sampled']),
remote_parent_not_sampled: cfg['remote_parent_not_sampled'] && build_sampler(cfg['remote_parent_not_sampled']),
local_parent_sampled: cfg['local_parent_sampled'] && build_sampler(cfg['local_parent_sampled']),
local_parent_not_sampled: cfg['local_parent_not_sampled'] && build_sampler(cfg['local_parent_not_sampled'])
}.compact

s.parent_based(**opts)
end

# Builds SpanLimits from config; returns the SDK default when config is nil.
def build_span_limits(limits_cfg)
return OpenTelemetry::SDK::Trace::SpanLimits::DEFAULT unless limits_cfg

opts = {
attribute_count_limit: limits_cfg['attribute_count_limit'],
attribute_length_limit: limits_cfg['attribute_value_length_limit'],
event_count_limit: limits_cfg['event_count_limit'],
link_count_limit: limits_cfg['link_count_limit'],
event_attribute_count_limit: limits_cfg['event_attribute_count_limit'],
link_attribute_count_limit: limits_cfg['link_attribute_count_limit']
}.compact

OpenTelemetry::SDK::Trace::SpanLimits.new(**opts)
end

# Parses headers from YAML array format or headers_list string.
# Array format takes precedence over headers_list.
def parse_headers(cfg)
headers = {}

if cfg['headers'].is_a?(Array)
cfg['headers'].each do |h|
headers[h['name']] = h['value'] if h['name'] && h['value']
end
end

# Fall back to headers_list only if headers array produced nothing
if headers.empty? && cfg['headers_list'].is_a?(String)
cfg['headers_list'].split(',').each do |pair|
key, value = pair.strip.split('=', 2)
headers[key] = value if key && value
end
end

headers
end
end
end
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be missing the tracerConfig.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is tracerConfig? Could you be more specific?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe focus on stable first? since Maintainers are not obligated to implement support for experimental properties and types.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine but not in other spots we have implemented experimental.

Comment on lines +86 to +101
# Returns a Resource for the given detector name.
def run_detector(name)
case name
when 'container'
detect_resource('OpenTelemetry::Resource::Detector::Container')
when 'aws'
# Run all AWS sub-detectors; each returns an empty resource if not on that platform.
detect_resource('OpenTelemetry::Resource::Detector::AWS', %i[ec2 ecs eks lambda])
when 'azure'
detect_resource('OpenTelemetry::Resource::Detector::Azure')
when 'google_cloud_platform'
detect_resource('OpenTelemetry::Resource::Detector::GoogleCloudPlatform')
else
OpenTelemetry.logger.warn("OtelConfig: unknown resource detector '#{name}'; skipping.")
OpenTelemetry::SDK::Resources::Resource.create({})
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this scale as mentioned in the spec.

Comment on lines +27 to +38
spec.add_development_dependency 'opentelemetry-api', '~> 1.10.0'
spec.add_development_dependency 'opentelemetry-common', '~> 0.25.0'
spec.add_development_dependency 'opentelemetry-exporter-otlp', '~> 0.34.0'
spec.add_development_dependency 'opentelemetry-instrumentation-all', '~> 0.91.0'
spec.add_development_dependency 'opentelemetry-propagator-google_cloud_trace_context', '~> 0.4.0'
spec.add_development_dependency 'opentelemetry-propagator-ottrace', '~> 0.25.0'
spec.add_development_dependency 'opentelemetry-propagator-xray', '~> 0.27.0'
spec.add_development_dependency 'opentelemetry-resource-detector-aws', '~> 0.5.0'
spec.add_development_dependency 'opentelemetry-resource-detector-azure', '~> 0.3.0'
spec.add_development_dependency 'opentelemetry-resource-detector-container', '~> 0.3.0'
spec.add_development_dependency 'opentelemetry-resource-detector-google_cloud_platform', '~> 0.4.0'
spec.add_development_dependency 'opentelemetry-sdk', '~> 1.12'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to gemfile.

spec.add_development_dependency 'opentelemetry-sdk', '~> 1.12'

if spec.respond_to?(:metadata)
spec.metadata['changelog_uri'] = "https://open-telemetry.github.io/opentelemetry-ruby/opentelemetry-logs-sdk/v#{OpenTelemetry::OtelConfig::VERSION}/file.CHANGELOG.html"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update as done in #2123

spec.metadata['changelog_uri'] = "https://open-telemetry.github.io/opentelemetry-ruby/opentelemetry-logs-sdk/v#{OpenTelemetry::OtelConfig::VERSION}/file.CHANGELOG.html"
spec.metadata['source_code_uri'] = "https://github.com/open-telemetry/opentelemetry-ruby/tree/#{spec.name}/v#{spec.version}/logs_sdk"
spec.metadata['bug_tracker_uri'] = 'https://github.com/open-telemetry/opentelemetry-ruby/issues'
spec.metadata['documentation_uri'] =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update as done in #2123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants