feat(chore): declarative configuration for trace_provider#2161
feat(chore): declarative configuration for trace_provider#2161xuan-cao-swi wants to merge 4 commits into
Conversation
thompson-tomo
left a comment
There was a problem hiding this comment.
One other thing, how are we ensuring the loading of the the env variables are disabled when covered by declarative config?
Will the current approach support dynamic reloading of the config which is also a part of the spec?
| YARD::Rake::YardocTask.new do |t| | ||
| t.stats_options = ['--list-undoc'] | ||
| end |
There was a problem hiding this comment.
| YARD::Rake::YardocTask.new do |t| | |
| t.stats_options = ['--list-undoc'] | |
| end | |
| YARD::Rake::YardocTask.new |
| def build_span_exporter(exp_cfg) | ||
| raise ArgumentError, 'no exporter config' unless exp_cfg | ||
|
|
||
| configured = 0 | ||
| exporter = nil | ||
|
|
||
| if exp_cfg.key?('console') | ||
| configured += 1 | ||
| exporter = OpenTelemetry::SDK::Trace::Export::ConsoleSpanExporter.new | ||
| end | ||
|
|
||
| if exp_cfg['otlp_http'] | ||
| configured += 1 | ||
| exporter = build_otlp_http_span_exporter(exp_cfg['otlp_http']) | ||
| end | ||
|
|
||
| raise ArgumentError, 'must not specify multiple exporters' if configured > 1 | ||
| raise ArgumentError, 'no valid span exporter' if exporter.nil? | ||
|
|
||
| exporter | ||
| end |
There was a problem hiding this comment.
How will this scale as exporter is an extension point meaning additional exporters can be added?
There was a problem hiding this comment.
It doesn't. Let's keep this simple and comply with what the current Ruby trace SDK can do.
| def build_span_processor(proc_cfg) | ||
| raise ArgumentError, 'must not specify multiple span processor type' if proc_cfg['batch'] && proc_cfg['simple'] | ||
|
|
||
| if proc_cfg['batch'] | ||
| build_batch_span_processor(proc_cfg['batch']) | ||
| elsif proc_cfg['simple'] | ||
| build_simple_span_processor(proc_cfg['simple']) | ||
| else | ||
| raise ArgumentError, 'unsupported span processor type, must be one of simple or batch' | ||
| end | ||
| end |
There was a problem hiding this comment.
How will this as scale to support additional processors the as argument error doesn't match spec.
There was a problem hiding this comment.
It's better to include a link or reference to the spec so people can check it quickly. If you are referring to SDK extension components, then it's an SDK-level feature. The Ruby trace SDK doesn't have this feature.
To answer your questions about custom components:
- It's an SDK-level feature (e.g., ComponentProvider.java, PHP ComponentProvider). It also requires language-level support for the Service Provider Interface, which Ruby (and Go) do not provide.
Given the inherent differences across languages, the details of extension
component mechanisms are likely to vary to a greater degree than is the case
with other APIs defined by OpenTelemetry. This is to be expected and is
acceptable so long as the implementation results in the defined behaviors.
- Even if someone found a way to support it, to keep this PR minimal (already 27 files), it's better to keep it simple.
There was a problem hiding this comment.
I am all for keeping it simple, small and manageable while also ensuring that we are doing something which scales and will serve us long term.
For instance could we not load declarative config into the global options after parsing/validating the file, inject an options node into the providers which uses them accordingly. The process of inject & then using them is repeated.
In effect each component is responsible for getting their own options, hence it avoids all of the switch options and decouples config from implementations. It also means we can follow the same path regardless of how it is configured ie env or file.
There was a problem hiding this comment.
"Could we not load declarative config into the global options after parsing/validating the file, inject an options node into the providers which uses them accordingly."
I think what you're describing is something like:
# 1. Parse into global store
GlobalOptions.config = parse_config_file(path)
# 2. Each provider pulls its own slice
TracerProvider.configure(GlobalOptions.config['tracer_provider'])
MeterProvider.configure(GlobalOptions.config['meter_provider'])This isn't actually less coupling — it just moves it. Each provider still needs to know its own key name and schema. That knowledge gets scattered across multiple files instead of being in one place (apply). If the YAML schema changes (e.g. a key is renamed), you'd have to track down every component that reads it rather than fixing one place.
Java actually tried a global config store and then deleted it. The comment in AutoConfiguredOpenTelemetrySdk.java reflects the resulting separation:
"Returns the
ConfigPropertiesused for auto-configuration, ornullif declarative configuration was used."
It also contradicts the Configuration SDK spec, which defines exactly two operations — Parse (file → in-memory model) and Create (model → all SDK components at once) — with Create explicitly centralized:
OpenTelemetry openTelemetry = OpenTelemetry.noop();
try {
// Parse configuration file to configuration model
OpenTelemetryConfiguration configurationModel = parse(new File("/app/sdk-config.yaml"));
// Create SDK components from configuration model
openTelemetry = create(configurationModel);
} catch (Throwable e) {
log.error("Error initializing SDK from configuration file", e);
}
// Access SDK components and install instrumentation
TracerProvider tracerProvider = openTelemetry.getTracerProvider();
MeterProvider meterProvider = openTelemetry.getMeterProvider();
LoggerProvider loggerProvider = openTelemetry.getLogsBridge();
ContextPropagators propagators = openTelemetry.getPropagators();
ConfigProvider configProvider = openTelemetry.getConfigProvider();
Both Java and Go follow the same centralized parse → create pattern for this reason.
"In effect each component is responsible for getting their own options, hence it avoids all of the switch options and decouples config from implementations. It also means we can follow the same path regardless of how it is configured ie env or file."
There are no switch statements in the current apply — it's sequential method calls. The branching logic lives inside each component's builder (e.g. Trace.build_tracer_provider choosing which exporter to create). That logic has to live somewhere; moving it doesn't remove it.
Decouples config from implementations
It inverts the coupling rather than removing it. Currently OtelConfig is the single place that knows about component names and config keys. Under the proposed pattern, that knowledge spreads into every component. Centralised is easier to maintain.
Follow the same path regardless of env or file
The spec explicitly rejects this. From the env var spec:
"When
OTEL_CONFIG_FILEis set, all other environment variables MUST be ignored … there is no intuitive way to merge the flat environment variable scheme with the structured file configuration scheme."
The two paths are intentionally separate. Env vars are flat strings; file config is a typed, structured, nested object. A provider trying to read from "wherever config lives" would need to handle both formats — that's more complexity inside the component, not less.
There was a problem hiding this comment.
Thanks for the detailed reply I however am still very worried that the current approach will not scale and makes it harder for us to be fully compliant with the spec without refactoring the newly implemented functionality.
I have started sketching my thoughts with an emphasis on spec & progressively adding declarative config. See thompson-tomo#44 as a starting point.
Key thing is the configure method calls parse with the intention of this returning a typed in-memory configuration model as per the spec.
The configuration model is then passed to the create call as per the spec. Within the create, the providers are then created using the scoped config section see propagators as an example. If no config then load environment variables. The create method then returns the created providers.
Further looking at https://opentelemetry.io/docs/specs/otel/configuration/sdk/#register-plugincomponentprovider the providers should then create components ie processors passing in the config https://opentelemetry.io/docs/specs/otel/configuration/sdk/#create-component
| def build_sampler(sampler_cfg) | ||
| s = OpenTelemetry::SDK::Trace::Samplers | ||
|
|
||
| # Default: parent-based with always_on root | ||
| return s.parent_based(root: s::ALWAYS_ON) unless sampler_cfg | ||
|
|
||
| if sampler_cfg['parent_based'] | ||
| build_parent_based_sampler(sampler_cfg['parent_based']) | ||
| elsif sampler_cfg.key?('always_on') | ||
| s::ALWAYS_ON | ||
| elsif sampler_cfg.key?('always_off') | ||
| s::ALWAYS_OFF | ||
| elsif sampler_cfg['trace_id_ratio_based'] | ||
| ratio = sampler_cfg['trace_id_ratio_based']['ratio'] || 1.0 | ||
| s.trace_id_ratio_based(ratio.to_f) | ||
| else | ||
| s.parent_based(root: s::ALWAYS_ON) | ||
| end | ||
| end |
There was a problem hiding this comment.
How will this scale to support custom/additional samplers?
| return {} unless instrumentation_cfg.is_a?(Hash) | ||
|
|
||
| general = instrumentation_cfg['general'] | ||
| return {} unless general.is_a?(Hash) |
There was a problem hiding this comment.
We should be returning the default
There was a problem hiding this comment.
OpenTelemetry::Instrumentation.registry.install_all({}) is the default
There was a problem hiding this comment.
Yes that installs all instrumentation, I am more talking about the default options as defined in declarative config aka within https://github.com/open-telemetry/opentelemetry-configuration/blob/main/schema-docs.md#experimentalgeneralinstrumentation-.
There was a problem hiding this comment.
I won't worry about it until the contrib semantic conventions is stable
| # frozen_string_literal: true | ||
|
|
||
| # Copyright The OpenTelemetry Authors | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| module OpenTelemetry | ||
| module OtelConfig | ||
| # Trace component builder for configuring TracerProvider from declarative config. | ||
| module Trace | ||
| module_function | ||
|
|
||
| # Builds a TracerProvider from the parsed YAML tracer_provider config. | ||
| # Returns a noop-configured provider if config is nil. | ||
| def build_tracer_provider(config, resource) | ||
| return OpenTelemetry::Trace::TracerProvider.new unless config | ||
|
|
||
| sampler = build_sampler(config['sampler']) | ||
| span_limits = build_span_limits(config['limits']) | ||
|
|
||
| tp = OpenTelemetry::SDK::Trace::TracerProvider.new( | ||
| resource: resource, | ||
| sampler: sampler, | ||
| span_limits: span_limits | ||
| ) | ||
|
|
||
| Array(config['processors']).each do |proc_cfg| | ||
| processor = build_span_processor(proc_cfg) | ||
| tp.add_span_processor(processor) if processor | ||
| rescue StandardError => e | ||
| OpenTelemetry.logger.warn("Failed to build span processor: #{e.message}") | ||
| end | ||
|
|
||
| tp | ||
| end | ||
|
|
||
| # Builds a span processor (simple or batch) from config hash. | ||
| def build_span_processor(proc_cfg) | ||
| raise ArgumentError, 'must not specify multiple span processor type' if proc_cfg['batch'] && proc_cfg['simple'] | ||
|
|
||
| if proc_cfg['batch'] | ||
| build_batch_span_processor(proc_cfg['batch']) | ||
| elsif proc_cfg['simple'] | ||
| build_simple_span_processor(proc_cfg['simple']) | ||
| else | ||
| raise ArgumentError, 'unsupported span processor type, must be one of simple or batch' | ||
| end | ||
| end | ||
|
|
||
| # Builds a BatchSpanProcessor with exporter and optional tuning options. | ||
| def build_batch_span_processor(cfg) | ||
| exporter = build_span_exporter(cfg['exporter']) | ||
| opts = { | ||
| schedule_delay: cfg['schedule_delay']&.to_f, | ||
| exporter_timeout: cfg['export_timeout']&.to_f, | ||
| max_queue_size: cfg['max_queue_size']&.to_i, | ||
| max_export_batch_size: cfg['max_export_batch_size']&.to_i | ||
| }.compact | ||
|
|
||
| OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(exporter, **opts) | ||
| end | ||
|
|
||
| # Builds a SimpleSpanProcessor wrapping the configured exporter. | ||
| def build_simple_span_processor(cfg) | ||
| exporter = build_span_exporter(cfg['exporter']) | ||
| OpenTelemetry::SDK::Trace::Export::SimpleSpanProcessor.new(exporter) | ||
| end | ||
|
|
||
| # Builds a span exporter from config; supports console and otlp_http. | ||
| def build_span_exporter(exp_cfg) | ||
| raise ArgumentError, 'no exporter config' unless exp_cfg | ||
|
|
||
| configured = 0 | ||
| exporter = nil | ||
|
|
||
| if exp_cfg.key?('console') | ||
| configured += 1 | ||
| exporter = OpenTelemetry::SDK::Trace::Export::ConsoleSpanExporter.new | ||
| end | ||
|
|
||
| if exp_cfg['otlp_http'] | ||
| configured += 1 | ||
| exporter = build_otlp_http_span_exporter(exp_cfg['otlp_http']) | ||
| end | ||
|
|
||
| raise ArgumentError, 'must not specify multiple exporters' if configured > 1 | ||
| raise ArgumentError, 'no valid span exporter' if exporter.nil? | ||
|
|
||
| exporter | ||
| end | ||
|
|
||
| # Builds an OTLP HTTP span exporter from the given endpoint/headers config. | ||
| def build_otlp_http_span_exporter(cfg) | ||
| opts = { | ||
| endpoint: cfg['endpoint'], | ||
| headers: cfg['headers'] || cfg['headers_list'] ? parse_headers(cfg) : nil, | ||
| compression: cfg['compression'], | ||
| timeout: cfg['timeout'] && cfg['timeout'] / 1000.0 # YAML ms → Ruby seconds | ||
| }.compact | ||
|
|
||
| OpenTelemetry::Exporter::OTLP::Exporter.new(**opts) | ||
| end | ||
|
|
||
| # Builds a sampler from config; defaults to ParentBased(ALWAYS_ON). | ||
| def build_sampler(sampler_cfg) | ||
| s = OpenTelemetry::SDK::Trace::Samplers | ||
|
|
||
| # Default: parent-based with always_on root | ||
| return s.parent_based(root: s::ALWAYS_ON) unless sampler_cfg | ||
|
|
||
| if sampler_cfg['parent_based'] | ||
| build_parent_based_sampler(sampler_cfg['parent_based']) | ||
| elsif sampler_cfg.key?('always_on') | ||
| s::ALWAYS_ON | ||
| elsif sampler_cfg.key?('always_off') | ||
| s::ALWAYS_OFF | ||
| elsif sampler_cfg['trace_id_ratio_based'] | ||
| ratio = sampler_cfg['trace_id_ratio_based']['ratio'] || 1.0 | ||
| s.trace_id_ratio_based(ratio.to_f) | ||
| else | ||
| s.parent_based(root: s::ALWAYS_ON) | ||
| end | ||
| end | ||
|
|
||
| # Builds a ParentBased sampler with configurable root and remote/local delegates. | ||
| def build_parent_based_sampler(cfg) | ||
| s = OpenTelemetry::SDK::Trace::Samplers | ||
|
|
||
| root = cfg['root'] ? build_sampler(cfg['root']) : s::ALWAYS_ON | ||
|
|
||
| opts = { | ||
| root: root, | ||
| remote_parent_sampled: cfg['remote_parent_sampled'] && build_sampler(cfg['remote_parent_sampled']), | ||
| remote_parent_not_sampled: cfg['remote_parent_not_sampled'] && build_sampler(cfg['remote_parent_not_sampled']), | ||
| local_parent_sampled: cfg['local_parent_sampled'] && build_sampler(cfg['local_parent_sampled']), | ||
| local_parent_not_sampled: cfg['local_parent_not_sampled'] && build_sampler(cfg['local_parent_not_sampled']) | ||
| }.compact | ||
|
|
||
| s.parent_based(**opts) | ||
| end | ||
|
|
||
| # Builds SpanLimits from config; returns the SDK default when config is nil. | ||
| def build_span_limits(limits_cfg) | ||
| return OpenTelemetry::SDK::Trace::SpanLimits::DEFAULT unless limits_cfg | ||
|
|
||
| opts = { | ||
| attribute_count_limit: limits_cfg['attribute_count_limit'], | ||
| attribute_length_limit: limits_cfg['attribute_value_length_limit'], | ||
| event_count_limit: limits_cfg['event_count_limit'], | ||
| link_count_limit: limits_cfg['link_count_limit'], | ||
| event_attribute_count_limit: limits_cfg['event_attribute_count_limit'], | ||
| link_attribute_count_limit: limits_cfg['link_attribute_count_limit'] | ||
| }.compact | ||
|
|
||
| OpenTelemetry::SDK::Trace::SpanLimits.new(**opts) | ||
| end | ||
|
|
||
| # Parses headers from YAML array format or headers_list string. | ||
| # Array format takes precedence over headers_list. | ||
| def parse_headers(cfg) | ||
| headers = {} | ||
|
|
||
| if cfg['headers'].is_a?(Array) | ||
| cfg['headers'].each do |h| | ||
| headers[h['name']] = h['value'] if h['name'] && h['value'] | ||
| end | ||
| end | ||
|
|
||
| # Fall back to headers_list only if headers array produced nothing | ||
| if headers.empty? && cfg['headers_list'].is_a?(String) | ||
| cfg['headers_list'].split(',').each do |pair| | ||
| key, value = pair.strip.split('=', 2) | ||
| headers[key] = value if key && value | ||
| end | ||
| end | ||
|
|
||
| headers | ||
| end | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
We seem to be missing the tracerConfig.
There was a problem hiding this comment.
What is tracerConfig? Could you be more specific?
There was a problem hiding this comment.
It is one of the types defined at https://github.com/open-telemetry/opentelemetry-configuration/blob/main/schema-docs.md#experimentaltracerconfigurator-
There was a problem hiding this comment.
maybe focus on stable first? since Maintainers are not obligated to implement support for experimental properties and types.
There was a problem hiding this comment.
That is fine but not in other spots we have implemented experimental.
| # Returns a Resource for the given detector name. | ||
| def run_detector(name) | ||
| case name | ||
| when 'container' | ||
| detect_resource('OpenTelemetry::Resource::Detector::Container') | ||
| when 'aws' | ||
| # Run all AWS sub-detectors; each returns an empty resource if not on that platform. | ||
| detect_resource('OpenTelemetry::Resource::Detector::AWS', %i[ec2 ecs eks lambda]) | ||
| when 'azure' | ||
| detect_resource('OpenTelemetry::Resource::Detector::Azure') | ||
| when 'google_cloud_platform' | ||
| detect_resource('OpenTelemetry::Resource::Detector::GoogleCloudPlatform') | ||
| else | ||
| OpenTelemetry.logger.warn("OtelConfig: unknown resource detector '#{name}'; skipping.") | ||
| OpenTelemetry::SDK::Resources::Resource.create({}) | ||
| end |
There was a problem hiding this comment.
How will this scale as mentioned in the spec.
| spec.add_development_dependency 'opentelemetry-api', '~> 1.10.0' | ||
| spec.add_development_dependency 'opentelemetry-common', '~> 0.25.0' | ||
| spec.add_development_dependency 'opentelemetry-exporter-otlp', '~> 0.34.0' | ||
| spec.add_development_dependency 'opentelemetry-instrumentation-all', '~> 0.91.0' | ||
| spec.add_development_dependency 'opentelemetry-propagator-google_cloud_trace_context', '~> 0.4.0' | ||
| spec.add_development_dependency 'opentelemetry-propagator-ottrace', '~> 0.25.0' | ||
| spec.add_development_dependency 'opentelemetry-propagator-xray', '~> 0.27.0' | ||
| spec.add_development_dependency 'opentelemetry-resource-detector-aws', '~> 0.5.0' | ||
| spec.add_development_dependency 'opentelemetry-resource-detector-azure', '~> 0.3.0' | ||
| spec.add_development_dependency 'opentelemetry-resource-detector-container', '~> 0.3.0' | ||
| spec.add_development_dependency 'opentelemetry-resource-detector-google_cloud_platform', '~> 0.4.0' | ||
| spec.add_development_dependency 'opentelemetry-sdk', '~> 1.12' |
| spec.add_development_dependency 'opentelemetry-sdk', '~> 1.12' | ||
|
|
||
| if spec.respond_to?(:metadata) | ||
| spec.metadata['changelog_uri'] = "https://open-telemetry.github.io/opentelemetry-ruby/opentelemetry-logs-sdk/v#{OpenTelemetry::OtelConfig::VERSION}/file.CHANGELOG.html" |
| spec.metadata['changelog_uri'] = "https://open-telemetry.github.io/opentelemetry-ruby/opentelemetry-logs-sdk/v#{OpenTelemetry::OtelConfig::VERSION}/file.CHANGELOG.html" | ||
| spec.metadata['source_code_uri'] = "https://github.com/open-telemetry/opentelemetry-ruby/tree/#{spec.name}/v#{spec.version}/logs_sdk" | ||
| spec.metadata['bug_tracker_uri'] = 'https://github.com/open-telemetry/opentelemetry-ruby/issues' | ||
| spec.metadata['documentation_uri'] = |
Description
This PR introduces an initial declarative configuration implementation for OpenTelemetry Ruby under otelconfig, intentionally scoped to tracing only.
I followed the Go otelconfig convention for config structure and behavior where it maps well to Ruby.
Minimalist approach: no hard dependency on optional components, which optional propagators and resource detectors are resolved only if the corresponding gem has been required by the user. If an optional component is not available, configuration continues safely with warnings instead of failing startup.
Scope in this PR
Try
/example