## Changes
[PR #1572](#1572)
added `Config.HostMetadataResolver` so callers could override the SDK's
`/.well-known/databricks-config` fetch on a per-Config basis. That
covers "I have one Config and I want to wrap it."
The gap: programs that construct many Configs across their command
surface (e.g. the Databricks CLI) end up copying the same
`cfg.HostMetadataResolver = ...` assignment at every construction site;
in the CLI that is roughly 10 sites across 7 files, plus a guardrail
test to catch drift.
This PR adds a package-level default consulted when a Config has no
explicit resolver set. Callers set a factory once during startup; every
subsequent Config gets the same resolver without per-site wiring. The
Config-level field still takes precedence, so PR #1572's contract is
unchanged.
### API
```go
// config/host_metadata.go
var DefaultHostMetadataResolverFactory func(*Config) HostMetadataResolver
```
Plain public variable, set once at init. Matches the stdlib pattern for
single-default hooks: `http.DefaultClient`, `http.DefaultTransport`,
`log.Default`. Callers needing per-Config or dynamic behaviour should
use `Config.HostMetadataResolver` instead.
### Resolution order inside `Config.EnsureResolved`
1. If `Config.HostMetadataResolver` is set, use it.
2. Else, if `DefaultHostMetadataResolverFactory` is non-nil, invoke it
with the resolving Config and use its return value. If it returns nil,
fall through.
3. Else, fall back to the SDK's default HTTP fetch (behavior unchanged
for all existing callers).
## How the Databricks CLI will use this
The canonical Go idiom for "library A registers itself with library B"
is a blank import that triggers an `init()` in A. This is how
`database/sql` drivers (`_ "github.com/lib/pq"`), image codecs
(`_ "image/png"`), and encoding formats register themselves.
After this PR lands and is bumped into the CLI, [CLI PR
#5011](databricks/cli#5011) will collapse from
~10 wired-in `hostmetadata.Attach(cfg)` calls + a guardrail test down to
two small pieces:
**`repos/cli/libs/hostmetadata/resolver.go`** — set the caching factory
at package init:
```go
func init() {
	config.DefaultHostMetadataResolverFactory = func(cfg *config.Config) config.HostMetadataResolver {
		return NewResolver(cfg.DefaultHostMetadataResolver())
	}
}
```
**`repos/cli/cmd/databricks/main.go`** — one blank import to pull the
package in at startup:
```go
import (
	// Registers a disk-cached HostMetadataResolver with the SDK so every
	// Config the CLI constructs reuses the cached /.well-known lookup.
	_ "github.com/databricks/cli/libs/hostmetadata"
)
```
That's the full integration. Every Config the CLI creates, now and from
any new command a developer adds in the future, automatically gets
caching. There is no per-site `Attach` call to remember, no guardrail
test to maintain, and no new developer ever has to learn this mechanism
exists to benefit from it.
### Experimental
Marked experimental to match the existing `HostMetadataResolver` field.
No default behavior change for callers that never set
`DefaultHostMetadataResolverFactory`.
## Tests
Three new tests in `config/config_test.go`, each using a small
`withDefaultHostMetadataResolverFactory(t, factory)` helper that
captures and restores the prior value, so tests never clobber each other
via the package-level default:
- Factory is invoked when Config has no resolver; back-fill works
end-to-end.
- Config-level resolver takes precedence (factory not consulted).
- Factory returning nil falls through to the SDK's HTTP fetch.
- `make fmt test lint` clean
- `go test ./config/... -count=1 -race` clean
Signed-off-by: simon <simon.faltum@databricks.com>
---------
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
## Why

Every CLI command (`databricks auth profiles`, `bundle validate`, every
workspace or account call) goes through `Config.EnsureResolved`, which
triggers an unauthenticated GET to `{host}/.well-known/databricks-config`
to populate host metadata. That round trip is ~700ms against production
and gets paid on every invocation, doubling the latency of otherwise
single-request commands.

## Changes
Before: every CLI invocation hits the well-known endpoint once (or more when multiple configs get constructed).
Now: the first invocation populates a local disk cache under
`~/.cache/databricks/<version>/host-metadata/`; subsequent invocations
read from it. Failures are negatively cached for 60s (except for
`context.Canceled`/`context.DeadlineExceeded`, which are transient and
never cached).

- `libs/hostmetadata` package: `NewResolver(fetch)` is the primary API;
it takes an injected fetch function and returns a
`config.HostMetadataResolver`. `Attach(cfg)` is a one-line convenience
that wires `cfg.DefaultHostMetadataResolver()` as the fetch. SDK API
from databricks-sdk-go#1572 (Add HostMetadataResolver for customizable
host metadata fetching), shipped in v0.127.0, which is already bumped
on main.
- `Attach` wired at every non-allowlisted `*config.Config` construction
site: `cmd/root/auth.go` (4 sites), `bundle/config/workspace.go`,
`cmd/api/api.go`, `cmd/auth/{env,login,profiles}.go` (3 sites across 2
files), `cmd/labs/project/entrypoint.go`, `libs/auth/arguments.go`.
- Tests set `DATABRICKS_CACHE_DIR` to a temp dir so they don't leak
cache files into `HOME`.
- `libs/hostmetadata/injection_guardrail_test.go` walks the tree and
flags any new `config.Config{` construction site that lacks a nearby
`hostmetadata.Attach` call (with an allowlist for the handful of
legitimately cfg-less-resolve sites).
## Collateral cleanups

- `libs/cache/file_cache.go`: drop the `failed to stat cache file`
debug log when the file is simply missing (`fs.ErrNotExist`). It was
pure noise (the next line, `cache miss, computing`, conveys the same
info) and its OS-specific error text diverged between Unix (`no such
file or directory`) and Windows (`The system cannot find the file
specified.`), breaking cross-platform acceptance goldens. Genuine stat
failures (permission, corruption) still log.
- `libs/testdiff/replacement.go`: `devVersionRegex` now accepts either
`+SHA` or `-SHA` after `0.0.0-dev`. `build.GetSanitizedVersion()` swaps
`+` to `-` for filesystem safety when the version is used in cache
paths, and the old regex only covered the `+` form.
## Test plan

- `make checks` clean
- `make lint` clean (0 issues)
- `go test ./libs/hostmetadata/... -race`: 7 tests (smoke + cache hit +
fetch error + cancellation-not-cached + host isolation + end-to-end
integration + injection guardrail); all unit tests use an injected mock
fetch, so no `httptest.Server` is required
- `go test ./libs/cache/... -race` clean
- `go test ./cmd/root/... -race` clean
- `go test ./bundle/config/... -race` clean
- `acceptance/auth/host-metadata-cache/` asserts exactly ONE
`/.well-known/databricks-config` GET across two `auth profiles`
invocations sharing a `DATABRICKS_CACHE_DIR`
- Golden-file updates: `out.requests.txt` (caching works), new
`[Local Cache]` debug lines in cache/telemetry tests, two
`Warn: Failed to resolve host metadata` lines removed (intentional: the
resolver returns `(nil, nil)` on fetch errors, which is how the SDK
interprets "no metadata available"), and stat-not-found lines removed
(see Collateral cleanups)
## Live validation against dogfood

Built locally (`go build -o /tmp/databricks-cache-test .`) and ran
`databricks -p e2-dogfood current-user me` with and without a warm
cache:

- First run (fresh `DATABRICKS_CACHE_DIR`): `cache miss, computing` →
`GET /.well-known/databricks-config` → `computed and stored result`
- Second run: `[Local Cache] cache hit` line

Net per-command savings: ~700ms, matching the Why. Cache dir after one
`auth profiles` run contained five JSON files (one per host in
`.databrickscfg`). Inspecting one:

```json
{"oidc_endpoint":"https://db-deco-test.databricks.com/oidc/accounts/{account_id}","account_id":"...","workspace_id":"","cloud":"","host_type":"UNIFIED_HOST","token_federation_default_oidc_audiences":["..."]}
```