libs/auth/storage: add dormant secure-storage foundation#5008
Merged
simonfaltum merged 24 commits intomainfrom Apr 22, 2026
Merged
libs/auth/storage: add dormant secure-storage foundation#5008simonfaltum merged 24 commits intomainfrom
simonfaltum merged 24 commits intomainfrom
Conversation
6 tasks
First of a three-PR sequence that moves file-based OAuth token cache management from the SDK to the CLI. This PR adds a CLI-local copy of FileTokenCache under libs/auth/storage, a DualWrite helper that mirrors the SDK's historical dualWrite + hostCacheKey convention, wires u2m.WithTokenCache at every NewPersistentAuth call site (including CLICredentials, which is used by every non-auth command), and switches auth logout to the CLI FileTokenCache for token removal. The SDK is unchanged so behavior is byte-for-byte identical: two redundant writes to the same file with the same keys and tokens. See documents/fy2027-q2/cli-ga/2026-04-21-move-token-cache-to-cli-plan.md.
CLI's forbidigo rules forbid os.UserHomeDir (use env.UserHomeDir) and os.IsNotExist (use errors.Is(err, fs.ErrNotExist)). Thread ctx through NewFileTokenCache so the env-based home directory lookup works. Co-authored-by: Isaac
ea3802a to
4dd3b58
Compare
4 tasks
Replaces the caller-side storage.DualWrite helper with a DualWritingTokenCache wrapper. Every write through the wrapper under the primary key is also mirrored under the host key, so refresh paths (Token, ForceRefreshToken) preserve cross-SDK compatibility after the SDK stops dual-writing internally, not just Challenge paths. Co-authored-by: Isaac
Callers (cmd.RunE closures) now construct the FileTokenCache and pass it to discoveryLogin, runInlineLogin, and loadToken. Previously each of those helpers built the file cache internally, which meant unit tests hitting discoveryLogin or loadToken would create/touch ~/.databricks/token-cache.json on the developer's machine. Tests now pass the in-memory cache helper, so the real file is no longer a side effect of running the suite. Addresses review feedback from Mihai on login.go and token.go. Co-authored-by: Isaac
4dd3b58 to
014a6a6
Compare
renaudhartert-db
approved these changes
Apr 22, 2026
Contributor
renaudhartert-db
left a comment
There was a problem hiding this comment.
LGTM modulo resolution of open comments.
Reads [__settings__].auth_storage in the same style as GetConfiguredDefaultProfile. Write path is deliberately not provided yet. Unused until a follow-up PR wires it into the storage-mode resolver.
Introduces the secure / legacy / plaintext mode enum and a precedence resolver that reads override -> DATABRICKS_AUTH_STORAGE env -> [__settings__].auth_storage -> Legacy. Nothing imports this yet; a follow-up PR wires it into the auth commands.
Switch to t.Setenv so DATABRICKS_AUTH_STORAGE and DATABRICKS_CONFIG_FILE are hermetically overridden per subtest. The previous test used the context-based env.Set helper, which falls through to os.LookupEnv when the key is absent from the context - that meant a developer or CI lane with a real DATABRICKS_AUTH_STORAGE export could fail the 'env unset' subtests. Also add a doc note on ResolveStorageMode clarifying that an invalid override is a caller bug rather than user input, so the asymmetric error wrapping (source name on env/config; bare on override) is intentional.
…dency Adds github.com/zalando/go-keyring (MIT) and wires up KeyringCache, which satisfies cache.TokenCache. Service name is 'databricks-cli'; account is the SDK-provided cache key. Lookup and Store(nil) are stubbed; follow-up tasks fill them in. Tests use a fake backend injected via an internal interface.
Maps the backend's ErrNotFound onto cache.ErrNotFound. Unmarshal errors surface as wrapped errors rather than ErrNotFound so callers can tell 'missing' apart from 'corrupted'.
Deleting a missing entry is a no-op so that logout is idempotent across re-runs. Other backend errors bubble up unchanged.
Mirrors GitHub CLI's pattern. Protects against Linux D-Bus hangs in containers and SSH sessions without a running Secret Service daemon. TimeoutError is exported so callers can render a tailored message.
Pairs with the zalando/go-keyring dependency added earlier in this branch. Without this block TestNoticeFileCompleteness fails because every direct dep in go.mod must have a matching entry under its SPDX license section in NOTICE.
Linter auto-fix from make lint. Functionally identical; testifylint prefers the dedicated assertion because it gives a clearer failure message when the error type does not match.
The zero value of the exported type was a foot-gun: a bare
&KeyringCache{} has a nil backend and a nil errNotFound sentinel, which
would panic on first Store or Lookup. Since the only valid construction
is via NewKeyringCache, lower the struct to keyringCache and have the
constructor return cache.TokenCache directly. Callers never see the
concrete type.
Drops the explicit if-branch for empty Op in favor of cmp.Or, matching modern Go 1.25 idioms. Same output, one less branch.
Per go-code-structure.md principles 1 and 2, move the precedence logic into a pure resolveStorageMode(override, envValue, configValue) that takes plain strings and has no side effects. ResolveStorageMode stays as a thin wrapper that reads the env var and the .databrickscfg setting and delegates. The pure core is now trivially testable without t.Setenv or TempDir. TestResolveStorageMode covers all 8 precedence cases inline. TestResolveStorageMode_ReadsEnvAndConfig separately verifies the wrapper actually reads from env and config (3 small subcases). Also flips mode_test.go to package storage (internal) so the pure core is reachable without exporting it.
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
- Drop the internal documents/fy2027-q2/cli-ga reference and the MS1/MS3/MS4 milestone shorthand from the package godoc. - Rewrite the precedence chain in resolveStorageMode as a switch so the branches are visibly exclusive and terminal.
- Drop the errNotFound injection from keyringCache and use keyring.ErrNotFound directly. The fake backend in tests returns the same sentinel instead of a custom one. - Wrap stored tokens in a keyringEntry envelope so the on-disk format can gain fields (scopes, profile checksum, ...) without breaking older CLI versions that read the same entry.
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
- Introduce StorageModeUnknown as the zero value so the type has a meaningful "unset" state. - Replace parseMode + validateMode with a single exported ParseMode that returns StorageModeUnknown for empty or unrecognized input. The flag parser in the follow-up PR can use it to reject invalid user input at the point of entry. - Trust the typed override in ResolveStorageMode; validation now belongs to whichever parser produced the StorageMode. - Short-circuit before reading .databrickscfg when override or env selects a mode. Adds a test that points DATABRICKS_CONFIG_FILE at an unreadable path to lock in the skip. Co-authored-by: Isaac
Co-authored-by: Isaac
7fc4937 to
f50a274
Compare
jamesbroadhead
pushed a commit
to jamesbroadhead/cli
that referenced
this pull request
Apr 22, 2026
) ## Stack Entry point for the opt-in secure token storage work. Review and merge top-to-bottom: 1. **databricks#5056 auth: import FileTokenCache into CLI and wrap cache for dual-write (this PR)** 2. databricks#5008 libs/auth/storage: add dormant secure-storage foundation 3. databricks#5013 auth: wire secure-storage cache into CLI This PR is also the first of a separate 3-PR sequence that moves file token cache ownership from the SDK to the CLI. That sequence (SDK PR removing the internal dual-write, then SDK bump in the CLI) can proceed in parallel with databricks#5008 and databricks#5013. ## Why First of 3 PRs moving file-based OAuth token cache ownership from the Go SDK to the CLI. Today the SDK owns both the cache interface and the file-backed implementation, including the dual-write-under-host-key convention. Long-term we want the SDK to stop owning persistence: the OAuth flow and cache interface stay, but file format and host-key conventions move to the CLI. This unblocks secure storage backends and Renaud's Session model on a cleaner foundation. This PR imports the cache into the CLI and wires it everywhere. Nothing is deleted from the SDK yet. PRs 2 and 3 (SDK PR, then SDK bump) finish the move. ## Changes **Before:** CLI relied on the SDK's default `FileTokenCache`. Dual-write to the legacy host-based key happened inside `PersistentAuth.Challenge()` and `refresh()` via the SDK's internal `dualWrite`. **Now:** CLI owns its own `FileTokenCache` at `libs/auth/storage/file_cache.go`, a near-verbatim copy from the SDK (same JSON schema, same path `~/.databricks/token-cache.json`, same permissions). A new `storage.DualWritingTokenCache` wraps the file cache so that every write through it under the primary key is also mirrored under the legacy host key. Every `u2m.NewPersistentAuth` call site in the CLI now passes `u2m.WithTokenCache(storage.NewDualWritingTokenCache(fileCache, arg))`. Because mirroring happens inside the cache's `Store` method, every SDK-internal write (Challenge, refresh, discovery) dual-writes automatically. No call site needs to remember to invoke a helper, so refresh paths (`Token()`, `ForceRefreshToken()`) preserve cross-SDK compatibility just like login paths do. The SDK is unchanged. It still dual-writes internally, so the two writes hit the same file with the same keys and bytes, i.e. idempotent. Zero user-visible behavior change. Files touched: - `libs/auth/storage/file_cache.go` + test (new) - `libs/auth/storage/dual_writing_cache.go` + test (new) - `cmd/auth/login.go`, `cmd/auth/token.go`, `cmd/auth/logout.go` - `libs/auth/credentials.go` - `NEXT_CHANGELOG.md` **Lint-driven deltas from SDK:** - `os.UserHomeDir()` is forbidden in the CLI, uses `env.UserHomeDir(ctx)` instead. Required threading `ctx` through `NewFileTokenCache`. - `os.IsNotExist(err)` is forbidden, uses `errors.Is(err, fs.ErrNotExist)`. **Known edge case:** Tokens that exist only under the legacy host key (users who logged in before profile-keyed writes existed and never re-ran `auth login --profile`) keep working for now because the SDK's internal dualWrite still runs. After PR 2 (SDK stops dual-writing), re-login will be required for those users. Minimal impact. ## Test plan - [x] Unit tests for `file_cache_test.go` (port of SDK tests) - [x] Unit tests for `dual_writing_cache_test.go` covering primary-key mirroring, non-primary passthrough, no host-key, host-key-equals-primary, discovery with populated/empty `GetDiscoveredHost`, and Lookup delegation - [x] `make checks` and `make test` pass - [ ] Manual smoke test of `databricks auth login`, `auth token`, `auth logout` on a live profile before merging
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stack
Part of the opt-in secure token storage stack. Review and merge top-to-bottom:
Base is #5056 so this PR can reuse the CLI-owned
libs/auth/storagepackage. #5013 stacks on top and wires the resolver and keyring cache into command code.Why
Groundwork for the CLI GA work that introduces OS-native secure token storage behind an experimental opt-in. This PR adds the building blocks without wiring them into any command. #5013 plugs the resolver and cache into login/token/logout, CLICredentials, and everything else that authenticates.
Changes
Before: the CLI only has the SDK's file-backed
TokenCacheand has no way to select a different storage backend at runtime.Now: three additive pieces, all dormant. Nothing imports the new
libs/auth/storagepackage from production code yet.libs/auth/storage/mode.go:StorageModeenum (legacy,secure,plaintext) andResolveStorageMode(ctx, override)that resolves precedenceoverride -> DATABRICKS_AUTH_STORAGE env -> [__settings__].auth_storage in .databrickscfg -> Legacy.libs/auth/storage/keyring.go:KeyringCacheimplementation of the SDK'scache.TokenCache, backed bygithub.com/zalando/go-keyring(MIT, same library used by GitHub CLI). Includes a 3-second per-operation timeout that protects against Linux D-Bus hangs, and a pluggable backend interface so tests inject a fake without touching the OS keychain. Service name isdatabricks-cli; the account field carries the cache key the SDK passes through.libs/databrickscfg/ops.go:GetConfiguredAuthStoragereader, mirroring the existingGetConfiguredDefaultProfileshape.go.mod/go.sum/NOTICE/NEXT_CHANGELOG.md: dependency add and required metadata.No command code changes. No user-visible behavior change.
Test plan
ResolveStorageMode: override / env / config precedence, case-insensitive env parsing, invalid-value rejection for all three sources. Hermetic viat.Setenvso CI env cannot leak in.KeyringCachetests using a fake backend: happy-pathStore+Lookupround-trip, missing-entry returnscache.ErrNotFound, other-error propagation viaerrors.Is, corrupted-JSON handling, idempotent delete path, and timeout for all three operations.GetConfiguredAuthStoragereader: missing file, missing section, missing key, explicit values.make checksclean (tidy + whitespace + links).make testclean: 5061 unit tests + 2473 acceptance tests, 0 failures.make lintclean on the diff.grepconfirms no production callers oflibs/auth/storageexist yet.