Skip to content

libs/auth/storage: add dormant secure-storage foundation#5008

Merged
simonfaltum merged 24 commits intomainfrom
simonfaltum/cli-ga-secure-storage-foundation
Apr 22, 2026
Merged

libs/auth/storage: add dormant secure-storage foundation#5008
simonfaltum merged 24 commits intomainfrom
simonfaltum/cli-ga-secure-storage-foundation

Conversation

@simonfaltum
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum commented Apr 17, 2026

Stack

Part of the opt-in secure token storage stack. Review and merge top-to-bottom:

  1. auth: import FileTokenCache into CLI and wire DualWrite #5056 auth: import FileTokenCache into CLI and wire DualWrite
  2. libs/auth/storage: add dormant secure-storage foundation #5008 libs/auth/storage: add dormant secure-storage foundation (this PR)
  3. auth: wire secure-storage cache into CLI #5013 auth: wire secure-storage cache into CLI

Base is #5056 so this PR can reuse the CLI-owned libs/auth/storage package. #5013 stacks on top and wires the resolver and keyring cache into command code.

Why

Groundwork for the CLI GA work that introduces OS-native secure token storage behind an experimental opt-in. This PR adds the building blocks without wiring them into any command. #5013 plugs the resolver and cache into login/token/logout, CLICredentials, and everything else that authenticates.

Changes

Before: the CLI only has the SDK's file-backed TokenCache and has no way to select a different storage backend at runtime.

Now: three additive pieces, all dormant. Nothing imports the new libs/auth/storage package from production code yet.

  • libs/auth/storage/mode.go: StorageMode enum (legacy, secure, plaintext) and ResolveStorageMode(ctx, override) that resolves precedence override -> DATABRICKS_AUTH_STORAGE env -> [__settings__].auth_storage in .databrickscfg -> Legacy.
  • libs/auth/storage/keyring.go: KeyringCache implementation of the SDK's cache.TokenCache, backed by github.com/zalando/go-keyring (MIT, same library used by GitHub CLI). Includes a 3-second per-operation timeout that protects against Linux D-Bus hangs, and a pluggable backend interface so tests inject a fake without touching the OS keychain. Service name is databricks-cli; the account field carries the cache key the SDK passes through.
  • libs/databrickscfg/ops.go: GetConfiguredAuthStorage reader, mirroring the existing GetConfiguredDefaultProfile shape.
  • go.mod / go.sum / NOTICE / NEXT_CHANGELOG.md: dependency add and required metadata.

No command code changes. No user-visible behavior change.

Test plan

  • Table-driven unit tests for ResolveStorageMode: override / env / config precedence, case-insensitive env parsing, invalid-value rejection for all three sources. Hermetic via t.Setenv so CI env cannot leak in.
  • KeyringCache tests using a fake backend: happy-path Store + Lookup round-trip, missing-entry returns cache.ErrNotFound, other-error propagation via errors.Is, corrupted-JSON handling, idempotent delete path, and timeout for all three operations.
  • GetConfiguredAuthStorage reader: missing file, missing section, missing key, explicit values.
  • make checks clean (tidy + whitespace + links).
  • make test clean: 5061 unit tests + 2473 acceptance tests, 0 failures.
  • make lint clean on the diff.
  • grep confirms no production callers of libs/auth/storage exist yet.

@simonfaltum simonfaltum marked this pull request as ready for review April 17, 2026 11:16
Comment thread libs/auth/storage/keyring.go
Comment thread libs/auth/storage/keyring.go Outdated
Comment thread libs/auth/storage/keyring.go Outdated
Comment thread libs/auth/storage/mode.go Outdated
Comment thread libs/auth/storage/mode.go Outdated
Comment thread libs/auth/storage/mode.go Outdated
@simonfaltum simonfaltum marked this pull request as draft April 21, 2026 15:56
First of a three-PR sequence that moves file-based OAuth token cache
management from the SDK to the CLI. This PR adds a CLI-local copy of
FileTokenCache under libs/auth/storage, a DualWrite helper that mirrors
the SDK's historical dualWrite + hostCacheKey convention, wires
u2m.WithTokenCache at every NewPersistentAuth call site (including
CLICredentials, which is used by every non-auth command), and switches
auth logout to the CLI FileTokenCache for token removal. The SDK is
unchanged so behavior is byte-for-byte identical: two redundant writes
to the same file with the same keys and tokens.

See documents/fy2027-q2/cli-ga/2026-04-21-move-token-cache-to-cli-plan.md.
CLI's forbidigo rules forbid os.UserHomeDir (use env.UserHomeDir) and
os.IsNotExist (use errors.Is(err, fs.ErrNotExist)). Thread ctx through
NewFileTokenCache so the env-based home directory lookup works.

Co-authored-by: Isaac
@simonfaltum simonfaltum changed the base branch from main to simonfaltum/cli-tokencache-move-1 April 22, 2026 09:16
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from ea3802a to 4dd3b58 Compare April 22, 2026 09:36
@simonfaltum simonfaltum marked this pull request as ready for review April 22, 2026 09:43
Replaces the caller-side storage.DualWrite helper with a
DualWritingTokenCache wrapper. Every write through the wrapper under the
primary key is also mirrored under the host key, so refresh paths
(Token, ForceRefreshToken) preserve cross-SDK compatibility after the
SDK stops dual-writing internally, not just Challenge paths.

Co-authored-by: Isaac
Callers (cmd.RunE closures) now construct the FileTokenCache and pass it to
discoveryLogin, runInlineLogin, and loadToken. Previously each of those
helpers built the file cache internally, which meant unit tests hitting
discoveryLogin or loadToken would create/touch ~/.databricks/token-cache.json
on the developer's machine. Tests now pass the in-memory cache helper, so
the real file is no longer a side effect of running the suite.

Addresses review feedback from Mihai on login.go and token.go.

Co-authored-by: Isaac
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from 4dd3b58 to 014a6a6 Compare April 22, 2026 11:27
Comment thread libs/auth/storage/mode.go Outdated
Comment thread libs/auth/storage/mode.go Outdated
Copy link
Copy Markdown
Contributor

@renaudhartert-db renaudhartert-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo resolution of open comments.

Reads [__settings__].auth_storage in the same style as
GetConfiguredDefaultProfile. Write path is deliberately not provided yet.
Unused until a follow-up PR wires it into the storage-mode resolver.
Introduces the secure / legacy / plaintext mode enum and a precedence
resolver that reads override -> DATABRICKS_AUTH_STORAGE env ->
[__settings__].auth_storage -> Legacy. Nothing imports this yet; a
follow-up PR wires it into the auth commands.
Switch to t.Setenv so DATABRICKS_AUTH_STORAGE and DATABRICKS_CONFIG_FILE
are hermetically overridden per subtest. The previous test used the
context-based env.Set helper, which falls through to os.LookupEnv when
the key is absent from the context - that meant a developer or CI lane
with a real DATABRICKS_AUTH_STORAGE export could fail the 'env unset'
subtests.

Also add a doc note on ResolveStorageMode clarifying that an invalid
override is a caller bug rather than user input, so the asymmetric error
wrapping (source name on env/config; bare on override) is intentional.
…dency

Adds github.com/zalando/go-keyring (MIT) and wires up KeyringCache, which
satisfies cache.TokenCache. Service name is 'databricks-cli'; account is
the SDK-provided cache key. Lookup and Store(nil) are stubbed; follow-up
tasks fill them in. Tests use a fake backend injected via an internal
interface.
Maps the backend's ErrNotFound onto cache.ErrNotFound. Unmarshal errors
surface as wrapped errors rather than ErrNotFound so callers can tell
'missing' apart from 'corrupted'.
Deleting a missing entry is a no-op so that logout is idempotent across
re-runs. Other backend errors bubble up unchanged.
Mirrors GitHub CLI's pattern. Protects against Linux D-Bus hangs in
containers and SSH sessions without a running Secret Service daemon.
TimeoutError is exported so callers can render a tailored message.
Pairs with the zalando/go-keyring dependency added earlier in this
branch. Without this block TestNoticeFileCompleteness fails because
every direct dep in go.mod must have a matching entry under its SPDX
license section in NOTICE.
Linter auto-fix from make lint. Functionally identical; testifylint
prefers the dedicated assertion because it gives a clearer failure
message when the error type does not match.
The zero value of the exported type was a foot-gun: a bare
&KeyringCache{} has a nil backend and a nil errNotFound sentinel, which
would panic on first Store or Lookup. Since the only valid construction
is via NewKeyringCache, lower the struct to keyringCache and have the
constructor return cache.TokenCache directly. Callers never see the
concrete type.
Drops the explicit if-branch for empty Op in favor of cmp.Or, matching
modern Go 1.25 idioms. Same output, one less branch.
simonfaltum and others added 8 commits April 22, 2026 17:43
Per go-code-structure.md principles 1 and 2, move the precedence logic
into a pure resolveStorageMode(override, envValue, configValue) that
takes plain strings and has no side effects. ResolveStorageMode stays
as a thin wrapper that reads the env var and the .databrickscfg setting
and delegates.

The pure core is now trivially testable without t.Setenv or TempDir.
TestResolveStorageMode covers all 8 precedence cases inline.
TestResolveStorageMode_ReadsEnvAndConfig separately verifies the
wrapper actually reads from env and config (3 small subcases).

Also flips mode_test.go to package storage (internal) so the pure core
is reachable without exporting it.
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
- Drop the internal documents/fy2027-q2/cli-ga reference and the
  MS1/MS3/MS4 milestone shorthand from the package godoc.
- Rewrite the precedence chain in resolveStorageMode as a switch so the
  branches are visibly exclusive and terminal.
- Drop the errNotFound injection from keyringCache and use
  keyring.ErrNotFound directly. The fake backend in tests returns the
  same sentinel instead of a custom one.
- Wrap stored tokens in a keyringEntry envelope so the on-disk format
  can gain fields (scopes, profile checksum, ...) without breaking
  older CLI versions that read the same entry.
Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
- Introduce StorageModeUnknown as the zero value so the type has a
  meaningful "unset" state.
- Replace parseMode + validateMode with a single exported ParseMode that
  returns StorageModeUnknown for empty or unrecognized input. The flag
  parser in the follow-up PR can use it to reject invalid user input at
  the point of entry.
- Trust the typed override in ResolveStorageMode; validation now belongs
  to whichever parser produced the StorageMode.
- Short-circuit before reading .databrickscfg when override or env
  selects a mode. Adds a test that points DATABRICKS_CONFIG_FILE at an
  unreadable path to lock in the skip.

Co-authored-by: Isaac
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from 7fc4937 to f50a274 Compare April 22, 2026 15:43
Base automatically changed from simonfaltum/cli-tokencache-move-1 to main April 22, 2026 16:26
jamesbroadhead pushed a commit to jamesbroadhead/cli that referenced this pull request Apr 22, 2026
)

## Stack

Entry point for the opt-in secure token storage work. Review and merge
top-to-bottom:

1. **databricks#5056 auth: import FileTokenCache into CLI and wrap cache for
dual-write (this PR)**
2. databricks#5008 libs/auth/storage: add dormant secure-storage foundation
3. databricks#5013 auth: wire secure-storage cache into CLI

This PR is also the first of a separate 3-PR sequence that moves file
token cache ownership from the SDK to the CLI. That sequence (SDK PR
removing the internal dual-write, then SDK bump in the CLI) can proceed
in parallel with databricks#5008 and databricks#5013.

## Why

First of 3 PRs moving file-based OAuth token cache ownership from the Go
SDK to the CLI. Today the SDK owns both the cache interface and the
file-backed implementation, including the dual-write-under-host-key
convention. Long-term we want the SDK to stop owning persistence: the
OAuth flow and cache interface stay, but file format and host-key
conventions move to the CLI. This unblocks secure storage backends and
Renaud's Session model on a cleaner foundation.

This PR imports the cache into the CLI and wires it everywhere. Nothing
is deleted from the SDK yet. PRs 2 and 3 (SDK PR, then SDK bump) finish
the move.

## Changes

**Before:** CLI relied on the SDK's default `FileTokenCache`. Dual-write
to the legacy host-based key happened inside
`PersistentAuth.Challenge()` and `refresh()` via the SDK's internal
`dualWrite`.

**Now:** CLI owns its own `FileTokenCache` at
`libs/auth/storage/file_cache.go`, a near-verbatim copy from the SDK
(same JSON schema, same path `~/.databricks/token-cache.json`, same
permissions). A new `storage.DualWritingTokenCache` wraps the file cache
so that every write through it under the primary key is also mirrored
under the legacy host key. Every `u2m.NewPersistentAuth` call site in
the CLI now passes
`u2m.WithTokenCache(storage.NewDualWritingTokenCache(fileCache, arg))`.

Because mirroring happens inside the cache's `Store` method, every
SDK-internal write (Challenge, refresh, discovery) dual-writes
automatically. No call site needs to remember to invoke a helper, so
refresh paths (`Token()`, `ForceRefreshToken()`) preserve cross-SDK
compatibility just like login paths do.

The SDK is unchanged. It still dual-writes internally, so the two writes
hit the same file with the same keys and bytes, i.e. idempotent. Zero
user-visible behavior change.

Files touched:
- `libs/auth/storage/file_cache.go` + test (new)
- `libs/auth/storage/dual_writing_cache.go` + test (new)
- `cmd/auth/login.go`, `cmd/auth/token.go`, `cmd/auth/logout.go`
- `libs/auth/credentials.go`
- `NEXT_CHANGELOG.md`

**Lint-driven deltas from SDK:**
- `os.UserHomeDir()` is forbidden in the CLI, uses
`env.UserHomeDir(ctx)` instead. Required threading `ctx` through
`NewFileTokenCache`.
- `os.IsNotExist(err)` is forbidden, uses `errors.Is(err,
fs.ErrNotExist)`.

**Known edge case:** Tokens that exist only under the legacy host key
(users who logged in before profile-keyed writes existed and never
re-ran `auth login --profile`) keep working for now because the SDK's
internal dualWrite still runs. After PR 2 (SDK stops dual-writing),
re-login will be required for those users. Minimal impact.

## Test plan

- [x] Unit tests for `file_cache_test.go` (port of SDK tests)
- [x] Unit tests for `dual_writing_cache_test.go` covering primary-key
mirroring, non-primary passthrough, no host-key,
host-key-equals-primary, discovery with populated/empty
`GetDiscoveredHost`, and Lookup delegation
- [x] `make checks` and `make test` pass
- [ ] Manual smoke test of `databricks auth login`, `auth token`, `auth
logout` on a live profile before merging
@simonfaltum simonfaltum added this pull request to the merge queue Apr 22, 2026
Merged via the queue into main with commit 2f4b00d Apr 22, 2026
26 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/cli-ga-secure-storage-foundation branch April 22, 2026 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants