Run local models from your macOS menu bar — behind an OpenAI-compatible API.
Website · Install · How it works · Develop
Dakodeon is a tiny menu bar app. Start a local
llama-server router and point any agent — OpenCode,
Zed, or your own scripts — at http://127.0.0.1:8080/v1. Your agent chooses the model by id;
Dakodeon shows what's loaded and manages the downloads.
It bundles no runtime and no weights. It drives the llama.cpp and hf tools already
on your machine, so the app itself stays tiny.
brew install --cask emin93/tap/dakodeonNote
Requirements: macOS 14+, with llama-server and hf on your PATH.
brew install llama.cpp
pip install -U "huggingface_hub[cli]" # provides `hf`| 🧭 Menu bar control | Start/stop the server and see the active model from a slim panel. |
| 📦 Model manager | A Settings window shows each model's download status — download, cancel, delete, or reveal weights in Finder. |
| 🔄 Selection in your agent | Clients like OpenCode select a model by id; llama-server routes to that profile and keeps one loaded at a time. The app has no model picker. |
| 💤 Idle model sleep | The router stays online, loads models on demand, and unloads model memory after inactivity. |
| 🧹 Clean shutdown | Quitting the app stops llama-server. |
| 🚀 Native defaults | llama-server loads each model's trained context and embedded chat template. |
| 🧩 Curated assets | Profiles can include weights, draft models, vision projectors, and tuned llama-server flags. |
Dakodeon launches llama-server in router mode and exposes the standard
OpenAI-compatible endpoints. GET /v1/models returns the available profile ids,
such as gemma4-26b-a4b-it-qat and ornith-1.0-35b. Chat requests route by the
JSON model field, so switching models in a client like OpenCode also moves the
active model Dakodeon shows in the menu.
The router process is designed to stay up for long sessions. Dakodeon starts it with on-demand model loading and a five-minute idle sleep window: the model process and KV cache are released after inactivity, while the OpenAI-compatible endpoint remains available and reloads the requested model on the next task. The menu and Settings window show whether the active model is loaded, loading, sleeping, or unloaded.
POST http://127.0.0.1:8080/v1/chat/completions
GET http://127.0.0.1:8080/v1/modelsModel files download to the shared Hugging Face cache via hf; the app resolves the
local GGUF paths and points the server at them — nothing is copied or duplicated.
Sizes and LFS hashes come from hf models list <repo> -R --json. Models are shown by
their Hugging Face repository — the same id clients send.
Profiles are curated in code at
Sources/Dakodeon/Catalog.swift. Each ModelProfile
declares its weights, an optional draft / MTP model, an optional vision projector, an optional chat template, and any extra llama-server flags.
The app exposes no per-user configuration — to add a model, append an entry:
ModelProfile(
id: "ornith-1.0-35b",
weights: ModelAsset(repo: "deepreinforce-ai/Ornith-1.0-35B-GGUF", file: "ornith-1.0-35b-Q4_K_M.gguf"),
draft: nil,
mmproj: nil,
chatTemplate: ModelTemplates.ornith,
extraArguments: ["-ngl", "999", "--reasoning-format", "deepseek"]
)Bundled today
| Profile | Quant | Draft | Vision | Download |
|---|---|---|---|---|
| Gemma 4 26B A4B IT QAT | UD-Q4_K_XL | MTP | ✓ | 15.69 GB |
| Ornith 1.0 35B | Q4_K_M | - | - | 21.17 GB |
Ornith uses its Hugging Face chat template with a small Codex compatibility patch:
developer messages are rendered as system messages, and system messages are not
required to appear only at index zero. Thinking remains enabled; llama-server
is started with --reasoning-format deepseek so <think> content is returned as
structured reasoning instead of being printed in the assistant answer.
make run # build, package, and launch the .app
make dist # build the signed Dakodeon.app bundle
make zip # build dist/Dakodeon.zip (release artifact)| File | Responsibility |
|---|---|
Catalog.swift |
Curated model profiles + types |
ModelStore.swift |
Download / delete / status via the hf cache |
ServerController.swift |
llama-server lifecycle, active-model sync, shutdown |
MenuView.swift |
The menu bar panel |
SettingsView.swift |
Model-management window |
DakodeonApp.swift |
App entry, scenes, and icons |
MIT for the app. Model weights remain under their own licenses — the bundled Gemma profile follows the Gemma Terms of Use.
