Your device, your data, your agent
A personal AI assistant that runs 100% on-device. No cloud. No Python. Your data never leaves the machine.
The problem
Siri sends your voice to Apple. Google Assistant sends your queries to Google. Alexa sends everything to Amazon. Every "smart" assistant requires a cloud hop. Your calendar, contacts, messages, and location data leave the device.
What if you could have a capable assistant that never sends a single byte off your machine?
Configuration
device_agent.toml defines a complete on-device assistant with local LLM inference and access to all your Apple data through iMCP - all in 82 lines of TOML.
Local LLM inference
[agent]
provider = "llama_cpp"
model = "unsloth/Qwen3-VL-8B-Instruct-GGUF:UD-Q6_K_XL"
assume_mutating = false
tools = [
    "create_task",
    "todowrite",
    "todoread",
    "question",
    "mdq",
    # All Apple data tools come from iMCP
    "iMCP.*",
]
"iMCP.*" is a wildcard that matches every tool exposed by the iMCP MCP server. As iMCP adds new services (Reminders, Notes, Shortcuts), the agent automatically gains access without config changes.
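To make the wildcard behaviour concrete, here is a minimal sketch of how a trailing-`*` pattern could be matched against the tool names a server advertises. It is an illustration under assumptions, not QueryMT's actual matcher: the function names and the example tool names (`iMCP.calendar_events`, `iMCP.reminders_list`, `shell`) are hypothetical.

```rust
/// Returns true if `name` matches `pattern`.
/// A trailing `*` matches any suffix; any other pattern must match exactly.
/// Hypothetical helper: QueryMT's real pattern rules may differ.
fn matches_tool_pattern(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => pattern == name,
    }
}

/// Keeps only the advertised tools that at least one configured pattern allows.
fn allowed_tools<'a>(patterns: &[&str], advertised: &[&'a str]) -> Vec<&'a str> {
    advertised
        .iter()
        .copied()
        .filter(|name| patterns.iter().any(|p| matches_tool_pattern(p, name)))
        .collect()
}
```

Under this sketch, a tool that iMCP starts advertising later is picked up by `"iMCP.*"` because the match is evaluated against whatever names the server reports at startup, while tools outside the listed patterns stay excluded.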
GPU model parameters
[agent.parameters]
n_ctx = 160000              # 160K context window
max_tokens = 8192           # Max response length
top_p = 0.95
top_k = 20
temperature = 0.9           # Creative but grounded
flash_attention = "enabled" # Faster inference
QueryMT auto-detects your GPU (Metal on Apple Silicon, CUDA on NVIDIA, Vulkan elsewhere) and pulls the correct OCI image variant. No driver config. No CUDA toolkit installation.
Three-layer compaction for long conversations
# Layer 1: Tool output truncation
[agent.execution.tool_output]
max_lines = 2000
max_bytes = 51200

# Layer 2: Pruning after every turn
[agent.execution.pruning]
protect_tokens = 40000

# Layer 3: AI summary on context overflow
[agent.execution.compaction]
auto = true
Calendar queries and contact lookups return structured data. But long conversations with many tool calls can still fill the 160K context window. The three-layer compaction system handles this automatically: truncate large tool outputs, prune old messages, and summarize when needed.
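As a rough illustration of the first layer, the sketch below caps a tool's output by line count and then by byte count, mirroring the `max_lines` and `max_bytes` settings. It is an assumption about how such a cap could work, not QueryMT's implementation; the function name and the `[output truncated]` marker are hypothetical.

```rust
/// Caps a tool's text output at `max_lines` lines and `max_bytes` bytes.
/// Illustrative only: QueryMT's real truncation logic may differ.
fn truncate_tool_output(output: &str, max_lines: usize, max_bytes: usize) -> String {
    // Line cap: keep only the first `max_lines` lines.
    let dropped_lines = output.lines().count() > max_lines;
    let mut kept = output
        .lines()
        .take(max_lines)
        .collect::<Vec<_>>()
        .join("\n");

    // Byte cap: back up to a UTF-8 character boundary so the cut
    // never splits a multi-byte character.
    let dropped_bytes = kept.len() > max_bytes;
    if dropped_bytes {
        let mut end = max_bytes;
        while !kept.is_char_boundary(end) {
            end -= 1;
        }
        kept.truncate(end);
    }

    // Tell the model that something was cut (marker text is hypothetical).
    if dropped_lines || dropped_bytes {
        kept.push_str("\n[output truncated]");
    }
    kept
}
```

With the values from the config, a call would look like `truncate_tool_output(&raw, 2000, 51200)`. Backing up to a character boundary matters for contact names and message text, which often contain non-ASCII characters.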
MCP server: iMCP
# iMCP - local Apple data access
[[mcp]]
name = "iMCP"
transport = "stdio"
command = "/Applications/iMCP.app/Contents/MacOS/imcp-server"
This tells QueryMT to launch the iMCP MCP server locally. The agent communicates with it over stdio - no network connection involved. iMCP bridges to native Apple APIs directly.
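For readers curious what "over stdio" means in practice: MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages through the child process's stdin and stdout. The sketch below shows the general shape using only the Rust standard library. It is not QueryMT's client code; both function names are hypothetical, and a real MCP session performs an `initialize` handshake before a request such as `tools/list`.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Builds one newline-terminated JSON-RPC 2.0 request without params.
/// Assumes `method` contains no characters that need JSON escaping.
fn frame_request(id: u64, method: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"{}\"}}\n",
        id, method
    )
}

/// Spawns `command`, writes one line to its stdin, and reads one line
/// back from its stdout. Everything travels through local pipes.
fn exchange_one_line(command: &str, line: &str) -> std::io::Result<String> {
    let mut child = Command::new(command)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    {
        // Dropping `stdin` at the end of this scope closes the pipe.
        // A long-lived client would keep it open for further requests.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(line.as_bytes())?;
    }

    let stdout = child.stdout.take().expect("stdout was piped");
    let mut reply = String::new();
    BufReader::new(stdout).read_line(&mut reply)?;
    child.wait()?;
    Ok(reply)
}
```

Because the two processes talk through pipes created by `Command`, no socket is opened for this exchange.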
Architecture
Everything stays on the device. At no point in the pipeline does data leave the machine:
The LLM runs locally via llama.cpp. Tool calls go to iMCP over stdio. iMCP calls Apple's local APIs. At no point does your data traverse a network connection. The agent works offline.
Example interaction
Key features
- 100% on-device: local LLM inference, local data access, no cloud
- iMCP integration: Calendar, Contacts, Messages, Maps, Weather via native APIs
- Wildcard tool patterns: "iMCP.*" auto-includes new services
- GPU auto-detection: Metal/CUDA/Vulkan, no manual config
- Works offline: stdio transport, no network needed
- Three-layer compaction: handles long tool-heavy conversations
Try it
# 1. Install iMCP
brew install --cask mattt/tap/iMCP
# Or download from https://iMCP.app/download

# 2. Open iMCP.app and enable services (Calendar, Contacts, etc.)

# 3. Run the agent
cargo run --example qmtcode --features dashboard -- confs/device_agent.toml --dashboard

# 4. Ask questions in the dashboard:
#    "What does my schedule look like today?"
#    "Do I have any messages from Sarah this week?"
#    "What's the weather like for my commute home?"