Your device, your data, your agent

A personal AI assistant that runs 100% on-device. No cloud. No Python. Your data never leaves the machine.

device_agent.toml

The problem

Siri sends your voice to Apple. Google Assistant sends your queries to Google. Alexa sends everything to Amazon. Every "smart" assistant requires a round trip to the cloud. Your calendar, contacts, messages, and location data leave the device.

What if you could have a capable assistant that never sends a single byte off your machine?

Configuration

device_agent.toml defines a complete on-device assistant, with local LLM inference and access to your Apple data through iMCP, in 82 lines of TOML.

Local LLM inference

[agent]
provider = "llama_cpp"
model = "unsloth/Qwen3-VL-8B-Instruct-GGUF:UD-Q6_K_XL"
assume_mutating = false
tools = [
  "create_task", "todowrite", "todoread",
  "question", "mdq",
  # All Apple data tools come from iMCP
  "iMCP.*",
]

Wildcard tool pattern

"iMCP.*" is a wildcard that matches every tool exposed by the iMCP MCP server. As iMCP adds new services (Reminders, Notes, Shortcuts), the agent automatically gains access without config changes.

GPU model parameters

[agent.parameters]
n_ctx = 160000              # 160K context window
max_tokens = 8192           # Max response length
top_p = 0.95
top_k = 20
temperature = 0.9           # Creative but grounded
flash_attention = "enabled" # Faster inference

GPU auto-detection

QueryMT auto-detects your GPU (Metal on Apple Silicon, CUDA on NVIDIA, Vulkan elsewhere) and pulls the correct OCI image variant. No driver config. No CUDA toolkit installation.
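The selection rule described above can be written down as a small decision function. This is only an illustration of the stated rule; how QueryMT actually probes the hardware is not covered here, and the Backend enum, pick_backend, and the has_nvidia_gpu flag are hypothetical names.

```rust
/// GPU backends named in the text.
#[derive(Debug, PartialEq)]
enum Backend {
    Metal,
    Cuda,
    Vulkan,
}

/// Encodes the rule "Metal on Apple Silicon, CUDA on NVIDIA, Vulkan elsewhere".
/// `has_nvidia_gpu` stands in for whatever hardware probe is really used.
fn pick_backend(os: &str, arch: &str, has_nvidia_gpu: bool) -> Backend {
    if os == "macos" && arch == "aarch64" {
        Backend::Metal
    } else if has_nvidia_gpu {
        Backend::Cuda
    } else {
        Backend::Vulkan
    }
}
```

On a real machine the first two arguments could come from std::env::consts::OS and std::env::consts::ARCH.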

Three-layer compaction for long conversations

# Layer 1: Tool output truncation
[agent.execution.tool_output]
max_lines = 2000
max_bytes = 51200

# Layer 2: Pruning after every turn
[agent.execution.pruning]
protect_tokens = 40000

# Layer 3: AI summary on context overflow
[agent.execution.compaction]
auto = true

Calendar queries and contact lookups return structured data. But long conversations with many tool calls can still fill the 160K context window. The three-layer compaction system handles this automatically: truncate large tool outputs, prune old messages, and summarize when needed.
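The first two layers can be sketched in a few lines. This is an illustration under stated assumptions, not QueryMT's code: the ~4 bytes per token estimate, the placeholder strings, the Message type, and the reading of protect_tokens as "the most recent N tokens are never pruned" are all assumptions. Layer 3 needs a model call to write the summary and is left out.

```rust
#[derive(PartialEq)]
enum Role {
    User,
    Assistant,
    Tool,
}

struct Message {
    role: Role,
    text: String,
}

/// Layer 1: keep whole lines until either `max_lines` or `max_bytes` is hit.
/// The truncation marker is appended on top of the byte budget.
fn truncate_output(output: &str, max_lines: usize, max_bytes: usize) -> String {
    let mut kept = String::new();
    let mut count = 0;
    let mut truncated = false;
    for line in output.lines() {
        // +1 accounts for the newline re-added after each kept line.
        if count == max_lines || kept.len() + line.len() + 1 > max_bytes {
            truncated = true;
            break;
        }
        kept.push_str(line);
        kept.push('\n');
        count += 1;
    }
    if truncated {
        kept.push_str("[output truncated]\n");
    }
    kept
}

/// Rough token estimate (assumption: about 4 bytes per token).
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Layer 2: walk from newest to oldest. Once more than `protect_tokens`
/// have been seen, replace older tool outputs with a short placeholder.
/// User and assistant messages are left untouched.
fn prune(history: &mut [Message], protect_tokens: usize) {
    let mut seen = 0;
    for msg in history.iter_mut().rev() {
        seen += estimate_tokens(&msg.text);
        if seen > protect_tokens && msg.role == Role::Tool {
            msg.text = String::from("[tool output pruned]");
        }
    }
}
```

With the values from the config, truncate_output(raw, 2000, 51200) would run on each tool result, and prune(&mut history, 40000) after each turn.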

MCP server: iMCP

# iMCP - local Apple data access
[[mcp]]
name = "iMCP"
transport = "stdio"
command = "/Applications/iMCP.app/Contents/MacOS/imcp-server"

This tells QueryMT to launch the iMCP MCP server locally. The agent communicates with it over stdio - no network connection involved. iMCP bridges to native Apple APIs directly.
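To make the stdio transport concrete, here is a rough sketch of a client talking to a child process over pipes. MCP's stdio transport frames each JSON-RPC 2.0 message as one line; the helper names request_line and exchange are hypothetical, and a real session would begin with an initialize handshake before any tools/list call. This is not how QueryMT itself is implemented.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Builds a JSON-RPC 2.0 request without params as a single line.
fn request_line(id: u64, method: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{},"method":"{}"}}"#, id, method)
}

/// Spawns `program`, writes one line to its stdin, and reads one line
/// back from its stdout. Only OS pipes are used; no socket is opened.
fn exchange(program: &str, line: &str) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    {
        let stdin = child.stdin.as_mut().expect("stdin was piped");
        writeln!(stdin, "{}", line)?;
    }
    let stdout = child.stdout.take().expect("stdout was piped");
    let mut reply = String::new();
    BufReader::new(stdout).read_line(&mut reply)?;
    // Closing stdin signals end of input so the child can exit.
    drop(child.stdin.take());
    child.wait()?;
    Ok(reply.trim_end().to_string())
}
```

Pointing exchange at a line-echoing program such as cat returns the request unchanged, which is a quick way to check the plumbing before targeting the imcp-server binary from the config.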

Architecture

Everything stays on the device. At no point in the pipeline does data leave the machine:

Your Question "Do I have time for lunch with Sarah?"

→

Agent (llama.cpp) Runs locally on your GPU. Decides which tools to call.

→

iMCP (stdio) Bridges to Calendar, Contacts, Messages, Maps, Weather APIs.

→

Apple APIs Native on-device access. No network hop.

Zero network access

The LLM runs locally via llama.cpp. Tool calls go to iMCP over stdio. iMCP calls Apple's local APIs. At no point does your data traverse a network connection. The agent works offline.

Example interaction

You

"Do I have time for lunch with Sarah today?"

Agent → Calendar

Reads today's events. Finds a free window from 12:00 to 13:00.

Agent → Contacts

Looks up Sarah. Finds her phone number and preferred location.

Agent → Messages

Scans recent messages with Sarah. Finds a thread from last week about trying the new Italian place.

Agent → Maps

Estimates commute time to Sarah's preferred restaurant area. 15-minute walk.

Agent responds

"You have a 1-hour window at 12:00. Sarah mentioned wanting to try the Italian place on 5th last week - it's a 15-minute walk from your afternoon meeting. Want me to add a calendar event?"

Key features

100% on-device — local LLM inference, local data access, no cloud
iMCP integration — Calendar, Contacts, Messages, Maps, Weather via native APIs
Wildcard tool patterns — "iMCP.*" auto-includes new services
GPU auto-detection — Metal/CUDA/Vulkan, no manual config
Works offline — stdio transport, no network needed
Three-layer compaction — handles long tool-heavy conversations

Try it

# 1. Install iMCP
brew install --cask mattt/tap/iMCP
# Or download from https://iMCP.app/download

# 2. Open iMCP.app and enable services (Calendar, Contacts, etc.)

# 3. Run the agent
cargo run --example qmtcode --features dashboard -- confs/device_agent.toml --dashboard

# 4. Ask questions in the dashboard:
#   "What does my schedule look like today?"
#   "Do I have any messages from Sarah this week?"
#   "What's the weather like for my commute home?"