When I’m building queryMT and experimenting with coding models and their tool-calling capabilities, I love putting new models through their paces on my local GPU. A couple of weeks ago, MistralAI dropped a small coding model called Devstral.
Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
Naturally, I was eager to try it on my RTX 3090. I grabbed the Devstral GGUF created by Bartowski (kudos to him for all the effort of releasing GGUFs so quickly):
Honestly, when I saw this, I was quickly discouraged from testing the model any further.
A week later, I decided to give Devstral another shot. This time I pulled the version hosted by ollama, and boom! The difference was night and day. Suddenly Devstral was navigating my code, editing files, and calling tools exactly as advertised. Here’s the same query as above, using ollama’s devstral:
Curious about consistency, I then tried Unsloth’s GGUF. It worked flawlessly, just like ollama’s.
I suspect (though I haven’t actually verified this) that the reason for the different behaviour is the chat template itself.
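To make the suspicion concrete, here is a minimal sketch of how two chat templates can serialize the *same* conversation very differently. Both template functions below are simplified inventions of mine, not the actual templates shipped in any Devstral GGUF; the point is only that a template which never injects the available tools into the prompt gives the model nothing to call:

```python
# Two simplified, hypothetical chat templates -- NOT the real templates
# from any GGUF. They only illustrate how the same conversation can be
# serialized differently depending on which template is baked in.

def template_a(messages, tools):
    """Serializes the chat but never mentions the available tools."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"[INST] {m['content']} [/INST]")
        else:
            parts.append(m["content"])
    return "\n".join(parts)

def template_b(messages, tools):
    """Injects a tool list in a format the model was trained to expect."""
    tool_block = ("[AVAILABLE_TOOLS] "
                  + ", ".join(t["name"] for t in tools)
                  + " [/AVAILABLE_TOOLS]")
    parts = [tool_block]
    for m in messages:
        if m["role"] == "user":
            parts.append(f"[INST] {m['content']} [/INST]")
        else:
            parts.append(m["content"])
    return "\n".join(parts)

messages = [{"role": "user", "content": "List every TODO in this repo."}]
tools = [{"name": "read_file"}, {"name": "list_dir"}]

print(template_a(messages, tools))
print("---")
print(template_b(messages, tools))
```

With template_a the model never sees that tools exist, so it can only answer in plain text; with template_b the tool declarations are part of the prompt. If two GGUFs of the “same” model embed templates that differ in this way, their tool-calling behaviour will diverge just like I observed.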
Key takeaway
Not all GGUFs are created equal. The publisher, build settings, and even your chat template can make or break a model’s ability to use tools. If your new favorite agent model isn’t behaving, try a different build or tweak the system/prompt template.
Project introspection with tool-enabled LLMs
I’ve also started using my tool-enabled LLM to explore codebases automatically. Need file metadata, function definitions, or TODOs scattered across hundreds of files? Let the model scan your repo with the right tool plugin, then summarize its findings. It’s like having a supercharged grep or code reviewer at your fingertips.
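As an illustration of what such a tool plugin can look like, here is a minimal sketch of a “scan for TODOs” tool an agent could call. The function name, signature, and return shape are my own invention, not part of any particular agent framework:

```python
import os
import re

# Minimal sketch of a repo-scanning tool an LLM agent could call.
# The name and return shape are hypothetical, not from any framework.
TODO_RE = re.compile(r"\b(TODO|FIXME)\b[:\s]*(.*)")

def scan_todos(root: str, exts=(".py", ".go", ".rs", ".js")):
    """Walk a source tree and collect TODO/FIXME comments."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        m = TODO_RE.search(line)
                        if m:
                            findings.append({
                                "file": path,
                                "line": lineno,
                                "tag": m.group(1),
                                "text": m.group(2).strip(),
                            })
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return findings
```

The agent loop exposes `scan_todos` as a callable tool; the model invokes it, gets back the list of findings, and summarizes them in plain language.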
Happy experimenting, and remember: the “same” model name doesn’t always mean the “same” behaviour. The details of how it’s published really do matter.