Qwen-Scope

A tier · New this week

Open-source sparse autoencoder (SAE) suite that cracks open Qwen LLMs so you can see, and steer, what's happening inside.

Kai's verdict

The most practical open SAE release yet for a major model family — if you're doing mechanistic interpretability or need to surgically fix code-switching and style issues in Qwen models, this is the real deal. Too niche and infrastructure-heavy for anyone outside the ML research/engineering lane. (Verdict pending Phi's full review.)

Strengths

  • 14 groups of SAE weights across 7 Qwen3/Qwen3.5 model variants (dense + MoE) — broadest open SAE coverage for a single model family
  • Goes beyond inspection: enables inference-time steering, fine-tuning regularization (SASFT), benchmark redundancy analysis, and data curation — all via feature activations
  • Live Hugging Face demo Space for zero-setup exploration before committing to local setup
  • SAE-based toxicity classifier hits F1 > 0.90 with no trained head — just logical rules over features
  • Fully open weights mirrored on ModelScope for China-region access
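
To make the steering and rule-based classifier points above concrete, here is a minimal toy sketch of the general SAE idea. It is not Qwen-Scope's API: the function names, the tiny identity weight matrices, the feature indices, and the 0.5 threshold are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. Real SAE weights would be loaded from the released checkpoints and applied to actual model hidden states.

```python
# Toy sketch only: hypothetical names and weights, not the Qwen-Scope API.

def sae_encode(h, w_enc, b_enc):
    """Feature activations as ReLU(W_enc @ h + b_enc), one value per feature."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(w_enc, b_enc)
    ]

def steer(h, w_dec, feature, strength):
    """Shift the hidden state along one feature's decoder direction."""
    direction = w_dec[feature]
    return [x + strength * d for x, d in zip(h, direction)]

def rule_classifier(acts, any_of, none_of, threshold=0.5):
    """Flag the input if any `any_of` feature fires and no `none_of` feature fires.

    No trained head: the decision is a logical rule over feature activations.
    """
    def fires(i):
        return acts[i] > threshold
    return any(fires(i) for i in any_of) and not any(fires(i) for i in none_of)

# Stand-in 4-dimensional "model" with 4 SAE features (identity weights).
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
W_ENC, W_DEC, B_ENC = IDENTITY, IDENTITY, [0.0] * 4

h = [0.2, 0.0, 0.0, 0.0]                      # hypothetical hidden state
acts_before = sae_encode(h, W_ENC, B_ENC)

# Inference-time steering: push feature 2 (a hypothetical "style" feature).
h_steered = steer(h, W_DEC, feature=2, strength=1.0)
acts_after = sae_encode(h_steered, W_ENC, B_ENC)

# Rule: flag when feature 2 fires and feature 3 (a hypothetical "benign" feature) does not.
flag_before = rule_classifier(acts_before, any_of=[2], none_of=[3])
flag_after = rule_classifier(acts_after, any_of=[2], none_of=[3])
print(acts_before, acts_after, flag_before, flag_after)
```

In this toy setup, steering raises the activation of the chosen feature, which in turn flips the rule-based flag. The same pattern (encode, inspect, nudge a decoder direction, apply rules) is what the bullets above describe at full model scale.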

Weaknesses

  • Strictly Qwen3/Qwen3.5 base models only — zero portability to other model families out of the box
  • Requires significant ML infrastructure to actually run (large model weights, GPU memory, Python/PyTorch stack) — not accessible to non-engineers
  • A research-grade release: documentation and tooling polish lag behind production-ready SDKs

Best for

ML engineers and AI researchers working with Qwen models who need interpretability, inference steering, or training-time behavior control without full fine-tuning.

Pricing

Free (open-source)

All SAE weights and tooling are freely available on Hugging Face and ModelScope under open-source terms; a live interactive demo Space is also free to use.

Alternatives worth knowing