Qwen-Scope
Open-source sparse autoencoder suite that cracks open Qwen LLMs so you can see—and steer—exactly what's happening inside.
Kai's verdict
The most practical open SAE release yet for a major model family — if you're doing mechanistic interpretability or need to surgically fix code-switching and style issues in Qwen models, this is the real deal. Too niche and infrastructure-heavy for anyone outside the ML research/engineering lane.
Strengths
- 14 groups of SAE weights across 7 Qwen3/Qwen3.5 model variants (dense + MoE) — broadest open SAE coverage for a single model family
- Goes beyond inspection: enables inference-time steering, fine-tuning regularization (SASFT), benchmark redundancy analysis, and data curation — all via feature activations
- Live Hugging Face demo Space for zero-setup exploration before committing to local setup
- SAE-based toxicity classifier hits F1 > 0.90 with no trained head — just logical rules over features
- Fully open weights mirrored on ModelScope for China-region access
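The steering mechanism these bullets point at (decode hidden states into sparse features, then add a feature's decoder direction back into the activation) can be sketched with toy weights. Everything below is an illustrative assumption, not Qwen-Scope's actual API: the names, shapes, and the tied-decoder simplification are all stand-ins.

```python
import numpy as np

# Conceptual sketch of SAE feature steering with toy, randomly
# initialized weights. NOT Qwen-Scope's API; purely illustrative.
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

# Tied-weight SAE: decoder is the encoder transpose (a common
# simplification; real releases ship separately trained decoders).
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = W_enc.T

def sae_features(h: np.ndarray) -> np.ndarray:
    """Encode a hidden-state vector into sparse (ReLU) features."""
    return np.maximum(h @ W_enc, 0.0)

def steer(h: np.ndarray, feature_idx: int, strength: float = 5.0) -> np.ndarray:
    """Inference-time steering: add a feature's decoder direction to h."""
    return h + strength * W_dec[feature_idx]

h = rng.normal(size=d_model)   # stand-in for one residual-stream activation
f = sae_features(h)
top = int(np.argmax(f))        # most active feature on this input
h_steered = steer(h, top)

# The edited activation expresses the chosen feature more strongly.
assert sae_features(h_steered)[top] > f[top]
```

With trained SAE weights in place of the random stand-ins, this same add-a-decoder-direction edit is the kind of intervention that targeted behavior fixes rely on, and simple threshold rules over the sparse feature vector are what a classifier like the toxicity example builds on, with no trained head.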
Weaknesses
- Strictly Qwen3/Qwen3.5 base models only — zero portability to other model families out of the box
- Requires significant ML infrastructure to actually run (large model weights, GPU memory, Python/PyTorch stack) — not accessible to non-engineers
- A research-grade release: documentation and tooling polish lag behind production-ready SDKs
Best for
ML engineers and AI researchers working with Qwen models who need interpretability, inference steering, or training-time behavior control without full fine-tuning.
Pricing
Free (open-source)
All SAE weights and tooling are freely available on Hugging Face and ModelScope under open-source terms; a live interactive demo Space is also free to use.