SIA (Self-Improving AI)

A tier · new this week

An open-source agent framework that closes the self-improvement loop by autonomously rewriting both its own scaffold and model weights — no human tuning required between cycles.


Kai's verdict

The dual-lever approach, which evolves the harness and the weights in the same loop, is a genuinely interesting architectural bet that outperforms scaffold-only baselines across all tested domains. But it has been validated on only three tasks and comes wrapped in marketing hyperbole, so treat it as promising early-stage research infrastructure, not a proven production system. (Verdict pending Phi's full review.)

Strengths

Uniquely edits both the agent harness (prompts, tools, retry logic) AND model weights in one loop — most frameworks only do one or the other
Demonstrated gains across wildly different domains: 56.6% gain on LawBench, 91.9% runtime reduction on GPU kernels, 502% improvement on scRNA-seq denoising
Multi-provider backend support (Anthropic Claude, OpenAI, Google Gemini) with a clean pip install and four bundled benchmark tasks out of the box
Feedback-Agent dynamically selects the RL algorithm (PPO, GRPO, entropic weighting) based on the observed reward signal — not a fixed recipe
MIT-licensed, with an academic grant program and partnerships with Stanford, Oxford, and UCSB for external validation
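
The headline numbers above mix two kinds of percentage, so it can help to put them on a common "times baseline" scale. The sketch below is a reader's aid, not anything shipped with SIA: the function names are made up for illustration, and it assumes each figure is a relative change against the baseline (the listing does not say whether the LawBench gain is relative or in absolute points).

```python
def speedup_from_runtime_reduction(reduction_pct: float) -> float:
    """Speedup factor implied by cutting runtime by `reduction_pct` percent.

    Assumes the reduction is relative to baseline runtime, so the new
    runtime is (1 - reduction_pct / 100) of the old one.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


def multiplier_from_relative_gain(gain_pct: float) -> float:
    """Multiple of baseline implied by a relative gain of `gain_pct` percent.

    Assumes a higher-is-better metric and a gain measured relative to baseline.
    """
    return 1.0 + gain_pct / 100.0


if __name__ == "__main__":
    # Figures quoted in the Strengths list above.
    print(f"GPU kernels: {speedup_from_runtime_reduction(91.9):.1f}x faster")
    print(f"scRNA-seq:   {multiplier_from_relative_gain(502):.2f}x baseline")
    print(f"LawBench:    {multiplier_from_relative_gain(56.6):.3f}x baseline")
```

Read this way, a 91.9% runtime reduction is roughly a 12x speedup, while a 502% improvement is about 6x the baseline score, so the larger-looking percentage is not the larger multiplier.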

Weaknesses

Only validated on three tasks in the paper; broader generalization across poorly specified objectives is still unproven
Both levers optimize the same fixed verifier, creating a Goodhart's law risk where the joint fixed point looks strong on benchmarks but may be brittle under real-world perturbation
The splashy '350× superintelligence' claim from marketing doesn't appear in the actual research paper — treat with skepticism
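
One simple way to probe the Goodhart concern is to score the same agent on the verifier it was optimized against and on a held-out verifier it never saw, then look at the gap. The sketch below only illustrates that idea; it is not part of SIA, and the function names, the 0.05 tolerance, and the example scores are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def verifier_gap(optimized: Sequence[float], held_out: Sequence[float]) -> float:
    """Mean score on the optimized verifier minus mean score on a held-out one.

    A large positive gap suggests the agent is fitting the verifier rather
    than the underlying task.
    """
    if not optimized or not held_out:
        raise ValueError("both score lists must be non-empty")
    return mean(optimized) - mean(held_out)


def looks_brittle(optimized: Sequence[float], held_out: Sequence[float],
                  tolerance: float = 0.05) -> bool:
    """Flag a run whose gap exceeds `tolerance` (an arbitrary, hypothetical threshold)."""
    return verifier_gap(optimized, held_out) > tolerance


if __name__ == "__main__":
    # Illustrative values only, not measured results from SIA or its paper.
    optimized_scores = [0.9, 0.8]
    held_out_scores = [0.6, 0.7]
    print(f"gap = {verifier_gap(optimized_scores, held_out_scores):.2f}")
    print("brittle?", looks_brittle(optimized_scores, held_out_scores))
```

A check like this does not prove robustness, but a persistent gap across cycles would be an early sign that both levers are converging on the verifier's quirks.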

Best for

ML researchers and advanced AI engineers who want to experiment with recursive self-improvement architectures on custom benchmark tasks, without being locked into any single LLM provider.

Pricing

Free (MIT open source)

Free and open source under the MIT license; you supply your own LLM API keys (Anthropic, OpenAI, Gemini). The Hexo Labs Grant Program is available to researchers who need infrastructure credits.

Alternatives worth knowing

Manus

Autonomous AI agent that actually finishes tasks.

Devin

Cognition Labs' autonomous coding engineer.

Open Agent Leaderboard

A public benchmarking dashboard that ranks AI agents by real-world task performance, accuracy, and cost-efficiency — all in one filterable view.

Hermes Agent

A self-improving, model-agnostic CLI agent from Nous Research that now solves MCP's context-bloat problem with BM25-powered Tool Search.