Curtis Covington
Experiment reports and implementation notes with an explicit protocol.
Metrics first, caveats included, and enough context to understand the result instead of just the headline.
March 3, 2026
11 min read
Hardening SAE Steering on Gemma: Only Code Survives the Controls
I ran a hardened Gemma SAE steering evaluation with holdout prompts, matched controls, wrong-hook checks, and seed extensions. Code held up; most other categories did not.
ai
interpretability
sparse-autoencoders
mechanistic-interpretability
gemma
Latest field note from the lab archive.
March 1, 2026
10 min read
AI Brain Surgery on a Small Model: Can One SAE Feature Control Behavior?
A wiki-only SAE intervention lab on Pythia-70M: feature ranking, residualized feature knobs, and activation-gated minimal-pair behavior tests.
ai
interpretability
sparse-autoencoders
transformers
February 24, 2026
9 min read
From Tokens to Concepts: Building a Local SAE Interpretability Lab on an M1 Mac
Layer sweeps, top-k sparsity sweeps, and an interactive explorer for SAE feature discovery on a small open-weight model.
ai
interpretability
transformers
sparse-autoencoders
February 22, 2026
8 min read
Interpretability via Sparse Autoencoders on a Small Open-Weight Transformer
A local-only SAE feature probing experiment on Pythia-70M: activation extraction, sparse autoencoder training, cross-domain generalization, and blog-ready artifacts.
ai
interpretability
transformers
pytorch
September 29, 2025
6 min read
Building a Name CLI with GPT-5-Codex
Pairing with GPT-5-Codex to turn the SSA baby-name dataset into a polished Go CLI with weighted sampling, charts, and automated releases.
ai
go
cli