Curtis Covington

Experiment reports and implementation notes with an explicit protocol.

Metrics first, caveats included, and enough context to understand the result instead of just the headline.

March 3, 2026 · 11 min read

Hardening SAE Steering on Gemma: Only Code Survives the Controls

I ran a hardened Gemma SAE steering evaluation with holdout prompts, matched controls, wrong-hook checks, and seed extensions. Code held up; most other categories did not.

ai interpretability sparse-autoencoders mechanistic-interpretability gemma

Latest field note from the lab archive.

March 1, 2026 · 10 min read

AI Brain Surgery on a Small Model: Can One SAE Feature Control Behavior?

A wiki-only SAE intervention lab on Pythia-70M: feature ranking, residualized feature knobs, and activation-gated minimal-pair behavior tests.

ai interpretability sparse-autoencoders transformers
February 24, 2026 · 9 min read

From Tokens to Concepts: Building a Local SAE Interpretability Lab on an M1 Mac

Layer sweeps, top-k sparsity sweeps, and an interactive explorer for SAE feature discovery on a small open-weight model.

ai interpretability transformers sparse-autoencoders
February 22, 2026 · 8 min read

Interpretability via Sparse Autoencoders on a Small Open-Weight Transformer

A local-only SAE feature probing experiment on Pythia-70M: activation extraction, sparse autoencoder training, cross-domain generalization, and blog-ready artifacts.

ai interpretability transformers pytorch
September 29, 2025 · 6 min read

Building a Name CLI with GPT-5-Codex

Pairing with GPT-5-Codex to turn the SSA baby-name dataset into a polished Go CLI with weighted sampling, charts, and automated releases.

ai go cli

About

This blog is where I write up hands-on experiments in interpretability, sparse autoencoders, feature steering, and related tooling, with the methodology, results, and caveats left in.

M.S. in Computer Science
University of Colorado Boulder
Distributed systems and infrastructure
Interpretability and local-first tooling