Lab-scale AI infrastructure
GPU clusters, Slurm, IaC, monitoring, quotas, and the everyday question of how to make shared compute usable, fair, and reliable.
Compute, experiments, agents, representations, and lab workflows all shape what research is possible. I work on the pieces between them: cluster operations, reproducible research tooling, autonomous experimentation, and structure-aware AI models.
Reproducible experiment loops, agent-manageable repos, provenance, hypothesis trees, and tooling that helps research move without losing context.
How models encode position, geometry, and topology — from ViT positional encodings and RoPE to persistent homology and topology-aware representations.
End-to-end research loops for dry labs and wet labs: systems that design experiments, measure outcomes, update beliefs, and decide what to try next.
Ansible, Slurm, GPU dashboards, service deployment, and admin workflows for a lab where compute is always the bottleneck.
Experiment control planes, source-of-truth repos, job routing, run lineage, paper ingestion, and hypothesis state that survives context switches.
Systems that propose experiments, run them, measure outcomes, and learn what to try next — whether the experiment is on a GPU or a bench.
How should a small research lab allocate and operate GPU clusters without turning admin work into a full-time job?
What does an AI-manageable research repo need so agents can help without corrupting provenance or hiding failures?
Can positional encodings, RoPE variants, and topological signals help vision models capture spatial and topological structure more faithfully?
What is the smallest useful closed-loop experiment system — in software, dry lab, or wet lab — that teaches the right lessons?
How do we detect, explain, and evaluate AI-generated media as generation models keep improving?