Testing Claude Code on a Cosmology Experiment

I wanted to see how far Claude Code can go as a research engine on a real, messy problem, with me directing it. The subject happened to be cosmology. It’s all public: github.com/EdonZo/ccc-hawking-points.

I’m a software engineer, not a cosmologist. So this is exploration — not a contribution to physics — but I ran it like the real thing: pre-registered, reproducible, end to end.

The question

Roger Penrose has a theory called Conformal Cyclic Cosmology: the universe runs through a sequence of “aeons,” each one’s end becoming the next one’s big bang. One testable consequence is the Hawking point: leftover radiation from a black hole in the previous aeon, showing up as a faint, roughly circular warm spot in the cosmic microwave background (CMB), the oldest light we can see. A 2025 revision put the size of these spots at about 6 to 8 degrees across. So: are they there? I had Claude Code build a detector and look, in data from the Planck satellite and the Atacama Cosmology Telescope.

How you do this without fooling yourself

The hard part isn’t the looking. It’s not lying to yourself when you find something. The CMB is full of random-looking fluctuations — hunt hard enough for circular warm spots and you will find circular warm spots. So before touching any real data, I had Claude Code lock down every decision in advance: the detector, the scale, the masking, and a rule that says “this number means detection, this number means nothing,” written first. Then we calibrated against thousands of simulated skies with no signal, to know what “surprising” even looks like.
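To make the calibration step concrete, here is a minimal sketch of the idea, not the code from the repo. It assumes a toy flat patch of white Gaussian noise in place of a realistic simulated CMB sky, and a toy detector (the warmest box-shaped window). The names `detector_statistic` and `calibrate_threshold`, the 1% false-alarm rate, and every default value are hypothetical, chosen for illustration only.

```python
import numpy as np


def detector_statistic(sky: np.ndarray, box: int = 8) -> float:
    """Toy detector: the largest mean value over any box x box window."""
    # A summed-area table gives every window sum without an explicit loop.
    padded = np.pad(sky, ((1, 0), (1, 0)))
    sat = padded.cumsum(axis=0).cumsum(axis=1)
    window_sums = (
        sat[box:, box:] - sat[:-box, box:] - sat[box:, :-box] + sat[:-box, :-box]
    )
    return float(window_sums.max() / box**2)


def calibrate_threshold(n_sims=500, size=64, false_alarm=0.01, seed=0):
    """Fix the detection threshold from signal-free simulations only."""
    rng = np.random.default_rng(seed)
    # Assumption: white noise stands in for a proper simulated sky.
    null_stats = np.array(
        [detector_statistic(rng.standard_normal((size, size))) for _ in range(n_sims)]
    )
    # The threshold is the value that signal-free skies exceed only
    # `false_alarm` of the time. It is chosen before any real data is seen.
    threshold = float(np.quantile(null_stats, 1.0 - false_alarm))
    return threshold, null_stats


def is_detection(sky: np.ndarray, threshold: float) -> bool:
    """Apply the rule that was written down in advance."""
    return detector_statistic(sky) > threshold
```

The point of the design is ordering: `calibrate_threshold` runs first and its output is frozen, and only then does `is_detection` see a real map. A real pipeline would need simulations that reproduce the sky's actual correlations, noise, and masking, which this toy ignores.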

The 12-sigma that wasn’t

The first run on a raw Planck map lit up at 12 sigma. In particle physics, 5 sigma is the conventional bar for claiming a discovery, so 12 is the kind of number that would be a headline.
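For readers who don't live in sigma units, here is a small sketch of what such a number means. It assumes the significance is computed as a z-score against the signal-free simulations and that the null statistic is roughly Gaussian; both are my assumptions for illustration, and the function names are hypothetical.

```python
import math
import statistics


def sigma_from_null(observed: float, null_stats: list[float]) -> float:
    """How many null standard deviations the observed value sits above the null mean."""
    mu = statistics.fmean(null_stats)
    sd = statistics.stdev(null_stats)
    return (observed - mu) / sd


def one_sided_p(z: float) -> float:
    """Probability that a Gaussian null fluctuates up to z sigma or beyond."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 5 sigma corresponds to a one-sided tail probability of about 2.9e-7.
print(f"{one_sided_p(5.0):.2e}")
```

A number like this is only as trustworthy as the simulations behind it. If the real map contains something the simulations leave out, the z-score measures that mismatch, not a discovery.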

It was garbage. The map carried leftover contamination from our own galaxy, and the detector mistook it for signal. What matters is how it got caught: the second telescope showed no matching signal, a diagnostic flagged the map’s noise as 20x too high, and a separate review agent — whose only job was to attack the method — named the exact flaw. We swapped in a clean map, re-ran on the same patch of sky, and the 12 sigma collapsed to noise. The tooling caught its own false alarm before it could become a claim, and that’s the most transferable thing in the whole project.

The result

A clean null. Both telescopes came back below the threshold we set in advance, with no cross-confirmed candidates. No evidence for Hawking points at the 6-8 degree scale — which is a perfectly good outcome. Most of the work was earning the right to trust that null.
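Cross-confirmation can be sketched as a position match between the candidate lists from the two telescopes. This is an illustrative sketch under my own assumptions, not the repo's implementation: candidates are (longitude, latitude) pairs in degrees, and the 1-degree matching tolerance and the function names are hypothetical.

```python
import math


def angular_separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation between two sky positions, in degrees (haversine formula)."""
    l1, b1, l2, b2 = map(math.radians, (lon1, lat1, lon2, lat2))
    h = (
        math.sin((b2 - b1) / 2) ** 2
        + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the valid asin range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_confirmed(cands_a, cands_b, max_sep_deg=1.0):
    """Keep candidates from list A that have a partner in list B within max_sep_deg."""
    return [
        a
        for a in cands_a
        if any(angular_separation_deg(*a, *b) <= max_sep_deg for b in cands_b)
    ]


# Hypothetical positions, for illustration only.
planck_like = [(10.0, 20.0), (200.0, -45.0)]
act_like = [(10.3, 20.2)]
print(cross_confirmed(planck_like, act_like))
```

A candidate that appears in only one instrument is dropped, which is the property that matters: a real spot on the sky should not depend on which telescope is looking.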

Every number traces to disk and every step is in the git history, so anyone can see exactly what the tool did. The full write-up and code are in the repo, and I’ll keep running experiments like this.
