Lean: a Different Take on Verification

Cameron Freer
Research Scientist, BCS (MIT)

Guest Lecture
MIT 6.S976 / 18.S996 (Spring 2026)
Cryptography and Machine Learning: Foundations and Frontiers
Shafi Goldwasser & Vinod Vaikuntanathan

March 31, 2026

Why study Lean in this course?

Earlier: one-time pads, pseudorandomness, interactive proofs, zero knowledge, PAC verification, self-proving models.
Today: explicit proof objects, machine-checked theorems, verified software.

Crypto produces certificates checked by a verifier.
Lean produces proof terms checked by a kernel.
The mechanism changes, but the design goal is the same: a small trusted checker facing potentially untrusted search.

What is Lean?

A single system that is both a programming language and a proof assistant.
The same language for code, specifications, and proofs (no translation layer).
Tactics and automation build proof terms; a small kernel checks them.
Lean is implemented in Lean and compiles to native code.
Mathlib: 266,437 theorems. 127,144 definitions. 772 contributors.

The proof term plays the role of the certificate. Tactics, LLMs, and humans are all search strategies that produce it.

Why formalize?

“I would like to effusively thank Leonardo de Moura for writing my favourite computer game…” — Kevin Buzzard (ICM 2022)

Confidence. Formalization “provides an extremely high level of confidence” (Tao); most valuable when proofs are long, fragile, or under-refereed.
Refactoring. “We had formalized the proof with this constant 12, and then when this new paper came out, we said, ‘Okay, let’s update the 12 to 11.’ You just in your headline theorem change a 12 to 11. You run the compiler and… of the thousands of lines of code you have, 90% of them still work.” (Tao, Lex Fridman interview). Formal math is maintainable.
Composition. “Now with these formal systems, for the first time, you can do mathematics and just buy somebody’s part off the shelf” (Kontorovich). Mathlib is the realization of this for Lean.
Verification. For CS and security, formalization gives something stronger than tests: wrong assumptions surface as proof obligations that cannot be discharged (de Moura).

Buzzard: ICM 2022, EMS Press. Tao: AMS Notices, Jan 2025. Kontorovich: Quanta, Mar 2026. de Moura: blog, Feb 2026.

Why formalize? (continued)

Exploration. Making every step explicit often reveals structure invisible on paper. AI backed by formalization lets humans explore more deeply and with more confidence.
Stronger results. Formalization can extract explicit bounds and produce constructive versions of existence proofs. Validated numerics give rigorous error guarantees.
Collaboration at scale. Precise interfaces let many people (or AI agents) work on different pieces independently. Mathlib has 772 contributors.
Teaching. Executable textbooks, interactive explanations. A formalization-backed AI can teach with verified claims, not just plausible ones.
Consolidation. Formalization is a natural way to revisit and organize prior work: extracting modular pieces, clarifying interfaces between results, turning a backlog of published-but-scattered results into a coherent library that enables new research.
Scientific literature. Semantic search over formal libraries, living papers that update as dependencies evolve, a shift in the focus of peer review from line-checking to ideas.
Clear signal for AI. Formal proofs give AI a well-defined success criterion (at runtime or training): the kernel accepts or rejects. No ambiguity about what “correct” means.

Mathlib as substrate

266,437 theorems. 127,144 definitions. 772 contributors, whose expertise spans mathematics, computer science, and software engineering.
Covers analysis, algebra, topology, measure theory, number theory, combinatorics, and more.
Incomplete enough that frontier projects still build infrastructure (examples later in the talk).
Contributions flow back: each project pushes prerequisites upstream.
Much of formalization is library engineering, not just proving the target theorem.

leanprover-community.github.io/mathlib_stats.html

How Lean works

Lean’s type theory: one primitive, three familiar special cases

Prop is the type of propositions. To prove p, construct a term t : p.
The primitive is the dependent function type (x : A) → B x.
Three special cases:
- When B does not depend on x: ordinary function type A → B.
- When A : Prop and B : Prop: implication P → Q.
- When B x is a proposition for each x: universal quantification ∀ x : A, B x.
One syntax for programs, specifications, and proofs.

Compare: Rocq/Coq shares this foundation. HOL systems have simple types. Mizar and Metamath use set theory.

Dependent types can encode invariants

All code on these slides has been tested against the Lean 4 type-checker.

Vector α n: a list of length exactly n. Appending gives Vector α (n + m).
Fin n: a natural number less than n. Out-of-bounds indexing is a type error, not a runtime crash.
SortedList lo: a sorted list with lower bound lo. Sortedness is enforced by the type, not a runtime assertion:

-- A sorted list carries its lower bound in the type
inductive SortedList : Nat → Type where
  | nil (lo : Nat) : SortedList lo
  | cons (x : Nat) {lo : Nat} (h : x ≤ lo) (tail : SortedList lo) : SortedList x

-- [1, 3, 5] with proof that each element ≤ the next
example : SortedList 1 :=
  .cons 1 (by omega) (.cons 3 (by omega) (.nil 5))

Side conditions in prose become part of the type, checked automatically.

What is actually trusted?

Complementary ways to check a Lean proof:

Reference kernel (C++), ~8,000 lines. The implementation Lean itself uses.
Nanoda (Rust): an independently implemented external checker.
Lean4Lean: a complete Lean 4 type-checker, written in Lean itself.
leanchecker: replays built declarations through the kernel in a fresh process.
comparator: trusted challenge file + sandboxed build + external validation.

Analogous to multiple independent implementations of a crypto protocol.

After comparator: specification risk remains

Lean can check that a proof term matches a theorem statement. It cannot decide whether your definitions match your mathematical intent.

Did I define the right object?
Did I state the theorem I meant?
Did I forget a hypothesis?
Did a totalized API hide a boundary case?

-- totalized: deriv = 0 when undefined
#check Real.deriv_rpow_const
-- (x p : ℝ) : deriv (· ^ p) x = p * x ^ (p-1)

-- explicit hypotheses:
#check Real.hasDerivAt_rpow_const
-- {x p : ℝ} (h : x ≠ 0 ∨ 1 ≤ p) :
--   HasDerivAt (· ^ p) (p * x ^ (p-1)) x

Bridges: state core theorems with explicit predicates, review definitions separately from proofs, triangulate with sanity-check lemmas, have a human review the statement.

lean-lang.org/doc/reference/latest/ValidatingProofs/

A crypto-style threat model for proof checking

Hidden sorry or custom axioms? Use #print axioms thmName.
.olean state tricks or core state bugs? Re-check with leanchecker --fresh.
Actively malicious proof submissions? Use comparator: trusted challenge file, sandboxed build, external validation.
Proofs by native evaluation? Look for Lean.trustCompiler in the axiom list; it enlarges the trusted computing base.

Different concerns call for different checks, analogous to escalating from honest to malicious adversary models in this course.

Note: AI-generated code may not be explicitly adversarial, but models will find workarounds even when trying to be helpful. The same checks apply.

A tiny theorem: one-time-pad round-trip

-- XOR on booleans, by pattern-matching. same → false, different → true.
def bxor : Bool → Bool → Bool
  | true,  true  => false
  | false, false => false
  | _,     _     => true

-- Encrypt a 1-bit message with a 1-bit key.
def enc (m k : Bool) : Bool := bxor m k

def bxor defines a function by pattern matching. Lean checks that all cases are covered.
These are executable definitions, not axioms. Lean can compute with them.

(What happens if we XOR twice with the same key?)

Propositions as types

bxor and enc return Bool. But Lean can also express propositions:

def odd (n : Nat) : Prop := ∃ k, n = 2 * k + 1

The type of odd 3 is Prop: a statement that can be true or false.
We cannot evaluate it the way we evaluate bxor true false. It is a proposition, not a computation.
But we can prove it by providing a witness:

example : odd 3 := ⟨1, by decide⟩

The witness 1 and the proof that 3 = 2 * 1 + 1 together form a term of type odd 3.

Propositions are types.
Proofs are terms.

The proof: encrypting twice recovers the message

Theorem proving as a game: you see goals and hypotheses, then apply moves (tactics). Each move transforms the goal until nothing remains.

theorem otp_roundtrip (m k : Bool) : enc (enc m k) k = m := by
  cases m <;> cases k <;> rfl

cases m splits the goal into m = true and m = false.
<;> applies the next tactic to all remaining goals.
rfl: both sides compute to the same value. No goals remain. The kernel checks the resulting proof term.

#print axioms otp_roundtrip

'otp_roundtrip' depends on axioms: [propext]

Only propext (propositional extensionality). No sorry, no Classical.choice. The trust base is visible.

Three proof styles on one proposition

variable {p q : Prop}
-- Term style: build the proof object directly       
theorem and_swap_term (h : p ∧ q) : q ∧ p :=
  And.intro h.right h.left

-- Tactic style: tell Lean how to construct the proof
theorem and_swap_tac (h : p ∧ q) : q ∧ p := by
  exact ⟨h.right, h.left⟩

-- Declarative: state what you're building            
theorem and_swap_decl (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.right
  · exact h.left

All three produce the same proof term. Lean does not distinguish between them once elaboration is done.

The same three styles on the authorization theorem

-- Procedural: looks like a recursive program
theorem once_denied_proc :
    ∀ pre suf : List Decision,
      authorize (pre ++ (.deny :: suf)) = .deny
  | [], _ => rfl
  | .deny :: _, _ => rfl
  | .allow :: ds, suf => by
      simpa [authorize] using once_denied_proc ds suf

-- Tactic: delegates work to induction and case splitting
theorem once_denied_tac
    (pre suf : List Decision) :
    authorize (pre ++ (.deny :: suf)) = .deny := by
  induction pre with
  | nil => rfl
  | cons d ds ih =>
      cases d <;> simp [authorize, ih]

Same statement, same kernel, different surface style.

(The theorem: if a deny appears anywhere in the list, the overall authorization is deny, regardless of what comes before it.)

The authorization theorem, declaratively

theorem once_denied_decl
    (pre suf : List Decision) :
    authorize (pre ++ (.deny :: suf)) = .deny := by
  induction pre with
  | nil => rfl
  | cons d ds ih =>
      cases d with
      | deny => rfl
      | allow =>
          calc
            authorize ((.allow :: ds) ++ (.deny :: suf))
                = authorize (ds ++ (.deny :: suf)) := by rfl
            _ = .deny := ih

calc reads like a short mathematical argument: “rewrite, rewrite, finish.”
All three styles elaborate to proof terms in the same dependent type theory.

Procedural exposes recursive structure. Tactic is often fastest to write. Declarative can be easier to read back later.

A dependent-type twist: n-bit one-time pads

def OTP (n : Nat) := Fin n → Bool

def encVec {n : Nat} (m k : OTP n) : OTP n :=
  fun i => bxor (m i) (k i)

example {n : Nat} (m k : OTP n) (i : Fin n) :
    encVec (encVec m k) k i = m i := by
  simp only [encVec]
  cases m i <;> cases k i <;> rfl

n appears inside the type. OTP n = an n-bit pad, represented as Fin n → Bool.
Instead of saying in prose “take an n-bit string,” we make the length part of the type.
The checker prevents mixing pads of incompatible lengths before any proof begins.

Deny-overrides: from toy theorem to authorization logic

inductive Decision where
  | allow | deny

def authorize : List Decision → Decision
  | []          => .deny
  | .deny :: _  => .deny
  | .allow :: rs => authorize rs

theorem once_denied_always_denied
    (pre suf : List Decision) :
    authorize (pre ++ (.deny :: suf)) = .deny := by
  induction pre with
  | nil => rfl
  | cons d ds ih =>
      cases d <;> simp [authorize, ih]

This is toy code, but the structure (inductive types, recursive authorization, inductive proofs) is the same one used in Cedar’s production Lean model at AWS.

Example formalizations

Exchangeability in Lean: one theorem, three proof routes

Exchangeability = invariance of a random sequence under finite permutations of coordinates.
de Finetti–Ryll-Nardzewski: every exchangeable sequence on a standard Borel space is conditionally i.i.d.
Three complete proofs sharing a “common ending”: reverse martingale, L² contractability, Koopman / mean ergodic.
2.5 pages of source mathematics became 43k+ lines across 100+ files, because frontier measure-theory infrastructure was missing from Mathlib.
Much of the work was rough going in September 2025; lean4-skills began from patterns and antipatterns noted in my CLAUDE.md file during this project.

Module import graph for the exchangeability project

#print axioms deFinetti → propext, Quot.sound, Classical.choice (the standard three).

Sphere packing in dimension 8

Viazovska (2022 Fields Medal): the \(E_8\) lattice is the densest sphere packing in dimension 8.
Lean formalization (Birkbeck, Hariharan, Lee, Ma, Mehta, Viazovska). Blueprint-driven: each PR proves one or a few propositions, reviewed by domain experts.

Image:
J. Gregory
Moxness

AI contributions

All of my contributions were AI-assisted (Claude and Codex + lean4-skills + lean-lsp-mcp).
One contribution found and corrected a missing \(E_4\) factor in the blueprint statements.
Human maintainers provided rapid, welcoming peer review; ideas from their feedback were upstreamed into lean4-skills to improve future use.

sorry-free milestone

AI-generated sorry-free proof from February 2026: 50k+ LOC pull request.

Noperthedron: massive computation, small checker

Rupert’s property: a convex body can pass a larger copy of itself through a hole cut in the original.

A cube has Rupert’s property. (David Renshaw / Quanta)

The Noperthedron does not. (David Renshaw)

Steininger & Yurkevich (2025): the Noperthedron disproves the conjecture that all convex polyhedra have Rupert’s property.
Proof partitions a 5D parameter space into ~18.7 million regions; each region is checked locally.
Lean formalization (Reed, Renshaw): continuous math fully formalized; the computational step is the remaining sorry.

A massive computation produces a certificate; a small checker validates each piece locally.

Lean for CS, cryptography, and security

Lean in computer science: current state

Today: Simas (arXiv:2601.15571) verified coNP, \(\Sigma_2^P\), PP, and PSPACE completeness results in Lean. ~24k lines, 1,252 theorems.
Gap: minimal complexity theory, no CompCert (verified optimizing C compiler, in Rocq), no seL4 (verified OS kernel, in Isabelle). Other provers have a head start in CS verification.
Where it’s going: CSLib aims to build the missing CS substrate for Lean, the way Mathlib did for mathematics.

Lean for cryptography and security

Lean verifies SNARKs: Groth16 soundness (USENIX Security 2024), Cairo/STARK algebraic verification.
Lean theorems can become ZK proofs: zkPi (ACM CCS 2024), Ix (pre-alpha; aspirational: all of Mathlib in ~1 KB, checked in <100ms).
Cedar (AWS): executable Lean model (1.7k LOC) alongside Rust (15.7k LOC), 5.7k LOC of proofs. 100M nightly differential tests. 4 proof-found bugs, 21 test-found bugs. No new Cedar version ships unless the Lean model, proofs, and tests all pass.
Aeneas / SymCrypt (Microsoft): Rust → Lean functional models for verified crypto.

Formal proof objects and cryptographic certificates are converging.

Ix: the proof fits in the margin
Image: Ix / Argument Computer

Lean and coding agents

`lean-lsp-mcp`: tools for agents to interact with Lean

Provides agents with the same tools users have: InfoView, syntax highlighting, documentation exploration.

Diagnostics & Goals	Navigation & Info	Code Exec. & Verif.	Search
`diagnostic_messages` `goal` `term_goal`	`hover_info` `completions` `declaration_file` `file_outline`	`build` `run_code` `verify` `multi_attempt`	`local_search` `leansearch` `loogle` `leanfinder` `state_search` `hammer_premise`

Also: code_actions, get_widgets, get_widget_source, profile_proof

`lean4-skills` for agentic coding in Lean

Reliable workflows via /prove, /formalize, /learn, /checkpoint, etc.:
- Improves efficiency
- Wards the agent away from unhelpful approaches
- Provides guardrails and success criteria
Agents and scripts for iteratively proving, filling sorries, replacing axioms, golfing.
Guidance on how to best use lean-lsp-mcp:
- Rapid testing of proof ideas
- Discovery of lemmas in Mathlib
References in the skill: patterns to follow, antipatterns to avoid.
Continual learning from community contributions:
- Growing community of users, with user-contributed issues and PRs
- Opt-in plugin for bug reports, feature requests, and upstreaming insights from patterns and antipatterns

`lean4-skills /learn`: tools for education and exploration

  > let’s try socratic learning of measurability tactic

  Switching gears! Updating Learning Profile: style=socratic | topic=measurability tactic | mode=mathlib | presentation=supporting

  Let me do some discovery first.

• lean-lsp – Local Search (MCP)(query: “measurability”)
  L { “items”: [] }

• lean-lsp – LeanSearch (MCP)(query: “measurability tactic for proving measurable functions”)
  L { “items”: [{ ... +32 lines (ctrl+o to expand)

• lean-lsp – Lean Finder (MCP)(query: “measurability tactic automation for measurable functions”)
  L { “items”: [{ ... +29 lines (ctrl+o to expand)

  Good discovery results. Here’s what I found:

  – measurability is defined in Mathlib.Tactic.Measurability
  – Under the hood, it’s aesop with the Measurable rule set
  – It handles Measurable f, AEMeasurable f, StronglyMeasurable f, MeasurableSet s, and more
  – measurability? suggests an explicit proof script; measurability! is the strict variant

`lean4-skills /learn`: Socratic interaction

  Pedagogy: Socratic mode — starting with the user’s prior knowledge to calibrate depth.

  Let me begin with a question to figure out where you’re coming from:

  What does it mean for a function to be “measurable” between two measurable spaces?
  In other words, if someone hands you f : α → β and both types carry a MeasurableSpace instance,
  what property must f satisfy?

  (If you’d rather skip the math background and jump straight to how the tactic works in Lean, just say so.)

`lean4-skills /learn`: Socratic interaction (continued)

  > borel subset of beta has borel preimage?

  Pedagogy: User has the right shape of the idea but is over-specializing — nudging toward the general definition.

  You’re very close! The shape is right — preimages of “nice” sets should be “nice.”
  But you’ve jumped to a specific case. Let me push back gently:

  Borel sets are the σ-algebra generated by open sets in a topological space.
  But does every measurable space come from a topology?

  Consider: Lean’s definition is

  def Measurable [MeasurableSpace α] [MeasurableSpace β] (f : α → β) : Prop :=
    ∀ ⦃t : Set β⦄, MeasurableSet t → MeasurableSet (f ⁻¹' t)

  Notice there’s no TopologicalSpace anywhere — just MeasurableSpace.
  So what’s the more general statement? And when would your “Borel” version be the right way to think about it?

`lean4-skills /learn`: Socratic interaction (continued)

  > oh almost-measurable preimage is almost-measurable?

  Pedagogy: User is guessing rather than reading the definition — redirecting attention to what’s actually written.

  Not quite — let’s slow down and look at exactly what the definition says.
  Forget “almost-measurable” for now (that’s a different concept: AEMeasurable, which means measurable after modifying on a null set).

  The definition is staring right at us:

  ∀ ⦃t : Set β⦄, MeasurableSet t → MeasurableSet (f ⁻¹' t)

  Read it out loud in English. There are only two ingredients: MeasurableSet and preimage.
  No “almost,” no “Borel,” no topology.

  In plain words: what does MeasurableSet t → MeasurableSet (f ⁻¹' t) say?

Proof terms are certificates.
Search can be done by hand, using tactics, or AI-assisted.
The trusted checker stays small.

Getting started with Lean

Learn Lean

Projects mentioned today

Thank You

Cameron Freer
Research Scientist, BCS (MIT)
freer@mit.edu

MIT 6.S976 / 18.S996 — Cryptography and Machine Learning
Guest Lecture, March 31, 2026

Lean: a Different Take on Verification

Why study Lean in this course?

What is Lean?

Why formalize?

Why formalize? (continued)

Mathlib as substrate

How Lean works

Lean’s type theory: one primitive, three familiar special cases

Dependent types can encode invariants

What is actually trusted?

After comparator: specification risk remains

A crypto-style threat model for proof checking

A tiny theorem: one-time-pad round-trip

Propositions as types

The proof: encrypting twice recovers the message

Three proof styles on one proposition

The same three styles on the authorization theorem

The authorization theorem, declaratively

A dependent-type twist: n-bit one-time pads

Deny-overrides: from toy theorem to authorization logic

Example formalizations

Exchangeability in Lean: one theorem, three proof routes

Sphere packing in dimension 8

Noperthedron: massive computation, small checker

Lean for CS, cryptography, and security

Lean in computer science: current state

Lean for cryptography and security

Lean and coding agents

lean-lsp-mcp: tools for agents to interact with Lean

lean4-skills for agentic coding in Lean

lean4-skills /learn: tools for education and exploration

lean4-skills /learn: Socratic interaction

lean4-skills /learn: Socratic interaction (continued)

lean4-skills /learn: Socratic interaction (continued)

Proof terms are certificates. Search can be done by hand, using tactics, or AI-assisted. The trusted checker stays small.

Getting started with Lean

Thank You

`lean-lsp-mcp`: tools for agents to interact with Lean

`lean4-skills` for agentic coding in Lean

`lean4-skills /learn`: tools for education and exploration

`lean4-skills /learn`: Socratic interaction

`lean4-skills /learn`: Socratic interaction (continued)

`lean4-skills /learn`: Socratic interaction (continued)

Proof terms are certificates.
Search can be done by hand, using tactics, or AI-assisted.
The trusted checker stays small.