Publications

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

NeurIPS 2024, ICML AI for Math Workshop **Best Paper Award**, 2024

We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving.

Download here

An In-Context Learning Agent for Formal Theorem-Proving

Conference on Language Modeling, 2024, 2024

We present an in-context learning agent for formal theorem-proving in environments like Lean and Coq. Current state-of-the-art models for the problem are finetuned on environment-specific proof data. By contrast, our approach, called COPRA, repeatedly asks a high-capacity, general-purpose large language model (GPT-4) to propose tactic applications from within a stateful backtracking search. Proposed tactics are executed in the underlying proof environment. Feedback from the execution is used to build the prompt for the next model query, along with selected information from the search history and lemmas retrieved from an external database. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the CompCert project. On these benchmarks, COPRA significantly outperforms few-shot invocations of GPT-4. It also compares favorably against finetuning-based approaches, outperforming ReProver, a state-of-the-art finetuned approach for Lean, in terms of the pass@1 metric.

Download here

$k$-NIM Trees: Characterization and Enumeration

Submitted, 2022

Among those real symmetric matrices whose graph is a given tree T, the maximum multiplicity $M(T)$ that can be attained by an eigenvalue is known to be the path cover number of $T$. We say that a tree is $k$-NIM if, whenever an eigenvalue attains a multiplicity of $k − 1$ less than the maximum multiplicity, all other multiplicities are 1. $1$-NIM trees are known as NIM trees, and a characterization for NIM trees is already known. Here we provide a graph-theoretic characterization for $k$-NIM trees for each k ≥ 1, as well as count them. It follows from the characterization that k-NIM trees exist on n vertices only when $k = 1, 2, 3$. In case k = 3, the only 3-NIM trees are simple stars.

Download here

George Tsoukalas

Publications

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

An In-Context Learning Agent for Formal Theorem-Proving

$k$-NIM Trees: Characterization and Enumeration