Inference-time scaling has proven extremely effective at improving LLM performance in domains with a generator-verifier gap, where generating candidate solutions is much harder than verifying their correctness. Several popular methodologies for scaling inference compute have been explored; widely used approaches include reinforcement learning to elicit long chains of thought for self-correction, and generating multiple candidate solutions and selecting the best one (known as best-of-n). Combining these methodologies has proven highly effective, boosting key benchmark results in competitive coding (IOI for o3) and mathematics (FrontierMath, AIME).
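To make the best-of-n setting concrete, the sketch below shows the basic selection loop under simplifying assumptions: `generate` and `verifier` are hypothetical stand-ins for sampling from an LLM and scoring a candidate, which the source does not specify.

```python
# A minimal best-of-n sketch. `generate` and `verifier` are hypothetical
# placeholders; a real system would sample from an LLM and score candidates
# with a learned or programmatic verifier.
import random
from typing import Callable

def best_of_n(generate: Callable[[], str],
              verifier: Callable[[str], float],
              n: int) -> str:
    """Sample n candidate solutions independently; return the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=verifier)

# Toy usage: candidates are random integer guesses; the verifier rewards
# proximity to a known target, standing in for a correctness check.
answer = best_of_n(
    generate=lambda: str(random.randint(0, 100)),
    verifier=lambda s: -abs(int(s) - 42),
    n=50,
)
print(answer)
```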
This paper explores a more inference-efficient approach to scaling best-of-n for reasoning models through parallel reasoning: pruning reasoning chains early when they do not contribute to candidate-solution diversity. Our experiments on the AIME competition math benchmark demonstrate that our method matches full pass@50 performance while pruning 40 of the 50 reasoning chains after only 300 tokens, decoding just 10 reasoning chains to completion.
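The sketch below illustrates one plausible instantiation of diversity-based pruning, assuming chains are compared by bag-of-words cosine similarity over their 300-token prefixes and survivors are chosen by greedy farthest-point selection; the paper's actual diversity criterion and representation may differ, and the decoding calls themselves are omitted.

```python
# A minimal sketch of early pruning for diversity. The similarity measure
# (bag-of-words cosine) and selection rule (greedy farthest-point) are
# assumptions for illustration, not the paper's confirmed method.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def select_diverse(prefixes: list[str], k: int) -> list[int]:
    """Keep indices of k mutually diverse prefixes via greedy farthest-point selection."""
    vecs = [Counter(p.split()) for p in prefixes]
    kept = [0]  # seed with the first chain
    while len(kept) < k:
        # Add the prefix whose *maximum* similarity to any kept prefix is smallest,
        # i.e. the candidate farthest from everything already kept.
        best = min(
            (i for i in range(len(prefixes)) if i not in kept),
            key=lambda i: max(cosine(vecs[i], vecs[j]) for j in kept),
        )
        kept.append(best)
    return kept

# Usage: decode 50 chains for ~300 tokens each, prune to 10 diverse survivors,
# then continue decoding only those 10 to completion (model calls omitted).
prefixes = [f"chain {i} partial reasoning ..." for i in range(50)]  # stand-ins
survivors = select_diverse(prefixes, k=10)
print(f"continuing {len(survivors)} of {len(prefixes)} chains: {sorted(survivors)}")
```

Greedy farthest-point selection is a natural fit here because it runs on short prefixes before most decoding cost is incurred, which is what makes pruning at 300 tokens inference-efficient.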