COMPUTING SCIENCE

# Unwed Numbers

The mathematics of Sudoku, a puzzle that boasts "No math required!"

NP or Not NP, That Is the Question

Computer science has an elaborate hierarchy for classifying problems according to difficulty, and the question of where Sudoku fits into this scheme has elicited some controversy and confusion. It is widely reported that Sudoku belongs in the class NP, a set of notoriously difficult problems; meanwhile, however, many computer programs effortlessly solve any order-3 Sudoku puzzle. There is actually no contradiction in these facts, but there is also not much help in dispelling the confusion.

Complexity classes such as NP do not measure the difficulty of any
specific problem instance but rather describe the rate at which
difficulty grows as a function of problem size. If we can solve an
order-*n* Sudoku, how much harder will we have to work to
solve a puzzle of order *n* + 1? For problems in NP, the
effort needed grows exponentially.

Most discussions of the complexity of Sudoku refer to the work of Takayuki Yato and Takahiro Seta of the University of Tokyo, whose analysis relates the task of solving Sudoku to the similar problem of completing a partially specified Latin square. The latter problem in turn has been connected with others that are already known to be in NP. This process of "reduction" from one problem to another is the standard way of establishing the complexity classes of computational problems. Yato and Seta employ an unusual form of reduction that addresses the difficulty of finding an additional solution after a first solution is already known. In Sudoku, of course, well-formed puzzles are expected to have only one solution. Yato and Seta say their result applies nonetheless. I don't quite follow their reasoning on this point, but the literature of complexity theory is vast and technical, and the fault is likely my own.

When you lay down your pencil on a completed Sudoku, the thought that you've just dispatched a problem in the class NP may boost your psychological wellbeing, but the NP label doesn't say anything about the relative difficulty of individual Sudoku puzzles. For that, a different kind of hierarchy is needed.

Many publishers rank their Sudoku on a scale from easy to hard (or from gentle to diabolical). The criteria for these ratings are not stated, and it's a common experience to breeze through a "very hard" puzzle and then get stuck on a "medium."

One easily measured factor that might be expected to influence
difficulty is the number of givens. In general, having fewer cells
specified at the outset ought to make for a harder puzzle. At the
extremes of the range, it's clear that having *all* the cells
filled in makes a puzzle very easy indeed, and having none filled in
leaves the problem under-specified. What is the minimum number of
givens that can ensure a unique solution? For an order-*n*
grid, there is a lower bound of *n* ^{2} - 1. For
example, on an order-3 grid with fewer than eight givens, there must
be at least two numbers that appear nowhere among the givens. With
no constraints on those symbols, there are at least two solutions in
which their roles are interchanged.

Can the *n* ^{2} - 1 bound be achieved in practice?
For *n* = 1 the answer is yes. On the order-2 grid there are
uniquely solvable puzzles with four givens but not, I think, with
three. (Finding the arrangements with just four givens is itself a
pleasant puzzle.) For order 3, the minimum number of givens is
unknown. Gordon Royle of the University of Western Australia has
collected more than 24,000 examples of uniquely solvable grids with
17 givens, and he has found none with fewer than 17, but a proof is lacking.

Published puzzles generally have between 25 and 30 givens. Within this range, the correlation between number of givens and difficulty rating is weak. In one book, I found that the "gentle" puzzles averaged 28.3 givens and the "diabolical" ones 28.0.

EMAIL TO A FRIEND :

**Of Possible Interest**

**Feature Article**: In Defense of Pure Mathematics

**Feature Article**: The Statistical Crisis in Science

**Computing Science**: Clarity in Climate Modeling