|
A 9x9 sudoku solver and generator
Introduction
sudoku(1)
solves and generates 9x9
sudoku
puzzles.
It has a command line interface with text and static postscript output.
Look elsewhere for GUI solvers.
Constraints
The solver uses depth first and/or breadth first tree search with
constraint propagation to prune the search for the next best move
(forms of
forward checking.)
There are space/time tradeoffs between depth/breadth first search and
the constraints used;
sudoku(1)
has options to control the combinations.
The common characteristic for all constraints, here and elsewhere, is that
they avoid
trial and error.
Its fine for a computer to guess and backtrack but a definite breach of
puzzle manners to require a human to do so.
The constraints are:
- F
-
Forced cell: only one value possible.
- N
-
Only cell: only one value in row/col/box.
- B
-
Box claim: only value in row/col within a box.
- T
-
Tuple: N exact or hidden N-tuples in row/col/box.
- X
-
Singleton cycle: one or more weak edges (x-wing/swordfish.) Requires B.
- Y
-
Pair cycle: one optional non-pair vertex (xy-wing/coloring.)
- W
-
Row/col claim: classic N-row/col x-wing/swordfish.
The
constraint set
is the set of constraints applied to solve a puzzle.
All constraint sets contain at least
F.
The
FN
constraint set is common to most machine and human solvers.
Clue cells
are the givens or initial known cell values.
Solution cells
are the non-clue cells that must be filled in to produce a puzzle solution.
A
candidate value
is one of many possible values for a solution cell.
Candidate values are
eliminated
as solution cells become filled in.
A solution cell is
pinned
or
forced
when all but one of its candidate values is eliminated.
A puzzle is
solved
when all solution cells are pinned.
Constraint propagation
is the process of applying constraints to either pin solution cell values
or eliminate solution cell candidate values.
The solver propagates constraints in the left to right order
FNBTXYW.
When a constraint makes progress (by pinning or elimination)
the constraints are reapplied starting with
F.
This means that a constraint is not attempted until all constraints
before it fail to make progress.
A
valid
puzzle has exactly one solution.
A
minimal
puzzle has a minimal number of clues. Removing any clue in a minimal
puzzle results in an invalid puzzle with multiple solutions.
A
constrained
puzzle is solved by constraint propagation.
An
unconstrained
puzzle requires trial and error (guessing) to produce a solution.
A
proper
puzzle is valid and constrained.
Almost all newspaper puzzles (save typos) are proper.
The
X
and
Y
constraints are cycle based generalizations of x-wing, swordfish, xy-wing
and coloring techniques.
The
XYW
constraints are detailed in the next sections.
See
Angus Johnson
and
Paul Stephens
for details on the
FNBTW
constraints.
X Constraints
X
constraints are cycles composed of sequences of strong and weak edges between cells
with the same candidate value within a single row/col/box.
The vertices of a strong edge are the only cells with the candidate value within the
row/col/box containing the edge.
If a strong or weak edge vertex is assigned the candidate value then the other vertex
in the edge is not assigned the candidate value.
The converse holds for strong edges only: if a strong edge vertex is not assigned the
candidate value then the other vertex is assigned the candidate value.
There are three
X
cycle forms based on strong/weak sequence combinations within the cycle:
-
Odd strong edge sequences connected by one weak edge.
Weak edge cells not participating in the cycle may
be eliminated.
-
One even strong edge sequence and zero or more odd
strong edge sequences connected by one weak edge.
The end cells of the even strong sequence may be eliminated.
-
One odd strong edge sequence connected by two weak edges.
The middle cell of the weak sequence may be eliminated.
X
cycles handle x-wings and many, but not all, N-row/col swordfish configurations.
The
W
constraint covers the swordfish configurations not handled by the
X
constraint.
In the examples that follow, the initial puzzle is listed
as an 81 character string in row order.
Puzzle image strong X edges are red, strong Y edges are purple,
weak edges are green, and eliminated cell values are blue.
The PNG image files were generated by first determining the cycles of interest:
sudoku -v2 example.dat | grep x-
then generating the PostScript for cycle
N:
sudoku -ehp -kCN example.dat > example.ps
and finally converting the PostScript to PNG using the ImageMagick
convert(1)
command:
convert -crop 490x490+60+44 -scale 55% example.ps example.png
This puzzle contains an
X
cycle type 1 on candidate 2, type 2 on candidate 2, and
type 2 on candidate 7:
"puzzle x-1"
1.34..78.4...8.1.3789123456.143..8...75.18234.38.4.61.3.18.456.54..319.88.7..5341
This puzzle contains an
X
cycle type 1 on candidate 2, type 2 on candidate 3, and
type 3 on candidate 3:
"puzzle x-2"
...4.768.456.8927.78..6.....476915.8.98...7.66..7.8.9..6.9738..8.45169.797.824.6.
Y Constraints
Y
constraints are cycles composed of edges between cells with exactly two
candidate values within a single row/col/box, where at least one of the
candidate values is the same in both cells (strong edges.)
Two of the
Y
cycle edges may meet at a cell that contains more than two candidate values,
but has the same candidate as the other endpoints of the edges (two weak edges.)
The candidate value between the weak edges may be eliminated.
Check
here
for a description of the mathematics behind X and Y cycles.
This puzzle contains
Y
cycles on candidates 7 and 1.
"puzzle y-1"
123..67.9456..923.789..2.6..1...7.9..679...1.93..1..7.341..59..578.9...66928..3..
This puzzle contains
Y
cycles on candidates 3 and 5.
The second one has 21 edges, arguably beyond human perseverence.
"puzzle y-2"
4..21..6.....7..82.52.6....5.16.97.467...2.953...578.6....4.253.3..2.94824..93671
W Constraints
The
W
constraints handle non-cyclic swordfish configurations not covered by the cyclic
X
constraints.
This puzzle contains a 3 row swordfish on candidate 7:
"puzzle w-1"
.2.45..8..5.....23.8912.564215894376..82...5.9...15842..25.169859....217861972435
Constrained Puzzles and Magic Cells
A
magic cell
set is the smallest set of solution cells with the
smallest number of candidate values that produces a constrained puzzle.
(Magic cell
was coined by Vegard Hanssen in a private communication.
The
smallest number of candidate values
qualification makes magic
cells more human friendly, e.g.,
it limits the number of guesses when magic cell sets are provided
as hints for solving unconstrained puzzles.)
A puzzle with
M
magic cells for constraint set
C
is
C-M-constrained.
A
C-0-constrained
puzzle is
C-constrained.
If
C
is
FN
then it may be dropped from the notation.
For example,
constrained,
0-constrained,
and
FN-0-constrained
are equivalent, and an
FNW-constrained
puzzle requires an x-wing or swordfish to solve without trial and
error.
The
canonical solution
is a puzzle solution transformed to have the smallest lexicographic
catenation of row column order cell values with the top left box
labeled 123/456/789.
The
canonical magic cell set
is the smallest set of solution cells with the smallest number of
candidate values and the smallest row column order position within the
canonical solution, that produces a constrained puzzle. Each puzzle
has exactly one canonical magic cell set for a given constraint set.
Magic cells listed by
sudoku(1)
are canonical.
This FNBTXYW-1-constrained puzzle has canonical magic cell [1,8]
and a total of 2 magic cells:
"puzzle m-2"
1....6..9....89.3...9..25...7....49.6..89.2..9.1.....3.9..4.....1..689748.497.3..
This FNBTXYW-1-constrained puzzle has canonical magic cell [1,1]
and all solution cells are magic, for a total of 54 magic cells:
"puzzle m-54"
.23.567.....7.....7.9.3...4..8...3..6....2.....48....1.....78..5.19....78...456..
This is one of two known FNBTXYW-2-constrained puzzle (from the ~225M survey in the next section.)
It has canonical magic cells [1,6][9,7]:
"puzzle m-2-2"
.2....6.....1.....7...3..5...8.4.9.33.......8....2.4.663.5.......5..9..2.1..84...
The other was scraped from the sudoku forums:
"puzzle m-2-3"
7.....4...2..7..8...3..8..9...5..3...6..2..9...1..7..6...3..9...3..4..6...9..1..5
Observations
Puzzles can be classified by the constrained property (the magic cell
count.) Magic cells were computed for ~225M <= 30 clue minimal
puzzles generated over 2 days by
sudoku(1),
~300K puzzles provided by
Vegard Hanssen,
~28K puzzles scraped from
the sudoku forum,
and 17 clue (the conjectured minimum) puzzles posted by
Gordon Royle.
The puzzles were collated by equivalence class and only one member of
each class was included in the survey, resulting in 225,116,397 puzzles.
The clue distribution is:
|
24622 | 17 clues |
|
1 | 18 clues |
|
30 | 19 clues |
|
2150 | 20 clues |
|
78956 | 21 clues |
|
1422324 | 22 clues |
|
12041179 | 23 clues |
|
43603113 | 24 clues |
|
73718643 | 25 clues |
|
61850853 | 26 clues |
|
26361499 | 27 clues |
|
5529490 | 28 clues |
|
460016 | 29 clues |
|
23080 | 30 clues |
|
201 | 31 clues |
|
73 | 32 clues |
|
32 | 33 clues |
|
20 | 34 clues |
|
14 | 35 clues |
|
69 | 36 clues |
|
6 | 37 clues |
|
1 | 38 clues |
|
12 | 39 clues |
|
3 | 40 clues |
|
1 | 42 clues |
|
1 | 43 clues |
|
1 | 45 clues |
|
1 | 47 clues |
|
1 | 48 clues |
|
1 | 50 clues |
|
1 | 51 clues |
|
1 | 52 clues |
|
1 | 57 clues |
|
1 | 70 clues |
|
The 17 clue counts are skewed because of Gordon Royle's search for minimal clue puzzles.
The bulk of the surveyed puzzles were machine generated via pseudo random seeding and
this might contribute to further skewing.
A puzzle generator has to work hard to produce puzzles with less than 20 clues.
The constrained (magic cell) counts are:
| FN constrained counts |
|
153,495,385 | constrained |
|
71,554,145 | 1-constrained |
|
66,865 | 2-constrained |
|
| FNBTXYW constrained counts |
|
211,377,110 | constrained |
|
13,739,284 | 1-constrained |
|
1 | 2-constrained |
|
There are a few surprising results.
Even with the basic
FN
constraints, no 3-constrained or greater puzzle was found.
One might expect the 17 clue puzzles to be more likely 2-constrained,
but they have a distribution similar to the random puzzles:
| 17 clue FN constrained counts |
|
11582 | constrained |
|
12728 | 1-constrained |
|
310 | 2-constrained |
|
| 17 clue FNBTXYW constrained counts |
|
23903 | constrained |
|
717 | 1-constrained |
|
0 | 2-constrained |
|
Open question: Are there any 3-constrained 9x9 sudoku puzzles?
Answer: yes.
On 2007-04-07 JPF posted this puzzle on the Player's Forum with singles backdoor size 3:
100000089000009002000000450007600000030040000900002005004070000500008010060300000
There are 3 other known singles backdoor size 3 puzzles as of 2007-04-12.
The apparent rarity of 2-constrained puzzles may have implications on tree search
strategies. If a puzzle is not constrained then it is is most likely
1-constrained, and a breadth first search of all candidate values on the solution
cells should produce a magic cell.
Any depth-first tree search that visits more than
number of solution cells
X
number of candidate values
nodes might have been better off using breadth first search for a magic cell.
If there are no 3-constrained puzzles then a 2 level breadth first search would
be sufficient to solve all puzzles.
In general breadth first loses to depth first on 2-constrained puzzles.
Magic cells may provide hints to discovering the next "logical" constraint.
If a solver were able to determine the magic cells then it could avoid backtracking
altogether.
Magic cells may also be used as hints for unconstrained puzzles, e.g.,
determine the value for cell [r,c] and the puzzle will be constrained.
Examples and timings
The
top95
and
contest
puzzle sets are frequently referenced on the sudoku forum.
The counts for these puzzle sets are:
| top95 FN |
|
0 | constrained |
|
44 | 1-constrained |
|
51 | 2-constrained |
|
| top95 FNBTXYW |
|
42 | constrained |
|
53 | 1-constrained |
|
0 | 2-constrained |
|
| contest FN |
|
222 | constrained |
|
777 | 1-constrained |
|
12 | 2-constrained |
|
| contest FNBTXYW |
|
543 | constrained |
|
468 | 1-constrained |
|
0 | 2-constrained |
|
Command lines and timings for
sudoku(1)
compiled with gcc -O3 -march=pentium4
on linux Intel(R) Pentium(R) 4 CPU 2.80GHz follow.
The constraints selected by the -q option performed the best for each particular puzzle set.
sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t contest.dat
puzzles=1011 guesses=5608 iterations=29045 seconds=0.059990
sudoku -d -qFNB -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t top95.dat
puzzles=95 guesses=2231 iterations=8626 seconds=0.039993
sudoku -a -qF -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t multiple-1.tst
puzzles=957263 guesses=2013617 iterations=5283315 seconds=2.732584
sudoku -a -qF -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t multiple-2.tst
puzzles=50044975 guesses=117985466 iterations=307473894 seconds=161.486451
sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-1-con-sym.dat
puzzles=1183 guesses=2991 iterations=19940 seconds=0.033994
sudoku -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-1-con-sym.dat
puzzles=1183 guesses=2350 iterations=21699 seconds=0.038994
sudoku -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-2-con.dat
puzzles=28948 guesses=680294 iterations=2869221 seconds=9.260592
sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FN-2-con.dat
puzzles=28948 guesses=389196 iterations=1468420 seconds=3.768427
sudoku -d -qFN -f- -Fpuzzles=%n%,guesses=%Q%,iterations=%I%,seconds=%t FNBTXYW-1-con.dat
puzzles=100000 guesses=418449 iterations=2322057 seconds=5.084235
Conclusions
There still seems to be more questions than conclusions about 9x9 sudoku puzzles.
A quick scan of the timing results shows that the best times occur when the
constraint set is restricted to
F
or
FN
and the solver is forced to backtrack due to the weaker constraints.
If additional constraints slow down the solver then why bother implementing them at all?
Well, coding a fast solver is a great pursuit, but it offers little to the human solver.
Pencil and paper limit human solving techniques.
Most aficionados draw the line at trial and error (basically the definition of backtracking),
and that's where the additional constraints come into play.
It is strange, though, that constraint techniques intended to aid human solvers
(to make it a pastime rather than a chore)
sometimes slow down machine solvers.
Sometimes its not the best tactic to make a machine act like a human.
Downloads
Unless otherwise noted,
source and executables are covered by the
Common Public License.
The LGPL Licensed
SudokuExplainer.jar,
by Nicolas Juillerat, contains an additional command line entry point
diuf.sudoku.test.serate.
See
serate(1)
for more information.
Note that some browsers may (improperly) change the
.jar
suffix to
.zip
on download; change the suffix back to
.jar
before using.
|
|
Glenn Fowler |
|
|
Information and Software Systems Research |
|
|
AT&T Labs Research |
|
|
Florham Park NJ |
|
|
January 14, 2009 |
|