Solving the Sudoku puzzle by logic only

Up: Puzzle solving by computer
Prev: The hexiamond puzzle

This page describes how to solve a Sudoku puzzle. It's a puzzle in a 9x9 grid where each number (from 1 to 9) has to appear exactly once in each row, column and exactly once in each of the nine 3x3 squares. If you've never heard of it, search the web, there are plenty of examples.

It's an intriguing puzzle because it can seem at once very simple and very, very difficult. I wrote a solution finder that simply tried each number in each position but that takes forever. I didn't let it finish, I have a sneaking suspicion it's end-of-the-universe type stuff.

The solution

So, my final solution ended up being a bit smarter. It employs three kinds of tricks:

Firstly, it goes though each cell and determines if there is only one possible number that can be placed there. This is the simplest and most simple puzzles found in newspapers can be solved with this method alone. It just takes a bit of time, although it's trivial for a computer.
Secondly, it checks each cell to see if there is a number possible there that is not possible in any other cell in the row, column or box. This is a slightly more sophisticated method although you can apply it very easily when you're looking at the grid yourself. It gets you out of stuck spots where the first method can't proceed.
Thirdly, it tries the method known as locked candidates. It's the technique where if a number in any row or column is only present in a single box, you can exclude it from the rest of the box. The site given has abetter explanation.
The final method is one that simply count the number of distinct number in a group of cells and if that number is the same as the number of cells you've made a subgroup, This restricts the contens of other cells and may lead to a solution.

The program basically keeps applying the first method until it stops working. It then applies the second method and tries the first again. It only tries the later methods if the earlier ones don't make progress. The solver never guesses though it could easily be made to do so. If it can't sole it it shows you where it got upto and stops. Usually I can't solve the resulting grid either which means I can't really fix it either.

The implementation

The grid is stored as a two dimensional array. The cells are referenced by Row and Col numbers from 1 to 9. The cell either contains a number or is unbound (represented by '_' in Prolog). The important basic predicates are:

available( In, Out ) :- Given a list of cells (either bound or unbound) returns the set of numbers that do not appear in that set. Unbound cells are ignored.
possibles( P, Rin, Cin, Out ) :- Given the grid P, return the set of possible values for the cell (Rin,Cin). It works basically by extracting the values in the same row, column and box, run the available/2 predicate on each and return the intersection of those sets. Note, if the cell already has a value, this predicate fails. You shouldn't be doing that anyway. It also simplifies coding as filled-in cells will be automatically ignored.
row( P, Row ), col( P, Col ), box( P, Box ) :- return the values in the grid associated with the given row, column or box.

After seeing the above basic predicates you can see how the solving might proceed. Note, the built-in findall/3 predicate is used a lot to find all possible solutions.

Discovered values are stored in the form of [Row,Col,Value]. If in an intermediate stage with multiple possible values, we use [Row,Col,[Values]]. The predicate assign/3 applies the found results to the grid. It fails if there is nothing to apply, so each individual step doesn't need to explicitly check if it didn't find anything.

Sanity check: Check that there are no cells for which possibles/4 returns an empty set. This means we screwed up. This is where it's important for possibles/4 to fail on filled in cells rather than return an empty set.
First method: Find all cells where possibles/4 returns a single element. If found, that's the element to fill in.
Second method: This is trickier. For each row, column and box we collect the set of possibles for each cell. Within a single (for example) row, we take each the possibles for each cell and subtract the possibles for the other cells. If there is exactly one number left over, that's number it should be.
We do this search in row, column and boxes one after the other and return the union of the results. (row|col|box)set/3 are the testing predicates. (row|col|box)all/2 are the predicates to collect the results.
Third method: This is the Locked Candidates test. For each row, column or box it extract the set of possibles. If a number in that row, column or box only appears in one section of it, it is excluded in the other direction also.
Much of the complexity here it caused by the fact that I generalised it to handle all the possiblities. It choose a major axis (say row) and number (say 1) and a minor axis (say box). It will then select all the cells in row 1. It divides these cells by the minor axis giving you a set with the number used in the intersections of (row 1, box1) (row 1, box 2) (row 1, box 3). If any of those contain a number not in the other two, that number can be removed for all other rows in the same box.
A bit difficult to get your head around but it's a very effective solve method.
Fourth method: This basically scans each row, column or box and tries to find a subset of cells where the number of distinct numbers is the same as the number of cells. Works quite well.

If none of the above work we've either solved the puzzle or got stuck. In either case, print the result and stop.

Speed

Basically, it solves the puzzle quicker than you can type it in. The problems in the file I've put in the predicate problem/2 so you basically say:

   > problem(1,X), solve(X).

This will solve the first puzzle in the file. I defined 7. Ideally it should read from a file but I havn't done that yet.

It prints out each step along the way along with which each type of method it used. You could turn this off if you want. It's probably fast enough to create a puzzle setter. solve/2 will fail outright if the puzzle is inconsistent and return an incomplete grid if there is not enough information. You could score the different methods of solving to try and make harder-to-solve puzzles.

Download: sudoku3.plg

Up: Puzzle solving by computer
Prev: The hexiamond puzzle

By Martijn van Oosterhout (kleptog (at) svana.org)
Copyright © 2000-2006 - Last modified 3/04/2006