Re: [Haskell-cafe] Interpreting profiling results

4 Jan 2010

      Am Montag 04 Januar 2010 02:17:06 schrieb Patrick LeBoutillier:
...
Hi,
This question didn't get any replies on the beginners list, I thought
I'd try it here...
Sorry, been occupied with other things. I already took a look, but hadn't anything 
conclusive enough to reply yet.
...
I've written (and improved using other solutions I've found on the
net) a simple sudoku solver which I'm trying to profile. Here is the
code:
import Array
Better

import Data.Array.Unboxed

*much* faster
...
import List (transpose, nub, (\\))
import Data.List
data Sudoku = Sudoku { unit :: Int, cells :: Array (Int, Int) Int,
cells :: UArray (Int,Int) Int
...
holes :: [(Int, Int)] }
cell :: Sudoku -> (Int, Int) -> Int
cell s i = (cells s) ! i
instance Read Sudoku where
 readsPrec _ s = [(Sudoku unit cells holes, "")]
   where unit = length . words . head . lines $ s
         cells = listArray ((1, 1), (unit, unit)) (map read . words $ s)
         holes = [ c | c <- indices cells, (cells ! c) == 0]
instance Show Sudoku where
 show s = unlines [unwords [show $ cell s (x,y) | x <- [1 .. unit s]]
| y <- [1 .. unit s]]
genNums :: Sudoku -> (Int, Int) -> [Int]
genNums s c@(i,j) = ([1 .. u] \\) . nub $ used
 where
nub isn't nice. It's quadratic in the length of the list. Use e.g. 

map head . group . sort

or

Data.[Int]Set.toList . Data.[Int]Set.fromList

if the type is in Ord (and you don't need the distinct elements in the order they come 
in). That gives an O(n*log n) nub with a sorted result.
And (\\) isn't particularly fast either (O(m*n), where m and n are the lengths of the 
lists). If you use one of the above instead of nub, you can use the O(min m n) 'minus' for 
sorted lists:

xxs@(x:xs) `minus` yys@(y:ys)
    | x < y     = x : xs `minus` yys
    | x == y    = xs `minus` ys
    | otherwise = xxs `minus` ys
xs `minus` _ = xs

Here, you can do better:

genNums s c@(i,j) = nums
  where
      nums = [n | n <- [1 .. u], arr!n]
      arr :: [U]Array Int Bool
      arr = accumArray (\_ _ -> False) True (0,u) used
...
   used = (row s u i j) ++ (col s u i j) ++ (square s sq u i j)
   u = unit s
Not good to calculate sq here. You'll use it many times, calculate once and store it in s.
...
   sq = truncate . sqrt . fromIntegral $ u
row s u i j = [cell s (i, y) | y <- [1 .. u]]
col s u i j = [cell s (x, j) | x <- [1 .. u]]
square s sq u i j = [cell s (x, y) | y <- [1 .. u], x <- [1 .. u], f x i, f
y j] where f a b = div (a-1) sq == div (b-1) sq
Test for f y j before you generate x to skip early.

square s sq u i j = [cell s (ni+x,nj+y) | x <- [1 .. sq], y <- [1 .. sq]]
   where
      qi = (i-1) `div` sq
      qj = (j-1) `div` sq
      ni = qi*sq
      nj = qj*sq
...
solve :: Sudoku -> [Sudoku]
solve s =
 case holes s of
   [] -> [s]
   (h:hs) -> do
     n <- genNums s h
     let s' = Sudoku (unit s) ((cells s) // [(h, n)]) hs
     solve s'
main = print . head . solve . read =<< getContents
When I compile as such:
$ ghc -O2 --make Sudoku.hs -prof -auto-all -caf-all -fforce-recomp
and run it on the following puzzle:
0 2 3 4
3 4 1 0
2 1 4 0
0 3 2 1
I get the following profiling report:
       Fri Jan  1 10:34 2010 Time and Allocation Profiling Report  (Final)
          Sudoku +RTS -p -RTS
       total time  =        0.00 secs   (0 ticks @ 20 ms)
That means the report is basically useless. Not entirely, because the allocation figures 
may already contain useful information. Run on a 9x9 puzzle (a not too hard one, but not 
trivial either).

Also, run the profiling with -P instead of -p, you'll get more info about time and 
allocation then.
...
       total alloc =     165,728 bytes  (excludes profiling overheads)
COST CENTRE                    MODULE               %time %alloc
CAF                            GHC.Handle             0.0   10.7
CAF                            Text.Read.Lex          0.0    2.1
CAF                            GHC.Read               0.0    1.2
square                         Main                   0.0    2.8
solve                          Main                   0.0    1.3
show_aVx                       Main                   0.0    3.7
readsPrec_aYF                  Main                   0.0   60.6
main                           Main                   0.0    9.6
genNums                        Main                   0.0    5.0
cell                           Main                   0.0    1.2
                       individual    inherited
COST CENTRE              MODULE
      no.    entries  %time %alloc   %time %alloc
MAIN                     MAIN
        1           0   0.0    0.3     0.0  100.0
 main                    Main
      186           1   0.0    9.6     0.0   85.6
 show_aVx               Main
      196           2   0.0    3.7     0.0    3.7
  cell                  Main
      197          16   0.0    0.0     0.0    0.0
 solve                  Main
      188           5   0.0    1.3     0.0   11.8
  genNums               Main
      189           8   0.0    5.0     0.0   10.4
   square               Main
      194          88   0.0    2.8     0.0    3.2
    cell                Main
      195          16   0.0    0.4     0.0    0.4
   col                  Main
      192           4   0.0    0.7     0.0    1.1
    cell                Main
      193          16   0.0    0.4     0.0    0.4
   row                  Main
      190           4   0.0    0.7     0.0    1.1
    cell                Main
      191          16   0.0    0.4     0.0    0.4
 readsPrec_aYF          Main
      187           3   0.0   60.6     0.0   60.6
 CAF                     GHC.Read
      151           1   0.0    1.2     0.0    1.2
 CAF                     Text.Read.Lex
      144           8   0.0    2.1     0.0    2.1
 CAF                     GHC.Handle
      128           4   0.0   10.7     0.0   10.7
 CAF                     GHC.Conc
      127           1   0.0    0.0     0.0    0.0
Does the column 'entries' represent the number of times the function
was called?
Number of times it was 'entered', not quite the same as the number of times it was called.
I think (Warning: speculation ahead, I don't *know* how the profiles are generated) it's 
thus:
Say you call a function returning a list. One call, first entry. It finds the beginning of 
the list, the first k elements and hands them to the caller. Caller processes these, asks 
"can I have more, or was that it?". Same call, second entry: f looks for more, finds the 
next m elements, hands them to caller. Caller processes. Repeat until whatever happens 
first, caller doesn't ask whether there's more or callee finds there's nothing more (or 
hits bottom).
...
If so, I don't understand how the 'square' function could
be called 88 times when it's caller is only called 8 times. Same thing
with 'genNums' (called 8 times, and solve called 5 times)
What am I missing here?
Patrick

Re: [Haskell-cafe] Interpreting profiling results

Daniel Fischer