Interpreting profiling results

Hi, This question didn't get any replies on the beginners list, I thought I'd try it here... I've written (and improved using other solutions I've found on the net) a simple sudoku solver which I'm trying to profile. Here is the code: import Array import List (transpose, nub, (\\)) import Data.List data Sudoku = Sudoku { unit :: Int, cells :: Array (Int, Int) Int, holes :: [(Int, Int)] } cell :: Sudoku -> (Int, Int) -> Int cell s i = (cells s) ! i instance Read Sudoku where readsPrec _ s = [(Sudoku unit cells holes, "")] where unit = length . words . head . lines $ s cells = listArray ((1, 1), (unit, unit)) (map read . words $ s) holes = [ c | c <- indices cells, (cells ! c) == 0] instance Show Sudoku where show s = unlines [unwords [show $ cell s (x,y) | x <- [1 .. unit s]] | y <- [1 .. unit s]] genNums :: Sudoku -> (Int, Int) -> [Int] genNums s c@(i,j) = ([1 .. u] \\) . nub $ used where used = (row s u i j) ++ (col s u i j) ++ (square s sq u i j) u = unit s sq = truncate . sqrt . fromIntegral $ u row s u i j = [cell s (i, y) | y <- [1 .. u]] col s u i j = [cell s (x, j) | x <- [1 .. u]] square s sq u i j = [cell s (x, y) | y <- [1 .. u], x <- [1 .. u], f x i, f y j] where f a b = div (a-1) sq == div (b-1) sq solve :: Sudoku -> [Sudoku] solve s = case holes s of [] -> [s] (h:hs) -> do n <- genNums s h let s' = Sudoku (unit s) ((cells s) // [(h, n)]) hs solve s' main = print . head . solve . read =<< getContents When I compile as such: $ ghc -O2 --make Sudoku.hs -prof -auto-all -caf-all -fforce-recomp and run it on the following puzzle: 0 2 3 4 3 4 1 0 2 1 4 0 0 3 2 1 I get the following profiling report: Fri Jan 1 10:34 2010 Time and Allocation Profiling Report (Final) Sudoku +RTS -p -RTS total time = 0.00 secs (0 ticks @ 20 ms) total alloc = 165,728 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc CAF GHC.Handle 0.0 10.7 CAF Text.Read.Lex 0.0 2.1 CAF GHC.Read 0.0 1.2 square Main 0.0 2.8 solve Main 0.0 1.3 show_aVx Main 0.0 3.7 readsPrec_aYF Main 0.0 60.6 main Main 0.0 9.6 genNums Main 0.0 5.0 cell Main 0.0 1.2 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 1 0 0.0 0.3 0.0 100.0 main Main 186 1 0.0 9.6 0.0 85.6 show_aVx Main 196 2 0.0 3.7 0.0 3.7 cell Main 197 16 0.0 0.0 0.0 0.0 solve Main 188 5 0.0 1.3 0.0 11.8 genNums Main 189 8 0.0 5.0 0.0 10.4 square Main 194 88 0.0 2.8 0.0 3.2 cell Main 195 16 0.0 0.4 0.0 0.4 col Main 192 4 0.0 0.7 0.0 1.1 cell Main 193 16 0.0 0.4 0.0 0.4 row Main 190 4 0.0 0.7 0.0 1.1 cell Main 191 16 0.0 0.4 0.0 0.4 readsPrec_aYF Main 187 3 0.0 60.6 0.0 60.6 CAF GHC.Read 151 1 0.0 1.2 0.0 1.2 CAF Text.Read.Lex 144 8 0.0 2.1 0.0 2.1 CAF GHC.Handle 128 4 0.0 10.7 0.0 10.7 CAF GHC.Conc 127 1 0.0 0.0 0.0 0.0 Does the column 'entries' represent the number of times the function was called? If so, I don't understand how the 'square' function could be called 88 times when it's caller is only called 8 times. Same thing with 'genNums' (called 8 times, and solve called 5 times) What am I missing here? Patrick -- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Am Montag 04 Januar 2010 02:17:06 schrieb Patrick LeBoutillier:
Hi,
This question didn't get any replies on the beginners list, I thought I'd try it here...
Sorry, been occupied with other things. I already took a look, but hadn't anything conclusive enough to reply yet.
I've written (and improved using other solutions I've found on the net) a simple sudoku solver which I'm trying to profile. Here is the code:
import Array
Better import Data.Array.Unboxed *much* faster
import List (transpose, nub, (\\)) import Data.List
data Sudoku = Sudoku { unit :: Int, cells :: Array (Int, Int) Int,
cells :: UArray (Int,Int) Int
holes :: [(Int, Int)] }
cell :: Sudoku -> (Int, Int) -> Int cell s i = (cells s) ! i
instance Read Sudoku where readsPrec _ s = [(Sudoku unit cells holes, "")] where unit = length . words . head . lines $ s cells = listArray ((1, 1), (unit, unit)) (map read . words $ s) holes = [ c | c <- indices cells, (cells ! c) == 0]
instance Show Sudoku where show s = unlines [unwords [show $ cell s (x,y) | x <- [1 .. unit s]]
| y <- [1 .. unit s]]
genNums :: Sudoku -> (Int, Int) -> [Int] genNums s c@(i,j) = ([1 .. u] \\) . nub $ used where
nub isn't nice. It's quadratic in the length of the list. Use e.g. map head . group . sort or Data.[Int]Set.toList . Data.[Int]Set.fromList if the type is in Ord (and you don't need the distinct elements in the order they come in). That gives an O(n*log n) nub with a sorted result. And (\\) isn't particularly fast either (O(m*n), where m and n are the lengths of the lists). If you use one of the above instead of nub, you can use the O(min m n) 'minus' for sorted lists: xxs@(x:xs) `minus` yys@(y:ys) | x < y = x : xs `minus` yys | x == y = xs `minus` ys | otherwise = xxs `minus` ys xs `minus` _ = xs Here, you can do better: genNums s c@(i,j) = nums where nums = [n | n <- [1 .. u], arr!n] arr :: [U]Array Int Bool arr = accumArray (\_ _ -> False) True (0,u) used
used = (row s u i j) ++ (col s u i j) ++ (square s sq u i j) u = unit s
Not good to calculate sq here. You'll use it many times, calculate once and store it in s.
sq = truncate . sqrt . fromIntegral $ u
row s u i j = [cell s (i, y) | y <- [1 .. u]]
col s u i j = [cell s (x, j) | x <- [1 .. u]]
square s sq u i j = [cell s (x, y) | y <- [1 .. u], x <- [1 .. u], f x i, f y j] where f a b = div (a-1) sq == div (b-1) sq
Test for f y j before you generate x to skip early. square s sq u i j = [cell s (ni+x,nj+y) | x <- [1 .. sq], y <- [1 .. sq]] where qi = (i-1) `div` sq qj = (j-1) `div` sq ni = qi*sq nj = qj*sq
solve :: Sudoku -> [Sudoku] solve s = case holes s of [] -> [s] (h:hs) -> do n <- genNums s h let s' = Sudoku (unit s) ((cells s) // [(h, n)]) hs solve s'
main = print . head . solve . read =<< getContents
When I compile as such:
$ ghc -O2 --make Sudoku.hs -prof -auto-all -caf-all -fforce-recomp
and run it on the following puzzle:
0 2 3 4 3 4 1 0 2 1 4 0 0 3 2 1
I get the following profiling report:
Fri Jan 1 10:34 2010 Time and Allocation Profiling Report (Final)
Sudoku +RTS -p -RTS
total time = 0.00 secs (0 ticks @ 20 ms)
That means the report is basically useless. Not entirely, because the allocation figures may already contain useful information. Run on a 9x9 puzzle (a not too hard one, but not trivial either). Also, run the profiling with -P instead of -p, you'll get more info about time and allocation then.
total alloc = 165,728 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
CAF GHC.Handle 0.0 10.7 CAF Text.Read.Lex 0.0 2.1 CAF GHC.Read 0.0 1.2 square Main 0.0 2.8 solve Main 0.0 1.3 show_aVx Main 0.0 3.7 readsPrec_aYF Main 0.0 60.6 main Main 0.0 9.6 genNums Main 0.0 5.0 cell Main 0.0 1.2
individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.3 0.0 100.0 main Main 186 1 0.0 9.6 0.0 85.6 show_aVx Main 196 2 0.0 3.7 0.0 3.7 cell Main 197 16 0.0 0.0 0.0 0.0 solve Main 188 5 0.0 1.3 0.0 11.8 genNums Main 189 8 0.0 5.0 0.0 10.4 square Main 194 88 0.0 2.8 0.0 3.2 cell Main 195 16 0.0 0.4 0.0 0.4 col Main 192 4 0.0 0.7 0.0 1.1 cell Main 193 16 0.0 0.4 0.0 0.4 row Main 190 4 0.0 0.7 0.0 1.1 cell Main 191 16 0.0 0.4 0.0 0.4 readsPrec_aYF Main 187 3 0.0 60.6 0.0 60.6 CAF GHC.Read 151 1 0.0 1.2 0.0 1.2 CAF Text.Read.Lex 144 8 0.0 2.1 0.0 2.1 CAF GHC.Handle 128 4 0.0 10.7 0.0 10.7 CAF GHC.Conc 127 1 0.0 0.0 0.0 0.0
Does the column 'entries' represent the number of times the function was called?
Number of times it was 'entered', not quite the same as the number of times it was called. I think (Warning: speculation ahead, I don't *know* how the profiles are generated) it's thus: Say you call a function returning a list. One call, first entry. It finds the beginning of the list, the first k elements and hands them to the caller. Caller processes these, asks "can I have more, or was that it?". Same call, second entry: f looks for more, finds the next m elements, hands them to caller. Caller processes. Repeat until whatever happens first, caller doesn't ask whether there's more or callee finds there's nothing more (or hits bottom).
If so, I don't understand how the 'square' function could be called 88 times when it's caller is only called 8 times. Same thing with 'genNums' (called 8 times, and solve called 5 times)
What am I missing here?
Patrick

On Mon, Jan 4, 2010 at 10:05 AM, Daniel Fischer
Am Montag 04 Januar 2010 02:17:06 schrieb Patrick LeBoutillier:
Hi,
This question didn't get any replies on the beginners list, I thought
I'd try it here...
Sorry, been occupied with other things. I already took a look, but hadn't anything conclusive enough to reply yet.
No sweat... I didn't mean to be pushy :) Thanks a lot for all the pointers, they have speeded up my code a lot. Patrick
I've written (and improved using other solutions I've found on the
net) a simple sudoku solver which I'm trying to profile. Here is the
code:
import Array
Better
import Data.Array.Unboxed
*much* faster
import List (transpose, nub, (\\))
import Data.List
data Sudoku = Sudoku { unit :: Int, cells :: Array (Int, Int) Int,
cells :: UArray (Int,Int) Int
holes :: [(Int, Int)] }
cell :: Sudoku -> (Int, Int) -> Int
cell s i = (cells s) ! i
instance Read Sudoku where
readsPrec _ s = [(Sudoku unit cells holes, "")]
where unit = length . words . head . lines $ s
cells = listArray ((1, 1), (unit, unit)) (map read . words $ s)
holes = [ c | c <- indices cells, (cells ! c) == 0]
instance Show Sudoku where
show s = unlines [unwords [show $ cell s (x,y) | x <- [1 .. unit s]]
| y <- [1 .. unit s]]
genNums :: Sudoku -> (Int, Int) -> [Int]
genNums s c@(i,j) = ([1 .. u] \\) . nub $ used
where
nub isn't nice. It's quadratic in the length of the list. Use e.g.
map head . group . sort
or
Data.[Int]Set.toList . Data.[Int]Set.fromList
if the type is in Ord (and you don't need the distinct elements in the order they come in). That gives an O(n*log n) nub with a sorted result.
And (\\) isn't particularly fast either (O(m*n), where m and n are the lengths of the lists). If you use one of the above instead of nub, you can use the O(min m n) 'minus' for sorted lists:
xxs@(x:xs) `minus` yys@(y:ys)
| x < y = x : xs `minus` yys
| x == y = xs `minus` ys
| otherwise = xxs `minus` ys
xs `minus` _ = xs
Here, you can do better:
genNums s c@(i,j) = nums
where
nums = [n | n <- [1 .. u], arr!n]
arr :: [U]Array Int Bool
arr = accumArray (\_ _ -> False) True (0,u) used
used = (row s u i j) ++ (col s u i j) ++ (square s sq u i j)
u = unit s
Not good to calculate sq here. You'll use it many times, calculate once and store it in s.
sq = truncate . sqrt . fromIntegral $ u
row s u i j = [cell s (i, y) | y <- [1 .. u]]
col s u i j = [cell s (x, j) | x <- [1 .. u]]
square s sq u i j = [cell s (x, y) | y <- [1 .. u], x <- [1 .. u], f x i, f
y j] where f a b = div (a-1) sq == div (b-1) sq
Test for f y j before you generate x to skip early.
square s sq u i j = [cell s (ni+x,nj+y) | x <- [1 .. sq], y <- [1 .. sq]]
where
qi = (i-1) `div` sq
qj = (j-1) `div` sq
ni = qi*sq
nj = qj*sq
solve :: Sudoku -> [Sudoku]
solve s =
case holes s of
[] -> [s]
(h:hs) -> do
n <- genNums s h
let s' = Sudoku (unit s) ((cells s) // [(h, n)]) hs
solve s'
main = print . head . solve . read =<< getContents
When I compile as such:
$ ghc -O2 --make Sudoku.hs -prof -auto-all -caf-all -fforce-recomp
and run it on the following puzzle:
0 2 3 4
3 4 1 0
2 1 4 0
0 3 2 1
I get the following profiling report:
Fri Jan 1 10:34 2010 Time and Allocation Profiling Report (Final)
Sudoku +RTS -p -RTS
total time = 0.00 secs (0 ticks @ 20 ms)
That means the report is basically useless. Not entirely, because the allocation figures may already contain useful information. Run on a 9x9 puzzle (a not too hard one, but not trivial either).
Also, run the profiling with -P instead of -p, you'll get more info about time and allocation then.
total alloc = 165,728 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
CAF GHC.Handle 0.0 10.7
CAF Text.Read.Lex 0.0 2.1
CAF GHC.Read 0.0 1.2
square Main 0.0 2.8
solve Main 0.0 1.3
show_aVx Main 0.0 3.7
readsPrec_aYF Main 0.0 60.6
main Main 0.0 9.6
genNums Main 0.0 5.0
cell Main 0.0 1.2
individual inherited
COST CENTRE MODULE
no. entries %time %alloc %time %alloc
MAIN MAIN
1 0 0.0 0.3 0.0 100.0
main Main
186 1 0.0 9.6 0.0 85.6
show_aVx Main
196 2 0.0 3.7 0.0 3.7
cell Main
197 16 0.0 0.0 0.0 0.0
solve Main
188 5 0.0 1.3 0.0 11.8
genNums Main
189 8 0.0 5.0 0.0 10.4
square Main
194 88 0.0 2.8 0.0 3.2
cell Main
195 16 0.0 0.4 0.0 0.4
col Main
192 4 0.0 0.7 0.0 1.1
cell Main
193 16 0.0 0.4 0.0 0.4
row Main
190 4 0.0 0.7 0.0 1.1
cell Main
191 16 0.0 0.4 0.0 0.4
readsPrec_aYF Main
187 3 0.0 60.6 0.0 60.6
CAF GHC.Read
151 1 0.0 1.2 0.0 1.2
CAF Text.Read.Lex
144 8 0.0 2.1 0.0 2.1
CAF GHC.Handle
128 4 0.0 10.7 0.0 10.7
CAF GHC.Conc
127 1 0.0 0.0 0.0 0.0
Does the column 'entries' represent the number of times the function
was called?
Number of times it was 'entered', not quite the same as the number of times it was called.
I think (Warning: speculation ahead, I don't *know* how the profiles are generated) it's thus:
Say you call a function returning a list. One call, first entry. It finds the beginning of the list, the first k elements and hands them to the caller. Caller processes these, asks "can I have more, or was that it?". Same call, second entry: f looks for more, finds the next m elements, hands them to caller. Caller processes. Repeat until whatever happens first, caller doesn't ask whether there's more or callee finds there's nothing more (or hits bottom).
If so, I don't understand how the 'square' function could
be called 88 times when it's caller is only called 8 times. Same thing
with 'genNums' (called 8 times, and solve called 5 times)
What am I missing here?
Patrick
-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Hi, Daniel Fischer wrote:
total alloc = 165,728 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
CAF GHC.Handle 0.0 10.7 CAF Text.Read.Lex 0.0 2.1 CAF GHC.Read 0.0 1.2 square Main 0.0 2.8 solve Main 0.0 1.3 show_aVx Main 0.0 3.7 readsPrec_aYF Main 0.0 60.6 main Main 0.0 9.6 genNums Main 0.0 5.0 cell Main 0.0 1.2
individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.3 0.0 100.0 main Main 186 1 0.0 9.6 0.0 85.6 show_aVx Main 196 2 0.0 3.7 0.0 3.7 cell Main 197 16 0.0 0.0 0.0 0.0 solve Main 188 5 0.0 1.3 0.0 11.8 genNums Main 189 8 0.0 5.0 0.0 10.4 square Main 194 88 0.0 2.8 0.0 3.2 cell Main 195 16 0.0 0.4 0.0 0.4 col Main 192 4 0.0 0.7 0.0 1.1 cell Main 193 16 0.0 0.4 0.0 0.4 row Main 190 4 0.0 0.7 0.0 1.1 cell Main 191 16 0.0 0.4 0.0 0.4 readsPrec_aYF Main 187 3 0.0 60.6 0.0 60.6 CAF GHC.Read 151 1 0.0 1.2 0.0 1.2 CAF Text.Read.Lex 144 8 0.0 2.1 0.0 2.1 CAF GHC.Handle 128 4 0.0 10.7 0.0 10.7 CAF GHC.Conc 127 1 0.0 0.0 0.0 0.0
Does the column 'entries' represent the number of times the function was called?
Number of times it was 'entered', not quite the same as the number of times it was called. I think (Warning: speculation ahead, I don't *know* how the profiles are generated) it's thus: Say you call a function returning a list. One call, first entry. It finds the beginning of the list, the first k elements and hands them to the caller. Caller processes these, asks "can I have more, or was that it?". Same call, second entry: f looks for more, finds the next m elements, hands them to caller. Caller processes. Repeat until whatever happens first, caller doesn't ask whether there's more or callee finds there's nothing more (or hits bottom).
Warning: speculation ahead, but is based on my knowledge on other profilers. Many profilers work statistically, they interrupt a program at more less random (or equal) intervals and inspect the stack, whick is of course quite difficult in Haskell as far as I understand it. I have interpreted the entries column as an indication for the amount of "profile interrupts" which happened when a function f was on top of the stack, whatever that means in Haskell. The manual of GHC 6.10.4, chapter 5 states:
The actual meaning of the various columns in the output is:
entries
The number of times this particular point in the call graph was entered.
So for me the question remains open, is "entries" a precisely counted value or a statistically determined one? Best regards, Jean

Am Dienstag 16 Februar 2010 15:45:38 schrieb Jean-Marie Gaillourdet:
Warning: speculation ahead, but is based on my knowledge on other profilers.
Many profilers work statistically, they interrupt a program at more less random (or equal) intervals and inspect the stack, whick is of course quite difficult in Haskell as far as I understand it. I have interpreted the entries column as an indication for the amount of "profile interrupts" which happened when a function f was on top of the stack, whatever that means in Haskell.
The manual of GHC 6.10.4, chapter 5 states:
The actual meaning of the various columns in the output is:
entries
The number of times this particular point in the call graph was entered.
So for me the question remains open, is "entries" a precisely counted value or a statistically determined one?
I have one observation that supports "precisely counted", namely, while the time spent in each cost centre (number of ticks) varies between profiling runs of the same code, the number of bytes allocated and the number of entries remain the same. It's far from conclusive, though. Anybody willing to dig into the profiler code? 8-)
Best regards, Jean
Cheers, Daniel

On Tue, Feb 16, 2010 at 3:45 PM, Jean-Marie Gaillourdet wrote: Hi, total alloc = 165,728 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc CAF GHC.Handle 0.0 10.7
CAF Text.Read.Lex 0.0 2.1
CAF GHC.Read 0.0 1.2
square Main 0.0 2.8
solve Main 0.0 1.3
show_aVx Main 0.0 3.7
readsPrec_aYF Main 0.0 60.6
main Main 0.0 9.6
genNums Main 0.0 5.0
cell Main 0.0 1.2 individual inherited
COST CENTRE MODULE
no. entries %time %alloc %time %alloc MAIN MAIN
1 0 0.0 0.3 0.0 100.0
main Main
186 1 0.0 9.6 0.0 85.6
show_aVx Main
196 2 0.0 3.7 0.0 3.7
cell Main
197 16 0.0 0.0 0.0 0.0
solve Main
188 5 0.0 1.3 0.0 11.8
genNums Main
189 8 0.0 5.0 0.0 10.4
square Main
194 88 0.0 2.8 0.0 3.2
cell Main
195 16 0.0 0.4 0.0 0.4
col Main
192 4 0.0 0.7 0.0 1.1
cell Main
193 16 0.0 0.4 0.0 0.4
row Main
190 4 0.0 0.7 0.0 1.1
cell Main
191 16 0.0 0.4 0.0 0.4
readsPrec_aYF Main
187 3 0.0 60.6 0.0 60.6
CAF GHC.Read
151 1 0.0 1.2 0.0 1.2
CAF Text.Read.Lex
144 8 0.0 2.1 0.0 2.1
CAF GHC.Handle
128 4 0.0 10.7 0.0 10.7
CAF GHC.Conc
127 1 0.0 0.0 0.0 0.0 Does the column 'entries' represent the number of times the function
was called? Number of times it was 'entered', not quite the same as the number of
times it was called.
I think (Warning: speculation ahead, I don't *know* how the profiles are
generated) it's
thus:
Say you call a function returning a list. One call, first entry. It finds the list, the first k elements and hands them to the caller. Caller Daniel Fischer wrote:
the beginning of
processes these, asks "can I have more, or was that it?". Same call, second entry: f looks for
more, finds the
next m elements, hands them to caller. Caller processes. Repeat until
whatever happens
first, caller doesn't ask whether there's more or callee finds there's
nothing more (or
hits bottom). Warning: speculation ahead, but is based on my knowledge on other
profilers. Many profilers work statistically, they interrupt a program at more less
random (or equal) intervals and inspect the stack, whick is of course
quite difficult in Haskell as far as I understand it. I have interpreted
the entries column as an indication for the amount of "profile
interrupts" which happened when a function f was on top of the stack,
whatever that means in Haskell. The manual of GHC 6.10.4, chapter 5 states: The actual meaning of the various columns in the output is: entries The number of times this particular point in the call graph was
entered. So for me the question remains open, is "entries" a precisely counted
value or a statistically determined one? I believe it's the latter. I think the RTS uses a periodic SIGALRM to figure
out which function is currently executing and to record that in the profile.
Simon Marlow would know.

So for me the question remains open, is "entries" a precisely counted value or a statistically determined one?
The entry count is precise. It is only the time (and allocation) counts that are determined statistically. (If the entry count _were_ statistically sampled, it would give exactly the same information as the time counts, no more and no less.) Regards, Malcolm
participants (5)
-
Daniel Fischer
-
Jean-Marie Gaillourdet
-
Johan Tibell
-
Malcolm Wallace
-
Patrick LeBoutillier