Haskell in online contests

I've just started learning Haskell pretty recently, and I've been trying to solve some online contest problems as part of this exercise. However, I've been having almost no success: for various reasons my answers are almost always too slow. I recently stumbled across this link, which was quite useful: http://www.haskell.org/haskellwiki/SPOJ. It helped me speed up some of my programs where input was slowing me down.

But being a noob, to a large extent I don't even know why my programs are sometimes slow, or how to tell what makes them slow. I've been attempting problems from www.codechef.com (which actually uses SPOJ). Because I have an admin account I can compare my solution against others there (which are almost always in C/C++ or Java) to try and figure out if I'm missing a trick. Recently the problem I picked up was http://www.codechef.com/problems/DDILEMMA/ and I worked through solutions that just don't seem to be fast enough. I looked at successful submissions in C++ and Java which seem to do mostly what I'm doing (of course there are differences, because those are imperative languages, and I might be misunderstanding things). I've got my program, test input that I generated, and cost-centre analysis all up on this page: http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=5120#a5137

I've been getting some significant help from the #haskell channel, but unfortunately this hasn't helped me break the barrier I need to. So I was wondering if someone would be kind enough to help me understand the profiler output and how to improve performance in cases like this.

thanks
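The fast-IO advice on that wiki page boils down to reading input with Data.ByteString instead of String. A minimal sketch of the pattern (the sum-of-numbers program is only a placeholder, not the contest problem):

import qualified Data.ByteString.Char8 as BS

-- Read all of stdin in one go and parse with ByteString primitives
-- instead of String-based getLine/read.
main :: IO ()
main = do
  input <- BS.getContents
  let nums = map (maybe 0 fst . BS.readInt) (BS.lines input)
  print (sum nums)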

Hello vishnu,

Friday, November 27, 2009, 10:41:37 PM, you wrote:
[original message snipped]

It's just a false assumption that you should get speed comparable to other languages: Haskell is lazy, and GHC has a much less mature compiler.
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

bulat.ziganshin:

it's just a false assumption that you should get speed comparable to other languages. Haskell is lazy and GHC has a much less mature compiler.
"comparable to other languages" eh? That seems a little too broad to be meaningful, Bulat, as we all know. -- Don

Hello Don,

Friday, November 27, 2009, 11:08:44 PM, you wrote:

"comparable to other languages" eh? That seems a little too broad to be meaningful, Bulat, as we all know.
He compared to C++/Java!

-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi Bulat,

Hmm, OK, I understand the issue of compiler maturity. But I thought laziness was meant to be a bonus? Or is it that if you really try to squeeze out performance it becomes more of a hindrance?

On Nov 27, 2009, at 23:18, vishnu wrote:

Hmm, OK, I understand the issue of compiler maturity. But I thought laziness was meant to be a bonus? Or is it that if you really try to squeeze out performance it becomes more of a hindrance?
If you're trying to eke out every last bit of performance, you need precise control over laziness vs. strictness, as each is an advantage in different contexts. Ideally GHC's strictness analyzer would do it for you, but it isn't quite good enough yet, so you can end up writing ugly code to get that precise control.

-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
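A small generic illustration of the kind of control Brandon means (not code from the thread): the lazy and strict left folds below differ only in when the accumulator is forced, and the bang-pattern version spells that control out by hand.

{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy left fold: builds a chain of (+) thunks a million deep
-- before anything is forced; can overflow the stack.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: forces the accumulator at each step,
-- so it runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The same strictness written explicitly with a bang pattern.
sumBang :: [Int] -> Int
sumBang = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1 .. 1000000])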

On Fri, Nov 27, 2009 at 9:18 PM, vishnu wrote:

Hi Bulat, hmm, OK, I understand the issue of compiler maturity. But I thought laziness was meant to be a bonus? Or is it that if you really try to squeeze out performance it becomes more of a hindrance?

One of my favorite quotes by Heinrich Apfelmus, from http://apfelmus.nfshost.com/quicksearch.html:

"Well, it's highly unlikely that algorithms get faster by introducing laziness. I mean, lazy evaluation means to evaluate only those things that are really needed and any good algorithm will be formulated in a way such that the unnecessary things have already been stripped off. But laziness allows to simplify and compose algorithms. Sometimes, seemingly different algorithms turn out to be two sides of the same coin when formulated with lazy evaluation. Isn't it great that finding the k-th minimum is not only an adaption of quicksort but can readily be obtained from it by composing it with (!! k)?"

Luke
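The quote translates almost literally into code. A minimal sketch (the function names are illustrative):

import Data.List (partition)

-- Textbook lazy quicksort.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort smaller ++ [x] ++ qsort larger
  where (smaller, larger) = partition (< x) xs

-- The k-th minimum (0-based) is just the sort composed with (!! k).
-- Thanks to laziness, only the parts of the recursion needed to
-- produce the first k+1 elements are ever evaluated.
kthMin :: Ord a => Int -> [a] -> a
kthMin k xs = qsort xs !! k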

On Friday 27 November 2009 20:41:37, vishnu wrote:
[original message snipped]
Bad news first.

a) According to codechef, you must also consider digits.

b) Your distance function is wrong. With

idx i j = (i+1)*(j+1) - 1

you write to the same position several times, resulting in garbage. You should use

idx i j = i*(n+1) + j

Unfortunately, that slows things down.

Now some good news.

a) Since the substitution cost for a pair of letters doesn't depend on the strings, you can make a universal substitution-cost matrix (UArray (Char,Char) Int) and read the cost off that. It doesn't work wonders, but it speeds things up a little.

b) If the lengths of the two strings differ by more than 2, the Levenshtein distance is at least 3, so you needn't calculate it. This was probably your intention, but laziness doesn't quite work the way you thought (if I interpreted your intentions correctly). With

distance orig new = memf m n
  where
    m = snd $ bounds orig
    n = snd $ bounds new
    ...

if |m - n| > 2, the thunks for the array entries must still be written - although most needn't be evaluated in this case, that still takes a lot of time. Make it

distance orig new = f m n

and no thunks need be written at all in this case. That cuts down running time by nearly half :)

I think you could speed it up significantly by calculating the distance more lazily.

The profiling output is pretty straightforward. You have two functions that take up more or less all the time: one is substitutionCost, the other distance.

The former is comparatively benign. The letterHash should be calculated only once and kept alive through the entire run, but you use a Map.lookup and `elem` (plus a branch); with 26 letters, a lookup takes on average about 4 comparisons, then in general two comparisons for `elem`. An array lookup will be much faster.

The latter uses really a lot of time. As said before, a large part of that is because you're not lazy enough. Still, it is a complicated calculation, so it remains a time-consuming task. For more detailed information on which parts take the most time, add further cost centres ({-# SCC "thing" #-} annotations). Be aware, however, that profiling often rules out optimisations and thus changes the run-time behaviour of your code.
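Something along these lines should work for the universal cost matrix. This is an illustrative sketch: the actual similarity rules come from the problem statement, so similarPairs below is only a placeholder, and the 0/1/2 costs mirror the surcharges used later in the thread.

import Data.Array.Unboxed

-- Placeholder: the problem statement defines which pairs are "similar".
similarPairs :: [(Char, Char)]
similarPairs = []

-- Universal substitution-cost matrix, computed once. The bounds
-- ('0','0')..('z','z') cover digits and lowercase letters (plus a few
-- unused codes in between, which is harmless).
costArray :: UArray (Char, Char) Int
costArray = array bnds [ (p, cost p) | p <- range bnds ]
  where
    bnds = (('0', '0'), ('z', 'z'))
    cost (a, b)
      | a == b                           = 0
      | (a, b) `elem` similarPairs
          || (b, a) `elem` similarPairs  = 1
      | otherwise                        = 2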

On Saturday 28 November 2009 02:04:31, Daniel Fischer wrote:
Make it
distance orig new = f m n
and no thunks need be written at all in this case. Cuts down running time by nearly half :)
I think you could speed it up significantly by calculating the distance more lazily.
Yup :D

Timings here:
Your last ByteString code: 128s
That with corrected index calculation: 172s
Correct indices and distance orig new = f m n instead of memf m n: 99s
That and an array for the substitution cost: 64s
String IO, substitution cost array and a suitably lazy break-early distance function: 1.7s
That and a lazier surcharge function: 1.5s
The same with ByteString IO: 1.4s

Yes, I'd say that qualifies as a significant speedup.

Hi Daniel,

Wow, that's fantastic. Could you explain those further optimisations a bit more, please? Especially the whole "more laziness" thing.

This is where I've gotten to: http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=5120#a5120

Strangely enough, I've gotten no speedup at all from the substitution-cost UArray (though I had to make it (Int, Int) to deal with digits). But still I wonder if there's something else I missed. I'm really curious what laziness you used to go from 60 to 1.6? I always thought laziness was automatic and seq made strictness possible.

thanks
Vishnu

vishnu wrote:
I always thought laziness was automatic and seq made strictness possible.
Laziness is the default, but that doesn't mean it's everywhere. For example, every time you do pattern matching you enforce strictness:

foo (x:xs) = 5
==> foo _|_ == _|_

However, there's also a way to make patterns lazy:

foo ~(x:xs) = 5
==> foo _|_ == 5

The difference here is in how they're compiled/desugared, something like this:

foo (x:xs) = 5
== foo = \xs' -> case xs' of (x:xs) -> 5

foo ~(x:xs) = 5
== foo = \xs' -> let (x:xs) = xs' in 5
== foo = \xs' -> 5

So writing the "same" function in different ways can enforce different strictness behavior.

There are also more substantial algorithmic uses of laziness. One example is writing functions in a way that facilitates list fusion or on-line consumption of lists, vs. writing functions in a way that forces more than one cell of the list at a time. Deforestation and garbage collection are not laziness, but appropriate uses of laziness can allow you to leverage them to improve the performance of your code.

-- Live well, ~wren
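wren's _|_ examples can be run directly. A tiny sketch (names as in the post):

-- Strict pattern: matching (x:xs) forces the argument to weak head
-- normal form, so `foo undefined` throws.
foo :: [a] -> Int
foo (x:xs) = 5
foo []     = 5  -- extra case, just to make the function total

-- Irrefutable pattern: the match is deferred until x or xs is
-- demanded (never, here), so `bar undefined` returns 5.
bar :: [a] -> Int
bar ~(x:xs) = 5

main :: IO ()
main = do
  print (bar undefined)     -- prints 5
  -- print (foo undefined)  -- would crash: Prelude.undefined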

On Saturday 28 November 2009 21:21:20, vishnu wrote:
This is where I've gotten to: http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=5120#a5120. Strangely enough, I've gotten no speedup at all from the substitution-cost UArray (though I had to make it (Int, Int) to deal with digits).
Converting the characters with letterValue takes time. I'm a little surprised that it takes so much time, though; I would have expected it to still be faster than Map. If you make

subArray :: UArray (Char,Char) Int
subArray = array (('0','0'),('z','z')) ...

you avoid the conversion at the price of a larger array. It's still small enough to have the entire computation data in the cache, so it should be faster.

However, the Chars are converted to Ints for array-indexing anyway (I think Char is internally represented as a machine integer [wrapped in a constructor], so this is basically a no-op; even if not, it's going to be much faster than letterValue), so why not avoid the conversions (except once on reading) completely and work with Ints? Change all (UArray Int Char) to (UArray Int Int), let

getArray :: BS.ByteString -> UArray Int Int
getArray xs = listArray (1, fromIntegral (BS.length xs)) (map letterValue $ BS.unpack xs)

and replace

substitutionCost (orig ! i) (new ! j)

with

subArray!(orig!i,new!j)

then remove substitutionCost (optional). That's all you need to change in your code - except: please fix the typo

-- calculating the Leveishtein distance as described here
...................^^^^^^^^^^^^^^^^^^

:)
But still I wonder if there's something else I missed. I'm really curious what laziness you used to go from 60 to 1.6? I always thought laziness was automatic and seq made strictness possible.
What you need is a sufficiently lazy *algorithm* to compute (min 3 $ distance orig new). For top speed, you must implement that algorithm sufficiently strictly ;)

You might want to read carefully the "Possible improvements" section on WP to get an idea. I'll try to explain without giving too much away, to respect the spirit of the codechef challenge.

The Levenshtein algorithm for computing the cost of the cheapest editing sequence(s) transforming start (length m) into target (length n) computes the lowest costs for transforming initial sequences of start (length i) to initial sequences of target (length j), i ranging from 0 to m, j from 0 to n, altogether (m+1)*(n+1) costs. The costs for i == 0 or j == 0 are easily determined, and if you calculate the costs in an appropriate order, calculating each cost is cheap.

We are only interested in whether the cost (distance) is 0, 1, 2 or larger than 2. So whenever we stray more than two steps from the diagonal, we can stop. You approximate that behaviour by writing the value 3 to all cells far enough off the diagonal. But you're still writing (m+1)*(n+1) values/thunks to the array. Since the actual calculation of the costs is cheap, you don't win very much (it cuts down execution time by a little more than half - not bad, but much more is possible). Also, you're always walking down the entire diagonal, even if one can see much sooner that the cost is larger than 2.

Consider

thisends -> herestop

The last letters differ, so the cost is one of

a) 2 + cost (thisend -> heresto)   -- substitution (s,p)
b) 1 + cost (thisends -> heresto)  -- insert p
c) 1 + cost (thisend -> herestop)  -- delete s

In a), the last letters differ again, giving another branch adding at least 1 to the cost, so after the second step we know that route leads to a total larger than 2; b) and c) need three steps to ascertain that the total cost exceeds 2.

Now for long strings with large Levenshtein distance, this is typical (occasionally you'll encounter identical letters, but that doesn't take much time since it doesn't involve a branching): after three levels of branching, you know the cost exceeds 2, no need to go further. So a properly lazy algorithm stops processing as soon as it's certain that the distance is larger than 2. One way to do it is to calculate the distance using lazy Peano numbers and check whether it's larger than 2:

---------------------------------------------------------------------
data Nat = Zero | Succ Nat

n2i :: Nat -> Int
n2i (Succ n) = 1 + n2i n
n2i _ = 0

i2n :: Int -> Nat
i2n 0 = Zero
i2n n = Succ (i2n (n-1))

minN :: Nat -> Nat -> Nat
minN (Succ m) (Succ n) = Succ (minN m n)
minN _ _ = Zero

ldistance :: UArray Int Char -> UArray Int Char -> Nat
ldistance orig new = minN (Succ (Succ (Succ Zero))) $ go m n
  where
    m = snd $ bounds orig
    n = snd $ bounds new
    go i j
      | i == 0 = i2n j
      | j == 0 = i2n i
      | a == b = go (i-1) (j-1)
      | otherwise =
          let h = costArray!(a,b)
              x = case h of
                    1 -> Succ (go (i-1) (j-1))
                    2 -> Succ (Succ (go (i-1) (j-1)))
              y = Succ (go i (j-1))
              z = Succ (go (i-1) j)
          in minN x (minN y z)
      where
        a = orig!i
        b = new!j

distance :: UArray Int Char -> UArray Int Char -> Int
distance orig new = n2i $ ldistance orig new
---------------------------------------------------------------------

Performance still sucks (20s), partly because Nat is slow, partly because I intentionally pessimised (5s without that pessimisation), but it is a sufficiently lazy algorithm (almost - there's still a way to stop even earlier).
Now find an efficient way to break early (hint: don't use a lazy datatype, use Int).
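One way to read that hint (an editorial sketch, not Daniel's solution; distance3 is an illustrative name, and costArray is the cost matrix from his code above): thread the remaining budget through the recursion as a plain Int, so each branch cuts off the moment its budget is spent.

---------------------------------------------------------------------
-- go cap i j computes min cap (distance of the two prefixes),
-- so no branch is ever followed deeper than its budget allows.
distance3 :: UArray Int Char -> UArray Int Char -> Int
distance3 orig new = go 3 m n
  where
    m = snd $ bounds orig
    n = snd $ bounds new
    go cap i j
      | cap <= 0           = 0    -- budget spent: caller only needs "at least cap"
      | abs (i - j) >= cap = cap  -- the distance is at least |i - j|
      | i == 0             = j    -- j < cap by the previous guard
      | j == 0             = i
      | a == b             = go cap (i-1) (j-1)
      | otherwise          =
          let h = costArray ! (a, b)        -- substitution surcharge (1 or 2)
              x = h + go (cap - h) (i-1) (j-1)
              y = 1 + go (cap - 1) i (j-1)  -- insertion
              z = 1 + go (cap - 1) (i-1) j  -- deletion
          in min cap (min x (min y z))
      where
        a = orig ! i
        b = new ! j
---------------------------------------------------------------------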

Wow, I just woke up to see this :). I'm impressed at the speed of the response; thanks, Daniel.

Bad news first.
a) According to codechef, you must also consider digits.
You're right, I totally missed this. Thanks :)

b) Your distance function is wrong.
With idx i j = (i+1)*(j+1) - 1, you write to the same position several times, resulting in garbage. You should use idx i j = i*(n+1) + j. Unfortunately, that slows things down.
Wow, that's just such an incredible "doh" moment for me. I had initially written the array as being indexed by a tuple (i,j), and later, to speed things up, I moved to a one-dimensional array without realising I had done something so dumb.
Now some good news. a) Since the substitution cost for a pair of letters doesn't depend on the strings, you can make a universal substitution-cost matrix (UArray (Char,Char) Int) and read the cost off that. Doesn't work wonders, but speeds things up a little.
Yes, this makes a lot of sense. I had initially kept the letterHash outside; I moved it inside, thinking to encapsulate it into substitutionCost, because it didn't change the time much. But making the entire substitution-cost matrix fixed makes a lot of sense.
b) If the lengths of the two strings differ by more than 2, the Levenshtein distance is at least 3, so you needn't calculate it. This was probably your intention, but laziness doesn't quite work the way you thought (if I interpreted your intentions correctly). With
distance orig new = memf m n
  where
    m = snd $ bounds orig
    n = snd $ bounds new
    ...
if |m - n| > 2, the thunks for the array entries must still be written - although most needn't be evaluated in this case, that still takes a lot of time. Make it
distance orig new = f m n
and no thunks need be written at all in this case. Cuts down running time by nearly half :)
Wow, yes. I was too obsessed with the fibonacci example of memoisation I had seen, and I didn't think of this. Also, I think I still don't pay enough attention to thunks and the time they take. So the problem here is that when I calculate memf, an entire array of thunks is written down and then only the last one is evaluated. So I could avoid the array creation entirely. Makes sense to me :)
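The "array of thunks" point is easy to reproduce with the fibonacci memoisation mentioned above (an illustrative sketch, not code from the thread):

import Data.Array

-- Classic array-memoised fibonacci.
fibs :: Array Int Integer
fibs = listArray (0, 100) [ f i | i <- [0 .. 100] ]
  where
    f 0 = 0
    f 1 = 1
    f i = fibs ! (i - 1) + fibs ! (i - 2)

-- Forcing even a single cell builds the whole 101-cell array of
-- thunks first; only the *evaluation* of each cell is lazy, not the
-- allocation. That allocation is what `distance orig new = f m n`
-- skips when the length check already settles the answer.
main :: IO ()
main = print (fibs ! 30)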
I think you could speed it up significantly by calculating the distance more lazily.
I'd love to hear your thoughts on how that might happen. I thought the whole thing was inherently lazy?
The profiling output is pretty straightforward. You have two functions that take up more or less all the time, one is substitutionCost, the other distance.
The former is comparatively benign, the letterHash should be calculated only once and kept alive through the entire run, but you use a Map.lookup and `elem` (plus a branch); with 26 letters, a lookup takes on average about 4 comparisons, then in general two comparisons for `elem`. An array-lookup will be much faster.
aha right.
The latter uses really a lot of time. As said before, a large part of it is because you're not lazy enough. Still, it is a complicated calculation, so it remains a time-consuming task. For more detailed information which parts of it take the most time, add further cost centres ({-# SCC "thing" #-} annotations). Be aware however, that profiling often rules out optimisations and thus changes the run-time behaviour of your code.
Thanks a bunch. I'm gonna make those changes now :). Might take me a while though, 'cause my Haskell code-writing speed is still a bit slow =p.
participants (7)
- Brandon S. Allbery KF8NH
- Bulat Ziganshin
- Daniel Fischer
- Don Stewart
- Luke Palmer
- vishnu
- wren ng thornton