Ocaml for Haskellers tutorial

As a Haskeller who is forced to switch to Ocaml by the job market (or at least by the job interview market), I could really use a "Ocaml for Haskellers" tutorial. I was wondering if there are others who have been in this situation recently and can share a few pointers, or better yet, write a blog entry pointing to good sources of information and highlighting the pitfalls. For instance, I have heard that Ocaml is only fast when one uses loops instead of folds, but I wonder if this is an overstatement. Of course due to the strict semantics of Ocaml many of the assumptions that come from laziness will no longer be valid, but I wonder how different is idiomatic Ocaml from idiomatic Haskell in real world. Thanks, pepe

Hi Ocaml's a fine alternative to Haskell, you could be 'forced' to use say PHP, now that wouldn't be quite so nice...
For instance, I have heard that Ocaml is only fast when one uses loops instead of folds, but I wonder if this is an overstatement.
Maybe this notion is from the standard library not being optimized for long lists? http://ocaml.janestreet.com/?q=node/71 http://groups.google.com/group/fa.caml/msg/01e80aadf16837d6 There isn't much on performance in the O'Reilly book, but otherwise it is pretty comprehensive: http://caml.inria.fr/pub/docs/oreilly-book/ I suspect the difference between the ML module system vs. typeclasses will be as important as lazy vs. strict. As a style point, Ocaml programmers don't seem too prone to combinator mania - so I think golf is a bit less popular over there. One tip - sometimes you might see Ocaml code with the revised syntax (particularly Gerard Huet's work, which whilst described as 'Pidgin ML' is a subset of the Caml revised syntax) - this can be translated automatically to regular syntax with camlp4. Not knowing this tripped me up for a couple of days at the end of last year. Best wishes Stephen

Stephen Tetley wrote:
I suspect the difference between the ML module system vs. typeclasses will be as important as lazy vs. strict. As a style point, Ocaml programmers don't seem too prone to combinator mania - so I think golf is a bit less popular over there.
Both the lack of laziness and the value restriction can sometimes force combinator definitions to be eta-expanded, which I think makes this style of programming much less attractive. Cheers, Ganesh =============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ===============================================================================

forced to switch to Ocaml
That like saying that you broke up with Miss America and started going out with the runner-up. Sucks to be you :) I experimented with Ocaml for a little while and among other things I found that technology stack that comes with GHC (compared to Ocaml) is quite a bit nicer. It is much easier to deploy/install apps and dependencies. And on the whole libraries seem to be a lot more mature on the Haskell side. Lazy/strict, modules/typeclasses aside this made a big difference to me. -deech

Hello, One major thing I haven't seen explicitly mentioned yet in this thread is tail recursion. You have to write tail recursively in OCaml (or any strict language) or you will blow the stack. While tail recursion is often wrong (in terms of efficiency) in Haskell, it is always right in OCaml. -Jeff

I am a little surprised by the "shortcomings" of Haskell mentioned in the
thread.
I was under the impression that Haskell was closest to Nirvana on the
usefulness vs safety graph.
In the paper "Why FP matters" - Laziness is stated as one of the two key
features of FP that allows conceptual modularity! If I understand right,
Laziness is not a first class stuff in OCaml - is that not right?
If I understand correctly - Not allowing side-effects allow for equational
reasoning - which in turn allows you to "reason" about the program better.
If I understand right - OCaml allows side effects right?
Jeff, could you please expand on the tail recursion bit - what do you mean
when you say, in OCaml, "one has to write tail recursively in OCaml"?
Please note, I am still a Haskell (FP) learner - My questions above are only
meant for me to understand things better.
Regards,
Kashyap
On Sat, Apr 17, 2010 at 7:12 AM, jeff p
Hello,
One major thing I haven't seen explicitly mentioned yet in this thread is tail recursion. You have to write tail recursively in OCaml (or any strict language) or you will blow the stack. While tail recursion is often wrong (in terms of efficiency) in Haskell, it is always right in OCaml.
-Jeff _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Regards, Kashyap

On Fri, Apr 16, 2010 at 8:35 PM, C K Kashyap
I am a little surprised by the "shortcomings" of Haskell mentioned in the thread.
I was under the impression that Haskell was closest to Nirvana on the usefulness vs safety graph.
In the paper "Why FP matters" - Laziness is stated as one of the two key features of FP that allows conceptual modularity! If I understand right, Laziness is not a first class stuff in OCaml - is that not right?
That's right. You can simulate laziness in OCaml. OCaml compilers cannot assume code is pure and lazy so many very useful optimizations are not performed by the OCaml compiler. And some syntactic and stylist things are not available. See for example Ganesh's comment about combinators earlier in this thread.
If I understand correctly - Not allowing side-effects allow for equational reasoning - which in turn allows you to "reason" about the program better. If I understand right - OCaml allows side effects right?
Right.
Jeff, could you please expand on the tail recursion bit - what do you mean when you say, in OCaml, "one has to write tail recursively in OCaml"?
Tail recursive functions do not need to maintain a stack for their recursive calls and the optimizer may replace them with a loop. In strict functional languages such as OCaml this means that by writing your recursive functions to be tail recursive you won't run out of stack space. Many recursive functions can be transformed to be tail recursive by adding an accumulator argument and building up the return value there. In a lazy language if you write something to be tail recursive, then each time the function recurses it will add to the accumulator. Because it's tail recursive it typically also won't evaluate this accumulator. This means that in memory you have to build up the unevaluated representation, or thunk. This thunk keeps getting bigger and bigger until something demands the value. This means that if you need to write a recursive function in haskell which has an accumulator, you often want to make the function strict in the accumulator. Meaning the strict function will evaluate the accumulator as it goes. One place where lazy accumulators is bad are the left folds. There is the lazy foldl and the version which is strict in the accumulator, foldl'. Try summing big lists of integers, let's use ghci and limit the heap to 1 meg: ghci +RTS -M1M Prelude> foldl (+) 0 [1..10000000] Heap exhausted; Current maximum heap size is 6291456 bytes (6 MB); use `+RTS -M<size>' to increase it. ghci +RTS -M1M Prelude> :m + Data.List Prelude Data.List> foldl' (+) 0 [1..10000000] 50000005000000 Haskell lets us choose the evaluation strategy here. After programming in both Haskell and Common Lisp for several years each I can tell you that Haskell's way of dealing with evaluation strategies and letting the user pick the desired one is easier to manage and deal with. I would expect OCaml less nice than Haskell here as well. Jason

On Sat, Apr 17, 2010 at 8:22 AM, Jason Dagit
... One place where lazy accumulators is bad are the left folds. There is the lazy foldl and the version which is strict in the accumulator, foldl'. Try summing big lists of integers, let's use ghci and limit the heap to 1 meg:
ghci +RTS -M1M Prelude> foldl (+) 0 [1..10000000] Heap exhausted; Current maximum heap size is 6291456 bytes (6 MB); use `+RTS -M<size>' to increase it.
If we define foldl as: foldl :: (b -> a -> b) -> b -> [a] -> b foldl f z [] = z foldl f z (x:xs) = let z' = f z x in foldl f z' xs and: sum :: [Int] -> Int sum = foldl (+) 0 Then the program: 'sum 1000000' will indeed use a lot of heap space (and will run out of it if you limit the heap to 1MB). The reason for this is explained in [1]. In short: foldl will start allocating thunks on your heap which each add an element of the list to a previous thunk as in: let z1 = 0 + 1 z2 = z1 + 2 z3 = z2 + 3 z4 = z3 + 4 ... z999997 = z999996 + 999997 z999998 = z999997 + 999998 z999999 = z999998 + 999999 z100000 = z999999 + 1000000 in z1000000 This can be visualized if we generate a heap profile: $ ghc --make FoldlProfile.hs -o foldlProfile -O2 -prof $ ./foldlProfile 1000000 +RTS -hy Stack space overflow: current size 8388608 bytes. $ hp2ps -c foldlProfile.hp $ ps2pdf foldlProfile.ps Result: http://bifunctor.homelinux.net/~bas/foldlProfile.pdf You clearly see that this program allocates well over 22MB of heap space! This is not a problem if you can give the program a big enough heap. However have you noticed the more serious problem? The problem starts when we finally evaluate z1000000: Note that z1000000 = z999999 + 1000000. So 1000000 is pushed on the stack. Then z999999 is evaluated. Note that z999999 = z999998 + 999999. So 999999 is pushed on the stack. Then z999998 is evaluated: Note that z999998 = z999997 + 999998. So 999998 is pushed on the stack. Then z999997 is evaluated: So ... ...your limited stack will eventually run full when you evaluate a large enough chain of (+)s. This then triggers a stack overflow exception! Interestingly, when we define foldl as: foldl :: (b -> a -> b) -> b -> [a] -> b foldl f = foldl_f where foldl_f z [] = z foldl_f z (x:xs) = let z' = f z x in foldl_f z' xs then both the heap and stack overflow problems go away. (this is how foldl is actually implemented[2] in GHC) I don't know why the heap and stack overflow problems go away. So lets look at the core output of the latter program: $ ghc-core -- -O2 FoldlProfile.hs $wsum :: [Int] -> Int# $wsum = \ (w_s1rS :: [Int]) -> $wfoldl_f 0 w_s1rS $wfoldl_f :: Int# -> [Int] -> Int# $wfoldl_f = \ (ww_s1rK :: Int#) (w_s1rM :: [Int]) -> case w_s1rM of _ { [] -> ww_s1rK; : x_aeb xs_aec -> case x_aeb of _ { I# y_aUb -> $wfoldl_f (+# ww_s1rK y_aUb) xs_aec } } Apparently, because of the latter foldl definition, GHC is able to optimize the foldl for uboxed ints. But why doesn't the recursive call: $wfoldl_f (+# ww_s1rK y_aUb) xs_aec allocate (+# ww_s1rK y_aUb) on the heap? Are unboxed values always evaluated strictly? For reference, this is the core output of the former foldl: sum :: [Int] -> Int sum = foldl @ Int @ Int plusInt main3 main3 = I# 0 foldl :: forall b_aer a_aes. (b_aer -> a_aes -> b_aer) -> b_aer -> [a_aes] -> b_aer foldl = \ (@ b_afF) (@ a_afG) (f_aet :: b_afF -> a_afG -> b_afF) (z_aeu :: b_afF) (ds_drS :: [a_afG]) -> case ds_drS of _ { [] -> z_aeu; : x_aex xs_aey -> foldl @ b_afF @ a_afG f_aet (f_aet z_aeu x_aex) xs_aey } regards, Bas [1] http://haskell.org/haskellwiki/Foldr_Foldl_Foldl' [2] http://haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.0/src/GHC-List....

Bas van Dijk schrieb:
I don't know why the heap and stack overflow problems go away. So lets look at the core output of the latter program:
$ ghc-core -- -O2 FoldlProfile.hs
$wsum :: [Int] -> Int# $wsum = \ (w_s1rS :: [Int]) -> $wfoldl_f 0 w_s1rS
$wfoldl_f :: Int# -> [Int] -> Int# $wfoldl_f = \ (ww_s1rK :: Int#) (w_s1rM :: [Int]) -> case w_s1rM of _ { [] -> ww_s1rK; : x_aeb xs_aec -> case x_aeb of _ { I# y_aUb -> $wfoldl_f (+# ww_s1rK y_aUb) xs_aec } }
Apparently, because of the latter foldl definition, GHC is able to optimize the foldl for uboxed ints.
But why doesn't the recursive call:
$wfoldl_f (+# ww_s1rK y_aUb) xs_aec
allocate (+# ww_s1rK y_aUb) on the heap? Are unboxed values always evaluated strictly?
I think so.
participants (9)
-
aditya siram
-
Bas van Dijk
-
C K Kashyap
-
Henning Thielemann
-
Jason Dagit
-
jeff p
-
José Iborra
-
Sittampalam, Ganesh
-
Stephen Tetley