Optimization demonstration

Dear Café,
I'm trying to do very small, but impressive example about optimizations
possible during Haskell compilation. So far, I can demonstrate that the following two
programs (if compiled) perform computation in the same time:
1)
main =
putStrLn $ show $ sum $ map (*(2::Int)) [(1::Int)..(100000000::Int)]
2)
main =
putStrLn $! show $! sumup 1 0
sumup :: Int -> Int -> Int
sumup n total =
if n<=(100000000::Int) then sumup (n+1) $! total+(2*n)
else total
Nevertheless, I expect a question on comparison with C:
3)
#include

You can use Template Haskell to perform arbitrary computation at
compile-time (even if it requires IO!), and then `lift` the result into a
Haskell literal. This works for any type with a `Lift` instance (or with a
bit of trick, any serializable type).
Coming back to your use case, you may try avoid using raw lists and switch
to unboxed vectors, turn on -O2 and rely on stream fusion of the vector
package. That will result in a considerable speedup.
On Tue, Feb 27, 2018, 11:09 PM Dušan Kolář
Dear Café,
I'm trying to do very small, but impressive example about optimizations possible during Haskell compilation. So far, I can demonstrate that the following two programs (if compiled) perform computation in the same time:
1)
main =
putStrLn $ show $ sum $ map (*(2::Int)) [(1::Int)..(100000000::Int)]
2)
main =
putStrLn $! show $! sumup 1 0
sumup :: Int -> Int -> Int
sumup n total =
if n<=(100000000::Int) then sumup (n+1) $! total+(2*n)
else total
Nevertheless, I expect a question on comparison with C:
3)
#include
int main(void) {
long sum, i;
sum = 0;
for (i=1; i <= 100000000L; ++i) {
sum += 2*i;
}
printf("%ld\n",sum);
return 0;
}
Unfortunately, in this case the C is much more faster (it prints the result immediately), at least on my machine. Is it due to a fact that C compiler does a brutal optimization leading to compile-time evaluation, while ghc is not able to do that?
I'm using -O2 -dynamic --make ghc compiler flags. For gcc for C compilation just -O2, running Arch Linux.
Is there any option, how to force compile time evaluation? The reason, why I think it works this way is the fact, that when running out of long type values in C a code is generated that computes the values regularly (providing misleading value as a result) taking its time.
Best regards,
Dušan
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

I thought fusion might be the answer, but don't the standard list functions
have rewrite rules for that too? Build/consume; this should apply directly
to this example (version 1).
On Feb 27, 2018 07:30, "Shao Cheng"
Dear Café,
I'm trying to do very small, but impressive example about optimizations possible during Haskell compilation. So far, I can demonstrate that the following two programs (if compiled) perform computation in the same time:
1)
main =
putStrLn $ show $ sum $ map (*(2::Int)) [(1::Int)..(100000000::Int)]
2)
main =
putStrLn $! show $! sumup 1 0
sumup :: Int -> Int -> Int
sumup n total =
if n<=(100000000::Int) then sumup (n+1) $! total+(2*n)
else total
Nevertheless, I expect a question on comparison with C:
3)
#include
int main(void) {
long sum, i;
sum = 0;
for (i=1; i <= 100000000L; ++i) {
sum += 2*i;
}
printf("%ld\n",sum);
return 0;
}
Unfortunately, in this case the C is much more faster (it prints the result immediately), at least on my machine. Is it due to a fact that C compiler does a brutal optimization leading to compile-time evaluation, while ghc is not able to do that?
I'm using -O2 -dynamic --make ghc compiler flags. For gcc for C compilation just -O2, running Arch Linux.
Is there any option, how to force compile time evaluation? The reason, why I think it works this way is the fact, that when running out of long type values in C a code is generated that computes the values regularly (providing misleading value as a result) taking its time.
Best regards,
Dušan
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On 2018-02-27 08:19 AM, Shao Cheng wrote: Coming back to your use case, you may try avoid using raw lists and switch to unboxed vectors, turn on -O2 and rely on stream fusion of the vector package. That will result in a considerable speedup. I looked at the core that’s generated, and there’s no need for vectors. Fusion happens, there’s no use of lists at all and unboxed types are used. The code boils down to a single recursive function: |let go i sum = case i of 100000000 -> sum + 200000000 _ -> go (i + 1) (sum + i * 2) in go 1 0 | except that the types are unboxed. The following complete program compiles down to almost identical core when compiled without optimization: |{-# LANGUAGE MagicHash #-} import GHC.Exts main = print $ I# value where value = let go :: Int# -> Int# -> Int# go i sum = case i of 100000000# -> sum +# 200000000# _ -> go (i +# 1#) (sum +# i *# 2#) in go 1# 0# | I think that’s impressive even if it’s not a single number. Execution time on my lowly i5 is only 50ms. BTW, GHC 8 seems to have removed the option for exporting core (|-fext-core|) but there’s a wonderful plugin package called |dump-core| https://github.com/yav/dump-core that produces HTML output with colouring and interactivity. You just install it from Hackage and use the extra options it provides. It seems to me that gcc’s compile-time evaluation of this loop is a special-case that matches the kind of thing that often crops up in C. I assume it’s not capable of doing that for every expression that could be evaluated at compile time, so a more complicated and realistic example would probably defeat it. After all, ghc could in theory evaluate any pure value (CAF) at compile time if it chose to, but that’s usually not what you want. Also, it’s worth noting that due to Haskell’s lazy evaluation, a pure value (CAF) will never be evaluated more than once at runtime, which isn’t something you get with C.

-fext-core wasn't about exporting it, but about accepting core as *source* ("external core"). Which was always tricky and was broken for years before the option was removed. On Tue, Feb 27, 2018 at 1:51 PM, Neil Mayhew < neil_mayhew@users.sourceforge.net> wrote:
On 2018-02-27 08:19 AM, Shao Cheng wrote:
Coming back to your use case, you may try avoid using raw lists and switch to unboxed vectors, turn on -O2 and rely on stream fusion of the vector package. That will result in a considerable speedup.
I looked at the core that’s generated, and there’s no need for vectors. Fusion happens, there’s no use of lists at all and unboxed types are used. The code boils down to a single recursive function:
let go i sum = case i of 100000000 -> sum + 200000000 _ -> go (i + 1) (sum + i * 2)in go 1 0
except that the types are unboxed. The following complete program compiles down to almost identical core when compiled without optimization:
{-# LANGUAGE MagicHash #-} import GHC.Exts main = print $ I# value where value = let go :: Int# -> Int# -> Int# go i sum = case i of 100000000# -> sum +# 200000000# _ -> go (i +# 1#) (sum +# i *# 2#) in go 1# 0#
I think that’s impressive even if it’s not a single number. Execution time on my lowly i5 is only 50ms.
BTW, GHC 8 seems to have removed the option for exporting core (-fext-core) but there’s a wonderful plugin package called dump-core https://github.com/yav/dump-core that produces HTML output with colouring and interactivity. You just install it from Hackage and use the extra options it provides.
It seems to me that gcc’s compile-time evaluation of this loop is a special-case that matches the kind of thing that often crops up in C. I assume it’s not capable of doing that for every expression that could be evaluated at compile time, so a more complicated and realistic example would probably defeat it. After all, ghc could in theory evaluate any pure value (CAF) at compile time if it chose to, but that’s usually not what you want.
Also, it’s worth noting that due to Haskell’s lazy evaluation, a pure value (CAF) will never be evaluated more than once at runtime, which isn’t something you get with C.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

On 2018-02-27 09:55 PM, Brandon Allbery wrote:
-fext-core wasn't about exporting it, but about accepting core as *source* ("external core"). Which was always tricky and was broken for years before the option was removed.
Thanks. I see I should have been using `-ddump-simpl` instead.

On Tue, Feb 27, 2018 at 10:06 AM, Dušan Kolář
Unfortunately, in this case the C is much more faster (it prints the result immediately), at least on my machine. Is it due to a fact that C compiler does a brutal optimization leading to compile-time evaluation, while ghc is not able to do that?
ghc is less prone to invoke that kind of optimization, but sometimes can do so. And yes, gcc is decidedly "brutal" with -O2: inspect the generated assembler and you'll find that it just prints a constant. -- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net

In other words, it's not competition with the language C but with its
popular compiler. Choose an example that doesn't simplify and you'll get a
fairer contest.
On Feb 27, 2018 07:52, "Brandon Allbery"
On Tue, Feb 27, 2018 at 10:06 AM, Dušan Kolář
wrote: Unfortunately, in this case the C is much more faster (it prints the result immediately), at least on my machine. Is it due to a fact that C compiler does a brutal optimization leading to compile-time evaluation, while ghc is not able to do that?
ghc is less prone to invoke that kind of optimization, but sometimes can do so. And yes, gcc is decidedly "brutal" with -O2: inspect the generated assembler and you'll find that it just prints a constant.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

You can prevent this particular optimization-to-constant by declaring sum '
volatile https://en.wikipedia.org/wiki/Volatile_(computer_programming)'
as below:
#include
In other words, it's not competition with the language C but with its popular compiler. Choose an example that doesn't simplify and you'll get a fairer contest.
On Feb 27, 2018 07:52, "Brandon Allbery"
wrote: On Tue, Feb 27, 2018 at 10:06 AM, Dušan Kolář
wrote: Unfortunately, in this case the C is much more faster (it prints the result immediately), at least on my machine. Is it due to a fact that C compiler does a brutal optimization leading to compile-time evaluation, while ghc is not able to do that?
ghc is less prone to invoke that kind of optimization, but sometimes can do so. And yes, gcc is decidedly "brutal" with -O2: inspect the generated assembler and you'll find that it just prints a constant.
-- brandon s allbery kf8nh sine nomine associates allbery.b@gmail.com ballbery@sinenomine.net unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
participants (6)
-
Brandon Allbery
-
Dušan Kolář
-
Neil Mayhew
-
Ryan Reich
-
Shao Cheng
-
Vale Cofer-Shabica