Speeding up trivial programs

I'm working on a small program that has to run many, many times, as quickly as possible (yes, it needs to be a standalone program).
I've optimized it in many ways, but I seem to have a time floor, observable with "Hello, world"
$ cat Hello.hs
main = putStrLn "Hello world"
$ ghc -O2 Hello.hs
$ time ./Hello
Hello world
real 0m0.150s
user 0m0.117s
sys 0m0.032s
The equivalent program in C takes only 0.002s (75x faster).
What is taking the extra time? Is it the RTS "booting"? Is there any way to speed this up?
Thanks,
Tom

I think there's already a ticket for slow RTS startup, although I didn't find it on a quick search, and that time looks similar to the examples I saw (around a tenth of a second).
-- brandon s allbery kf8nh allbery.b@gmail.com
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

On 2024-08-23 11:49:26, amindfv--- via Haskell-Cafe wrote:
I'm working on a small program that has to run many, many times, as quickly as possible (yes, it needs to be a standalone program).
I'd reconsider this "needs". Or, I'd reconsider the use of Haskell. If the program is small enough that the startup time dominates, how much code can there be? A memory-safe language with a much smaller RTS, such as Rust, might be a better fit.
regards,
iustin

On Fri, 23 Aug 2024, Iustin Pop wrote:
On 2024-08-23 11:49:26, amindfv--- via Haskell-Cafe wrote:
I'm working on a small program that has to run many, many times, as quickly as possible (yes, it needs to be a standalone program).
I'd reconsider this "needs". Or, I'd reconsider the use of Haskell.
Or a tiny stand-alone low-level program that communicates with a server written in Haskell?

I think I'd be reconsidering _any_ design involving running an external program many times like that, to be honest. Dragging in potential fork delays and other overhead that a benchmark probably won't show you doesn't appeal.
On Fri, Aug 23, 2024 at 3:16 PM Henning Thielemann <lemming@henning-thielemann.de> wrote:
Or a tiny stand-alone low-level program that communicates with a server written in Haskell?
-- brandon s allbery kf8nh allbery.b@gmail.com

On Fri, Aug 23, 2024 at 03:19:28PM -0400, Brandon Allbery wrote:
I think I'd be reconsidering _any_ design involving running an external program many times like that, to be honest. Dragging in potential fork delays and other overhead that a benchmark probably won't show you doesn't appeal.
Calling short-lived unix processes many times (note I didn't say where or when) might not be the right solution to most problems. (That's probably why I've never noticed this timing floor before.) But trust me, it is a real -- and actually sensible in context -- requirement here.
Server/client communication is unfortunately not possible in this case. Rewriting entirely in a language other than Haskell is possible, but not what I'd prefer given that the code is fairly complex (if fast).
Does anybody know more about what specifically is happening during these 150ms?
Thanks,
Tom
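One way to narrow down the question above is to time a program that does no work at all. The sketch below is not from the thread, and the file name Empty.hs is hypothetical. If the resulting binary takes roughly as long as ./Hello, the floor comes from process startup (loading, dynamic linking, RTS initialisation and shutdown) rather than from putStrLn.

```haskell
-- Empty.hs (hypothetical file name): a program with an empty main.
-- Any time measured for this binary is startup and shutdown cost,
-- since the program itself performs no I/O and no computation.
module Main (main) where

main :: IO ()
main = pure ()
```

Built and timed the same way as Hello.hs (ghc -O2 Empty.hs, then time ./Empty), this gives a baseline to compare against the "Hello world" numbers.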

Hi Tom,
On my computer (and using GHC-9.8) it's a bit quicker:
real 0m0.006s
user 0m0.001s
sys 0m0.005s
I imagine some of this time is just loading things into memory and/or running the dynamic linker.
If you are optimising for start up time, maybe fully statically linking your executable might help.
Cheers,
Teo
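Since the reported times differ so much between machines (150 ms versus about 6 ms), a single run of time may not be enough to compare setups. The following is a minimal sketch of a repeat-run harness, assuming the process package that ships with GHC is available. The names timeOnce, toMillis and mean, and the file name Bench.hs, are illustrative and not from the thread. The harness adds its own process-spawning overhead, so the numbers are most useful for comparing two binaries (for example the Haskell and C versions) under the same conditions.

```haskell
-- Bench.hs (hypothetical file name): run an executable N times and
-- report the minimum and mean wall-clock time per run.
module Main (main) where

import Control.Monad (replicateM)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Environment (getArgs)
import System.Process (readProcessWithExitCode)

-- | Wall-clock time of one run of the executable, in nanoseconds.
-- Output is captured and discarded so terminal speed does not skew timing.
timeOnce :: FilePath -> IO Word64
timeOnce exe = do
  start <- getMonotonicTimeNSec
  _ <- readProcessWithExitCode exe [] ""
  end <- getMonotonicTimeNSec
  pure (end - start)

-- | Convert nanoseconds to milliseconds.
toMillis :: Word64 -> Double
toMillis ns = fromIntegral ns / 1e6

-- | Arithmetic mean; returns 0 for an empty list.
mean :: [Double] -> Double
mean [] = 0
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [exe, n]
      | [(runs, "")] <- reads n, runs > (0 :: Int) -> do
          samples <- map toMillis <$> replicateM runs (timeOnce exe)
          putStrLn ("runs:    " ++ show (length samples))
          putStrLn ("min ms:  " ++ show (minimum samples))
          putStrLn ("mean ms: " ++ show (mean samples))
    _ -> putStrLn "usage: Bench EXECUTABLE RUNS"
```

Compiled with ghc -O2 Bench.hs, it could be run as ./Bench ./Hello 100. The minimum is often a better estimate of the floor than the mean, because the first runs may include one-off costs such as reading the binary from disk.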

On Fri, Aug 23, 2024 at 09:22:15PM +0100, Teofil Camarasu wrote:
On my computer (and using GHC-9.8) it's a bit quicker: real 0m0.006s user 0m0.001s sys 0m0.005s
I imagine some of this time is just loading things into memory and/or running the dynamic linker.
If you are optimising for start up time, maybe fully statically linking your executable might help.
I see similarly low startup using GHC 9.8:
$ cat hw.hs
module Main(main) where
main :: IO ()
main = putStrLn "Hello World!"
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.8.2
$ ghc -O2 hw.hs
[1 of 2] Compiling Main ( hw.hs, hw.o )
[2 of 2] Linking hw
$ time ./hw
Hello World!
real 0m0.005s
user 0m0.001s
sys 0m0.004s
The corresponding C program is not dramatically faster:
$ cat hw.c
#include

I don't see any significant difference:
~ via C v14.2.1-gcc via λ 9.10.1
❯ cat hello.hs
module Main where
main = putStrLn "Hello!"
~ via C v14.2.1-gcc via λ 9.10.1
❯ ./hello
Hello!
~ via C v14.2.1-gcc via λ 9.10.1
❯ time ./hello
Hello!
./hello 0.00s user 0.00s system 85% cpu 0.003 total
~ via C v14.2.1-gcc via λ 9.10.1
❯ cat hello.c
#include
participants (7)
- amindfv@mailbox.org
- Brandon Allbery
- divya@subvertising.org
- Henning Thielemann
- Iustin Pop
- Teofil Camarasu
- Viktor Dukhovni