RE: turn off let floating

So I have code like:
{-# NOINLINE count #-} count :: IORef Int count = unsafePerformIO $ newIORef 0
{-# NOINLINE getCount #-} getCount :: (Int -> a) -> a getCount f = let nextCount = (unsafePerformIO $ do oldCount <- readIORef count let newCount = oldCount + 1 writeIORef count newCount return oldCount) in seq nextCount (f nextCount)
It seems to work okay.
However, if you have any suggestions about how to make a FAST global counter I would be very glad to hear it. From profiling it seems like this code is a little expensive (also it is called quite frequently).
You could try the FastMutInt module from GHC (ghc/compiler/utils/FastMutInt.hs) to speed things up. Unfortunately unsafePerformIO has some unavoidable overhead: it can't be inlined because we don't want the compiler to see its definition. Cheers, Simon

However, if you have any suggestions about how to make a FAST global counter I would be very glad to hear it. From profiling it seems like this code is a little expensive (also it is called quite frequently).
You could try the FastMutInt module from GHC (ghc/compiler/utils/FastMutInt.hs) to speed things up. Unfortunately unsafePerformIO has some unavoidable overhead: it can't be inlined because we don't want the compiler to see its definition.
What happens if you use the FFI to call a C function like int getCount() { static int x; return x++; } and mark the function pure (outside the IO monad) and noinline? (Probably all the calls get commoned up and it only gets called once; but it might be worth a try). You mentioned that you're trying to get a new counter value for every function application; maybe something like an FFI call to int getCount(void *f, void *a) { static int x; return x++; } where you have getCount :: (a -> b) -> a -> Int; then you pass the function and its argument to getCount. This should prevent any unwanted common subexpression elimination. Carl Witty

On Thu, Apr 15, 2004 at 10:43:22AM -0700, Carl Witty wrote:
However, if you have any suggestions about how to make a FAST global counter I would be very glad to hear it. From profiling it seems like this code is a little expensive (also it is called quite frequently).
You could try the FastMutInt module from GHC (ghc/compiler/utils/FastMutInt.hs) to speed things up. Unfortunately unsafePerformIO has some unavoidable overhead: it can't be inlined because we don't want the compiler to see its definition.
What happens if you use the FFI to call a C function like int getCount() { static int x; return x++; } and mark the function pure (outside the IO monad) and noinline? (Probably all the calls get commoned up and it only gets called once; but it might be worth a try).
Hi all, To test out the various possible ways of implementing a global counter I wrote some test cases (shown below). I hope the test cases are useful, and provide some indication of the relative performance. However, if you spot something bogus please let me know. Each program computes the equivalent of: sum ([1..100000000] :: [Int]) There are four different ways that I tried: 1) pure: this is just pure functional code and should be fast. This test case is only here as a control example, it is not a candidate solution because I need a global counter. 2) ioref: this uses a global mutable counter using IORefs and unsafePerformIO 3) fastMut: this uses the fast mutable integer library from GHC that was suggested by Simon Marlow. 4) ffi: this implements the counter in C using the FFI. They all run in a reasonable amount of memory so I won't report the memory information here, just total runtime, as computed by the unix "time" command. Results: method runtime (s) --------------------------- pure 0.7 ffi 3.2 fastMut 15 ioref 23 Note each program was compiled with ghc 6.2 with -O2 on debian linux. One caveat is that the ffi code keeps the counter in C until the very end of the program. This doesn't reflect the fact that I want to put each value of the counter into a Haskell data structure, so there should be an additional cost of turning the C int back into a Haskell Int for every increment. I'll need to write a different test case for this aspect. Here are the programs in the same order that they appear in the results table: -------------------------------------------------------------------------------- {- pure -} module Main where main = print $ loop 100000000 0 loop :: Int -> Int -> Int loop 0 acc = acc loop n acc = loop (n-1) $! (acc + n) -------------------------------------------------------------------------------- /* ffi Haskell code */ {-# OPTIONS -fglasgow-exts #-} module Main where -- the use of unsafe makes a big difference in runtime foreign import ccall unsafe "incC" inc :: Int -> () foreign import ccall "getCounterC" getCounter :: Int -> IO Int printCounter :: IO () printCounter = do val <- getCounter 0 -- the 0 is bogus print val main :: IO () main = seq (loop 100000000) printCounter loop :: Int -> () loop 0 = () loop n = seq (inc n) (loop $! n - 1) /* ffi C code */ #include "inc.h" int counter = 0; void incC (int howmuch) { counter+=howmuch; } int getCounterC (int bogus) { return counter; } -------------------------------------------------------------------------------- {- fastMut -} module Main where import System.IO.Unsafe (unsafePerformIO) import FastMutInt {-# NOINLINE counter #-} counter :: FastMutInt counter = unsafePerformIO newFastMutInt {-# NOINLINE inc #-} inc :: Int -> () inc n = unsafePerformIO $ do incFastMutIntBy counter n return () printCounter :: IO () printCounter = do val <- readFastMutInt counter print val main :: IO () main = do writeFastMutInt counter 0 seq (loop 100000000) printCounter loop :: Int -> () loop 0 = () loop n = seq (inc n) (loop $! n - 1) -------------------------------------------------------------------------------- {- ioref -} module Main where import System.IO.Unsafe (unsafePerformIO) import Data.IORef (newIORef, readIORef, writeIORef, IORef) counter :: IORef Int {-# NOINLINE counter #-} counter = unsafePerformIO (newIORef 0) {-# NOINLINE inc #-} inc :: Int -> () inc n = unsafePerformIO $ do old <- readIORef counter writeIORef counter $! old + n printCounter :: IO () printCounter = do val <- readIORef counter print val main :: IO () main = seq (loop 100000000) printCounter loop :: Int -> () loop 0 = () loop n = seq (inc n) (loop $! n - 1) --------------------------------------------------------------------------------

Bernard James POPE
Note each program was compiled with ghc 6.2 with -O2 on debian linux. : main = print $ loop 100000000 0
Isn't this going to be optimized away to a constant with -O2? -kzm -- If I haven't seen further, it is by standing in the footprints of giants

On Tue, Apr 20, 2004 at 02:56:36PM +0200, Ketil Malde wrote:
Bernard James POPE
writes: Note each program was compiled with ghc 6.2 with -O2 on debian linux. : main = print $ loop 100000000 0
Isn't this going to be optimized away to a constant with -O2?
Here's the final stg code, obtained by: ghc -ddump-stg -O2 --make MainPure.hs -o pure ==================== STG syntax: ==================== Main.$wloop = \r [ww ww1] case ww of ds { __DEFAULT -> case +# [ww1 ds] of sat_s2pI { __DEFAULT -> case -# [ds 1] of sat_s2pE { __DEFAULT -> Main.$wloop sat_s2pE sat_s2pI; }; }; 0 -> ww1; }; SRT(Main.$wloop): [] Main.loop = \r [w w1] case w of w2 { GHC.Base.I# ww -> case w1 of w3 { GHC.Base.I# ww1 -> case Main.$wloop ww ww1 of ww2 { __DEFAULT -> GHC.Base.I# [ww2]; }; }; }; SRT(Main.loop): [] Main.eta = \u [] case Main.$wloop 100000000 0 of ww { __DEFAULT -> GHC.Base.I# [ww]; }; SRT(Main.eta): [] Main.lvl = \u srt:(0,*bitmap*) [] case Main.eta of w { GHC.Base.I# ww -> GHC.Show.$wshowSignedInt 0 ww GHC.Base.[]; }; SRT(Main.lvl): [Main.eta] Main.main = \r srt:(0,*bitmap*) [s] case GHC.IO.hGetLine GHC.Handle.stdin s of wild { GHC.Prim.(#,#) new_s a41 -> case GHC.IO.hPutStr GHC.Handle.stdout Main.lvl new_s of wild1 { GHC.Prim.(#,#) new_s1 a411 -> GHC.IO.$whPutChar GHC.Handle.stdout '\n' new_s1; }; }; SRT(Main.main): [GHC.Handle.stdout, GHC.IO.$whPutChar, GHC.IO.hPutStr, GHC.Handle.stdin, GHC.IO.hGetLine, Main.lvl] :Main.main = \r srt:(0,*bitmap*) [eta1] catch# [Main.main GHC.TopHandler.topHandler eta1]; SRT(:Main.main): [Main.main, GHC.TopHandler.topHandler]

On 20/04/2004, at 9:48 PM, Bernard James POPE wrote:
To test out the various possible ways of implementing a global counter I wrote some test cases (shown below). I hope the test cases are useful, and provide some indication of the relative performance. However, if you spot something bogus please let me know.
Each program computes the equivalent of:
sum ([1..100000000] :: [Int])
There are four different ways that I tried:
1) pure: this is just pure functional code and should be fast. This test case is only here as a control example, it is not a candidate solution because I need a global counter.
2) ioref: this uses a global mutable counter using IORefs and unsafePerformIO
3) fastMut: this uses the fast mutable integer library from GHC that was suggested by Simon Marlow.
4) ffi: this implements the counter in C using the FFI.
There's another way which you missed: using implicit parameters. I remember reading a paper a while ago called Global Variables in Haskell (sorry, don't remember the author -- Jones, perhaps?) which did similar benchmarking to yours, and carrying around the global variable with an implicit parameter was faster than using a global mutable counter via "unsafePerformIO $ newIORef ...". -- % Andre Pang : trust.in.love.to.save

Hi Andre,
There's another way which you missed: using implicit parameters. I remember reading a paper a while ago called Global Variables in Haskell (sorry, don't remember the author -- Jones, perhaps?) which did similar benchmarking to yours, and carrying around the global variable with an implicit parameter was faster than using a global mutable counter via "unsafePerformIO $ newIORef ...".
Thanks for the note. That was in the paper by John Hughes. The performance difference between unsafePerformIO and implicit parameters is not significant in his test case. I think he was surprised that implicit parameters worked so well (I am too). That doesn't mean I should rule it out completely. I'll have a look into it. Also, implicit parameters are less convenient for the program transformation that I use in buddha, whereas a truly global variable is ideal. On the other hand the FFI approach looks a lot faster already, and Simon has suggested that I can inline unsafePerformIO. Cheers, Bernie.

I'm using GHC 6.2.1 on Windows 2000. Problem 1: -hr crashes in some circumstances. Take this program (Test.hs): module Main where import IO main = do readFile "large.csv" putStrLn "OK" where large.csv is an 800K CSV file (with very small files, the bug doesn't occur). I compile with: ghc -prof -auto-all Test.hs and run with: a.out +RTS -hr and get a Windows error message indicating that the program had to be terminated. The "OK" appears, though. When I run the test program repeatedly, it sometimes works fine (which is not the case for the real-world program I'm trying to profile). Other options (-hc, -hy etc) work fine. Unfortunately, retainer information is the most useful of them all... Problem 2: hp2ps doesn't work at all. When trying to read this file (a.out.hp): JOB "a.out +RTS -hr " DATE "Fri Apr 16 11:15 2004" SAMPLE_UNIT "seconds" VALUE_UNIT "bytes" BEGIN_SAMPLE 0.00 END_SAMPLE 0.00 BEGIN_SAMPLE 0,01 END_SAMPLE 0,01 it says "a.out.hp, line 7, floating point number must follow BEGIN_SAMPLE". I assume the apparently mixed English/German number formatting causes this problem. -Stefan

Stefan Reich wrote:
Problem 1: -hr crashes in some circumstances. [...]
No idea about this one...
Problem 2: hp2ps doesn't work at all. [...]
That's a little bit harsh. :-) First of all it's not hp2ps, but GHC's RTS which has a buglet. Furthermore, it only has this problem in some locales. As a workaround you could try to switch your locale e.g. to English while running your Haskell program. I've fixed this about a month ago in the HEAD, but it did not make its way into the STABLE branch. SimonM? Cheers, S.
participants (7)
-
Andre Pang
-
Bernard James POPE
-
Carl Witty
-
Ketil Malde
-
Simon Marlow
-
Stefan Reich
-
Sven Panne