
Hi,

First of all, I found it interesting that

    loopM_ f k n s = when (k <= n) (f k >> loopM_ f (s+k) n s)

loopM_ seems faster than mapM_ (i.e. mapM_ f [k, k+s..n]). mapM_ is used very commonly; why is its performance lower than that of a hand-written loop function?

Second, even if I replace mapM_ with the loopM_ above, chaining IO actions can still leak space. (Is that because the IO monad's (>>) needs to keep the ``RealWorld'' state token updated so that I/O actions are performed in order?)

Consider the function below:

    f3 :: UArray Int Int -> IOUArray Int Int64 -> Int -> IO ()
    f3 u r i = let !v = u ! i in go (f31 v) i i
      where
        f31 :: Int -> Int -> IO ()
        f31 v j = readArray r j >>= \v1 ->
                  writeArray r j (v1 + (fromIntegral i) * (fromIntegral v))
        go g k s = when (k <= maxn) ( g k >> go g (s+k) s )

Calling f3 as

    loopM_ (f3 uu res) 1 1 1000000

produces the profiling output below:

                                                     individual      inherited
    COST CENTRE  MODULE   no.      entries   %time %alloc    %time %alloc
    ...
    loopM_       Main     104      4000002     7.4   10.1    100.0   99.3
    f3           Main     113      1000000     1.0    2.0     70.2   69.1
    f3.go        Main     116     14970034    32.7   67.1     68.8   67.1
    f3.f31       Main     117     13970034    34.5    0.0     36.1    0.0
    f3.f31.\     Main     118     13970034     1.7    0.0      1.7    0.0
    f3.f31       Main     114            0     0.3    0.0      0.4    0.0
    f3.f31.\     Main     115            0     0.1    0.0      0.1    0.0
    ...

Why does f3.go account for so much allocation (67.1%)? The only reason I can think of is that chaining IO actions with (>>) isn't as allocation-free as I thought. Did I get something fundamentally wrong?

Thanks
baojun
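For readers who want to reproduce the loopM_ vs. mapM_ comparison, here is a minimal, self-contained sketch. It is not the thread's lcmsum.hs: the IORef-based body, the 10^7 bound, and the CPU-time measurement are illustrative choices. Compile with -O2 so list fusion gets a chance to fire.

    import Control.Monad (when)
    import Data.IORef (newIORef, modifyIORef', readIORef)
    import System.CPUTime (getCPUTime)
    import Text.Printf (printf)

    -- Same shape as the loopM_ posted above: k is the index, n the bound, s the step.
    loopM_ :: Monad m => (Int -> m ()) -> Int -> Int -> Int -> m ()
    loopM_ f k n s = when (k <= n) (f k >> loopM_ f (s + k) n s)

    main :: IO ()
    main = do
      ref <- newIORef (0 :: Int)
      let body i = modifyIORef' ref (+ i)              -- cheap, strict IO action
          time label act = do
            t0 <- getCPUTime
            act
            t1 <- getCPUTime
            printf "%-8s %.3fs\n" label
                   (fromIntegral (t1 - t0) * 1e-12 :: Double)  -- picoseconds to seconds
      time "loopM_:" (loopM_ body 1 10000000 1)
      time "mapM_:"  (mapM_  body [1, 2 .. 10000000])
      readIORef ref >>= print                          -- keep the work from being discarded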

I recall having this issue also, and I eventually ended up concluding that
GHC didn't optimise away the list in "mapM_". Try with GHC 7.10 though; it
seems to be better at optimising lists into simple loops.

On Fri, Jul 03, 2015 at 08:18:55AM +0000, Baojun Wang wrote:
Consider the function below:

    f3 :: UArray Int Int -> IOUArray Int Int64 -> Int -> IO ()
    f3 u r i = let !v = u ! i in go (f31 v) i i
      where
        f31 :: Int -> Int -> IO ()
        f31 v j = readArray r j >>= \v1 ->
                  writeArray r j (v1 + (fromIntegral i) * (fromIntegral v))
        go g k s = when (k <= maxn) ( g k >> go g (s+k) s )

When f3 is called as

    loopM_ (f3 uu res) 1 1 1000000
Could you provide complete working code (including definitions of uu and res) and I will have a go at diagnosing. (My first guess is that you need to force (v1 + (fromIntegral i) * (fromIntegral v)).)

Tom
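To make the "force it" suggestion concrete, here is a sketch as a standalone helper; the module name ForceSketch and the helper name bumpAt are made up for illustration, and in the thread's code this logic lives inside f3's where clause. (As it turns out later in the thread, writeArray on an unboxed array forces the element anyway, so this does not change the profile; the sketch only shows what forcing the sum before the write looks like.)

    {-# LANGUAGE BangPatterns #-}
    module ForceSketch where

    import Data.Array.IO (IOUArray, readArray, writeArray)
    import Data.Int (Int64)

    -- Force the new value with a bang pattern before handing it to writeArray,
    -- so no thunk for (v1 + i*v) is left behind.
    bumpAt :: IOUArray Int Int64 -> Int -> Int -> Int -> IO ()
    bumpAt r i v j = do
      v1 <- readArray r j
      let !x = v1 + fromIntegral i * fromIntegral v
      writeArray r j x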

Full source code: https://github.com/wangbj/haskell/blob/master/lcmsum.hs

I build the program with:

    ghc -O2 --make -rtsopts -prof -auto-all lcmsum

and run it with:

    echo -ne '5\n100\n1000\n10000\n100000\n1000000\n' | ./lcmsum +RTS -sstderr -p

I've tried using ``let !ret = (v1+(fromIntegral i) * (fromIntegral v)) in r'' instead; however, it didn't make any difference in the profile (both arrays are unboxed).

Thanks
baojun

A few more points:

- In functions f2/f3 I use a local ``go'' function instead of loopM_; it
  turns out the local ``go'' is much faster than loopM_ (and loopM_ is
  still faster than mapM_).
- If I add a type signature for loopM_ (loopM_ :: (Int -> IO ()) -> Int -> Int
  -> Int -> IO ()), the program runs noticeably slower, even with profiling
  turned off (see the sketch below).

These really look odd to me, and I think they make optimization painful.
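Not an explanation of the slowdown, just an experiment that may be worth trying: pragmas let you keep a general type while telling GHC explicitly to specialise and to keep the unfolding available for inlining. The pragmas below are standard GHC, but whether they recover the lost speed in lcmsum.hs is untested, an assumption on my part.

    module LoopSketch where

    import Control.Monad (when)

    {-# SPECIALISE loopM_ :: (Int -> IO ()) -> Int -> Int -> Int -> IO () #-}
    {-# INLINABLE loopM_ #-}
    loopM_ :: (Monad m, Ord a, Num a) => (a -> m ()) -> a -> a -> a -> m ()
    loopM_ f k n s = when (k <= n) (f k >> loopM_ f (s + k) n s)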

On Fri, Jul 03, 2015 at 04:06:45PM +0000, Baojun Wang wrote:
Full source code:
https://github.com/wangbj/haskell/blob/master/lcmsum.hs
I build the program with:
ghc -O2 --make -rtsopts -prof -auto-all lcmsum
and run it with:
echo -ne '5\n100\n1000\n10000\n100000\n1000000\n' | ./lcmsum +RTS -sstderr -p
I've tried using
``let !ret = (v1+(fromIntegral i) * (fromIntegral v)) in r'' instead; however, it didn't make any difference in the profile (both arrays are unboxed).
Correct: since they're unboxed, forcing the thunk is done automatically.

I'm puzzled. What output are you seeing and what were you expecting? I see "13 MB total memory in use". That doesn't sound like a lot when you are allocating two arrays of size 10^6. There's a lot of *allocation*, but not so much peak memory usage. I get the same result on GHC 7.6.3 and 7.8.4.

My complete output:

    % ghc -O2 --make -rtsopts -prof -auto-all lcmsum
    [1 of 1] Compiling Main             ( lcmsum.hs, lcmsum.o )

    lcmsum.hs:48:9: Warning:
        In the use of `unsafeFreeze'
        (imported from Data.Array.IO, but defined in Data.Array.MArray):
        Deprecated: "Please import from Data.Array.Unsafe instead; This will be removed in the next release"

    lcmsum.hs:51:3: Warning:
        In the use of `unsafeFreeze'
        (imported from Data.Array.IO, but defined in Data.Array.MArray):
        Deprecated: "Please import from Data.Array.Unsafe instead; This will be removed in the next release"

    Linking lcmsum ...

    % echo -ne '5\n100\n1000\n10000\n100000\n1000000\n' | ./lcmsum +RTS -sstderr -p
    152935099400
    1529628080000
    278320460000
    277913417200000
    277811686426000000

       2,273,379,656 bytes allocated in the heap
             233,104 bytes copied during GC
          12,002,728 bytes maximum residency (3 sample(s))
             580,184 bytes maximum slop
                  13 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0      4350 colls,     0 par    0.02s    0.02s     0.0000s    0.0000s
      Gen  1         3 colls,     0 par    0.00s    0.00s     0.0003s    0.0005s

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    2.66s  (  2.65s elapsed)
      GC      time    0.02s  (  0.02s elapsed)
      RP      time    0.00s  (  0.00s elapsed)
      PROF    time    0.00s  (  0.00s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    2.67s  (  2.67s elapsed)

      %GC     time       0.6%  (0.6% elapsed)

      Alloc rate    856,218,428 bytes per MUT second

      Productivity  99.4% of total user, 99.5% of total elapsed

Here's a general question: Can the output of the Haskell compiler be
inspected in some - readable - way?
Stefan

It can, if you know the correct magic incantations to give ghc (which I don't, but I know the knowledge is out there). The phrase to Google is "reading ghc core", where "core" refers to an intermediate language that still resembles Haskell.

On Friday, July 3, 2015, Stefan Reich <stefan.reich.maker.of.eye@googlemail.com> wrote:
Here's a general question: Can the output of the Haskell compiler be inspected in some - readable - way?
Stefan
--
Dan Burton

Probably easiest to just use the ghc-core package: http://hackage.haskell.org/package/ghc-core

I see... ok, it's interesting, but seems to require a long time of studying
just to understand it. ^^
I am looking for something to solve computer science's complexity problem.
I believe we should have simple structures at every level, from high-level
to low-level.
btw I am redefining the levels like this:
Top-level: Your thoughts
Medium level: Shortened pseudo-code
Low level: A language formerly called "high-level", like Haskell or Java
So we're then two levels higher than before. ^^
Cheers

My concern is that ~80% of the allocation happens in f3, yet both arrays are allocated by newArray. Since I'm using unboxed arrays I'm not expecting this kind of laziness.

And the speed ordering local go (loopM_ equivalent) > loopM_ > mapM_ really surprised me, even with profiling turned off.

On Fri, Jul 03, 2015 at 06:00:48PM +0000, Baojun Wang wrote:
My concern is that ~80% of the allocation happens in f3, yet both arrays are allocated by newArray. Since I'm using unboxed arrays I'm not expecting this kind of laziness.
I see. You seem to be asking why it is allocating *at all*. That's not a phenomenon that normally falls under the terminology of "space leak", hence my confusion.

I'm not an expert at reading Core, but I spent some time looking at it without seeing anywhere that would obviously allocate a lot. Eventually I decided to supply the profiling options to ghc-core and lo and behold the output changed dramatically!

Unless I am very much misunderstanding how cost centre annotations work, the amount of allocation without -auto-all is only 10% of the amount with it. Please check and let me know if you concur that it is the profiling itself that is the root cause of all the allocation!

Tom
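A hedged follow-up on this finding: if -auto-all instrumentation distorts allocation figures, one lighter option (not discussed in the thread, and untested against lcmsum.hs) is to compile with -prof but without -auto-all and add explicit SCC annotations only to the functions of interest, so the overhead stays local. A sketch with a made-up function name:

    {-# LANGUAGE BangPatterns #-}
    module SccSketch where

    -- Only this cost centre is added; the rest of the program stays
    -- uninstrumented. Build with -prof (and -rtsopts), but no -auto-all.
    innerLoop :: Int -> Int
    innerLoop n = {-# SCC "innerLoop" #-} go 0 1
      where
        go !acc k
          | k > n     = acc
          | otherwise = go (acc + k) (k + 1)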

Thanks for the prompt reply, and sorry that I assumed the allocation was caused by a space leak. I guess profiling with -auto-all really does have a huge impact on allocation.

Without profiling, the Haskell version runs about 50% slower than the equivalent C version (both with -O2). Maybe that's as much as one could expect.

On Sat, Jul 04, 2015 at 06:17:06AM +0000, Baojun Wang wrote:
Thanks for the prompt reply, and sorry that I assumed the allocation was caused by a space leak. I guess profiling with -auto-all really does have a huge impact on allocation. Without profiling, the Haskell version runs about 50% slower than the equivalent C version (both with -O2). Maybe that's as much as one could expect.
50% sounds like a good start, but I bet with some time and effort you can substantially improve on it.

Specifically, the code is spending *loads* of time doing bounds checking. It checks on every array read and write. I suggest (for an experiment, not production code!) you replace the reads and writes with unsafe versions and see what speedup that gives you. I don't know how to do unsafe reads and writes with the array library, but for vectors the API is here:

http://hackage.haskell.org/package/vector-0.10.12.3/docs/Data-Vector-Mutable...

Tom
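For reference, here is a sketch of the unsafe-indexing experiment with the vector API; the helper name bumpUnsafe and the Int64 element type are illustrative choices, not from the thread. unsafeRead and unsafeWrite skip the bounds check, so an out-of-range index is undefined behaviour rather than an exception. (If memory serves, the array package has similar functions in Data.Array.Base, but the vector ones are what the linked documentation covers.)

    module UnsafeSketch where

    import qualified Data.Vector.Unboxed.Mutable as VUM
    import Data.Int (Int64)

    -- Read-modify-write a cell without bounds checks (experiment only):
    -- the caller must guarantee 0 <= j < VUM.length mv.
    bumpUnsafe :: VUM.IOVector Int64 -> Int -> Int64 -> IO ()
    bumpUnsafe mv j delta = do
      x <- VUM.unsafeRead mv j
      VUM.unsafeWrite mv j (x + delta)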

On Fri, Jul 3, 2015 at 1:11 PM, Stefan Reich < stefan.reich.maker.of.eye@googlemail.com> wrote:
Here's a general question: Can the output of the Haskell compiler be inspected in some - readable - way?
ghc -ddump-simpl

(meaning: dump the output of the simplifier pass, commonly known as Core)

There at least used to be a package "ghc-core" which would do this and syntax highlight the result, etc., to help with reading it.

--
brandon s allbery kf8nh                               sine nomine associates
allbery.b@gmail.com                                  ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net
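To make that concrete without the ghc-core wrapper, the following is a sketch of dumping readable Core for a small module; the flag selection is mine, not from the thread (-dsuppress-all hides most type and coercion annotations, -fforce-recomp makes GHC recompile even if the object file is up to date).

    {-# LANGUAGE BangPatterns #-}
    -- Dump this module's Core with, for example:
    --   ghc -O2 -c -ddump-simpl -dsuppress-all -fforce-recomp DumpMe.hs
    module DumpMe where

    -- A tiny strict loop whose Core is small enough to read end to end.
    sumTo :: Int -> Int
    sumTo n = go 0 1
      where
        go !acc k
          | k > n     = acc
          | otherwise = go (acc + k) (k + 1)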
participants (6)
- Baojun Wang
- Brandon Allbery
- Clinton Mead
- Dan Burton
- Stefan Reich
- Tom Ellis