(Sorry for the double post)
A good practice would be to always factorize your list treatments [*] (using a single fold to compute in the same time max & min), and be sure never to keep a plain reference to the list and embed that list into a function that returns it (to enforce its evaluation each time you need to access it).
But I can't think of a way to check that property (list never accessed twice) statically in Haskell.

[*] But wouldn't GHC do in that specific case do something about that with -O2 activated?

2012/3/10 Yves Parès <yves.pares@gmail.com>
I'm afraid without uniqueness or linear typing it would be hard to avoid.


2012/3/10 Thiago Negri <evohunz@gmail.com>
I see. Thanks for the answers.

Any data structure or source annotation that would prevent that?

For example, if I try the same program to run on a
[1..9999999999999999] list, I'll get an out of memory error for the
single-threaded version. Any way to prevent it without declaring two
different versions of "list"?


2012/3/10 Anthony Cowley <acowley@gmail.com>:
> From that profiling data, I think you're just seeing a decrease in sharing. With one thread, you create the list structure in memory: the first fold could consume it in-place, but the second fold is still waiting for its turn.  The list is built on the heap so the two folds can both refer to the same list.
>
> With two threads, GHC is being clever and inlining the definition you give for list, which is then optimized into two parallel loops. No list on the heap means there's not much for the GC to do.
>
> Sharing of index lists like this is a common source of problems. In particular, nested loops can make it even trickier to prevent sharing as there may not be an opportunity for parallel evaluation.
>
> Anthony
>
> On Mar 10, 2012, at 10:21 AM, Thiago Negri <evohunz@gmail.com> wrote:
>
>> Hi all.
>>
>> I wrote a very simple program to try out parallel Haskel and check how
>> it would look like to make use of more than one core in this language.
>>
>> When I tried the program with RTS option -N1, total time shows it took
>> 2.48 seconds to complete and around 65% of that time was taken by GC.
>>
>> Then I tried the same program with RTS options -N2 and total time
>> decreased to 1.15 seconds as I expected a gain here. But what I didn't
>> expect is the GC time to drop to 0%.
>>
>> I guess I'm having trouble to understand the output of the RTS option -s.
>> Can you enlighten me?
>>
>>
>> The source for the testing program:
>>
>>> module Main where
>>>
>>> import Data.List (foldl1')
>>> import Control.Parallel (par, pseq)
>>> import Control.Arrow ((&&&))
>>>
>>> f `parApp` (a, b) = a `par` (b `pseq` (f a b))
>>> seqApp = uncurry
>>>
>>> main = print result
>>>  where result = (+) `parApp` minMax list
>>>        minMax = minlist &&& maxlist
>>>        minlist = foldl1' min
>>>        maxlist = foldl1' max
>>>        list = [1..19999999]
>>
>>
>> The results on a Windows 7 64bits with an Intel Core 2 Duo, compiled
>> with GHC from Haskell Platform:
>>
>> c:\tmp\hs>par +RTS -s -N1
>> par +RTS -s -N1
>> 20000000
>>     803,186,152 bytes allocated in the heap
>>     859,916,960 bytes copied during GC
>>     233,465,740 bytes maximum residency (10 sample(s))
>>      30,065,860 bytes maximum slop
>>             483 MB total memory in use (0 MB lost due to fragmentation)
>>
>>  Generation 0:  1523 collections,     0 parallel,  0.80s,  0.75s elapsed
>>  Generation 1:    10 collections,     0 parallel,  0.83s,  0.99s elapsed
>>
>>  Parallel GC work balance: nan (0 / 0, ideal 1)
>>
>>                        MUT time (elapsed)       GC time  (elapsed)
>>  Task  0 (worker) :    0.00s    (  0.90s)       0.00s    (  0.06s)
>>  Task  1 (worker) :    0.00s    (  0.90s)       0.00s    (  0.00s)
>>  Task  2 (bound)  :    0.86s    (  0.90s)       1.62s    (  1.69s)
>>
>>  SPARKS: 1 (0 converted, 0 pruned)
>>
>>  INIT  time    0.00s  (  0.00s elapsed)
>>  MUT   time    0.86s  (  0.90s elapsed)
>>  GC    time    1.62s  (  1.74s elapsed)
>>  EXIT  time    0.00s  (  0.00s elapsed)
>>  Total time    2.48s  (  2.65s elapsed)
>>
>>  %GC time      65.4%  (65.9% elapsed)
>>
>>  Alloc rate    936,110,032 bytes per MUT second
>>
>>  Productivity  34.6% of total user, 32.4% of total elapsed
>>
>> gc_alloc_block_sync: 0
>> whitehole_spin: 0
>> gen[0].sync_large_objects: 0
>> gen[1].sync_large_objects: 0
>>
>>
>> c:\tmp\hs>par +RTS -s -N2
>> par +RTS -s -N2
>> 20000000
>>   1,606,279,644 bytes allocated in the heap
>>          74,924 bytes copied during GC
>>          28,340 bytes maximum residency (1 sample(s))
>>          29,004 bytes maximum slop
>>               2 MB total memory in use (0 MB lost due to fragmentation)
>>
>>  Generation 0:  1566 collections,  1565 parallel,  0.00s,  0.01s elapsed
>>  Generation 1:     1 collections,     1 parallel,  0.00s,  0.00s elapsed
>>
>>  Parallel GC work balance: 1.78 (15495 / 8703, ideal 2)
>>
>>                        MUT time (elapsed)       GC time  (elapsed)
>>  Task  0 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
>>  Task  1 (worker) :    0.58s    (  0.59s)       0.00s    (  0.01s)
>>  Task  2 (bound)  :    0.58s    (  0.59s)       0.00s    (  0.00s)
>>  Task  3 (worker) :    0.00s    (  0.59s)       0.00s    (  0.00s)
>>
>>  SPARKS: 1 (1 converted, 0 pruned)
>>
>>  INIT  time    0.00s  (  0.00s elapsed)
>>  MUT   time    1.15s  (  0.59s elapsed)
>>  GC    time    0.00s  (  0.01s elapsed)
>>  EXIT  time    0.00s  (  0.00s elapsed)
>>  Total time    1.15s  (  0.61s elapsed)
>>
>>  %GC time       0.0%  (2.4% elapsed)
>>
>>  Alloc rate    1,391,432,695 bytes per MUT second
>>
>>  Productivity 100.0% of total user, 190.3% of total elapsed
>>
>> gc_alloc_block_sync: 90
>> whitehole_spin: 0
>> gen[0].sync_large_objects: 0
>> gen[1].sync_large_objects: 0
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe@haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe