Bringing the cafe back in.

If I remember correctly, tuning the GC is one of the things they worked on for the next release (in relation to parallelism).
Here's a link to the paper: http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf

You can allocate dynamic memory on the C heap using Foreign.Marshal.Alloc, and the Storable class lets you specify alignment.
However, it's not clear you'll need that in all cases. Most of the time, IME, all you really need is for "the business end" of objects A and B to end up on different cache lines; padding each of them out to the size of a cache line will accomplish that even if each object starts in the middle of a cache line. In other words, as long as you ensure that there are at least CACHE_LINE bytes between the last bit of data in A and the first bit of data in B, they won't exhibit false sharing.
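Here's a rough sketch of what I mean, done with plain mallocBytes rather than aligned allocation. The 64-byte line size and the helper names (cacheLine, mallocPadded, slotPtr) are my own assumptions, not anything standard:

```haskell
import Foreign.Marshal.Alloc (mallocBytes, free)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (pokeByteOff, peekByteOff)
import Data.Int (Int64)

-- Assumed cache-line size; 64 bytes on most current x86 CPUs.
cacheLine :: Int
cacheLine = 64

-- Allocate n slots on the C heap, each padded out to one full cache
-- line, so no two slots can ever share a line (even if the block
-- itself starts mid-line, slots are a full line apart).
mallocPadded :: Int -> IO (Ptr ())
mallocPadded n = mallocBytes (n * cacheLine)

-- Address of slot i: at least cacheLine bytes from any other slot.
slotPtr :: Ptr () -> Int -> Ptr ()
slotPtr base i = base `plusPtr` (i * cacheLine)

main :: IO ()
main = do
  base <- mallocPadded 2
  pokeByteOff (slotPtr base 0) 0 (42 :: Int64)
  pokeByteOff (slotPtr base 1) 0 (7  :: Int64)
  a <- peekByteOff (slotPtr base 0) 0 :: IO Int64
  b <- peekByteOff (slotPtr base 1) 0 :: IO Int64
  free base
  print (a, b)  -- prints (42,7)
```

If you do need a specific starting alignment as well, alignedAlloc-style behaviour can be had by over-allocating and rounding the pointer up, but for avoiding false sharing the per-slot padding above is enough.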

I'm not sure if any of the "standard" mutable references etc. respect this. Maybe someone can clarify? If you modify a thread-local IORef, will that cause another thread to stall because it has its own thread-local IORef on the same cache line?
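Until someone does clarify, one way to sidestep the question is to not use adjacent IORefs at all and instead give each thread a counter in a C-heap block, strided one cache line apart, so each thread's slot is guaranteed its own line. A sketch (again assuming a 64-byte line; runCounters is a made-up name):

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM_)
import Foreign.Marshal.Alloc (mallocBytes, free)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff, peekByteOff)
import Data.Int (Int64)

-- Assumed cache-line size.
cacheLine :: Int
cacheLine = 64

-- Run nThreads threads, each bumping its own Int64 counter iters
-- times. Counters live cacheLine bytes apart, so no two threads
-- ever write to the same line; each slot is touched by exactly one
-- thread, so the non-atomic peek/poke pair is safe here.
runCounters :: Int -> Int -> IO Int64
runCounters nThreads iters = do
  counters <- mallocBytes (nThreads * cacheLine) :: IO (Ptr Int64)
  forM_ [0 .. nThreads - 1] $ \i ->
    pokeByteOff counters (i * cacheLine) (0 :: Int64)
  dones <- forM [0 .. nThreads - 1] $ \i -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      let off = i * cacheLine
      replicateM_ iters $ do
        v <- peekByteOff counters off :: IO Int64
        pokeByteOff counters off (v + 1)
      putMVar done ()
    return done
  mapM_ takeMVar dones
  total <- fmap sum $ forM [0 .. nThreads - 1] $ \i ->
             (peekByteOff counters (i * cacheLine) :: IO Int64)
  free counters
  return total

main :: IO ()
main = runCounters 4 100000 >>= print  -- prints 400000
```

Compile with -threaded and run with +RTS -N to actually get the threads on separate cores; the result is the same either way, but whether it's *faster* than packed counters is exactly the false-sharing question.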

On Fri, Aug 7, 2009 at 12:42 PM, Thomas Witzel <witzel.thomas@gmail.com> wrote:
I have not yet been able to get the dev version compiled, but I'm
looking into it (it basically compiles to 99% and then stops because
of a missing or defective .mk file). Anyhow, I played more with the
release version, and one thing I noticed is that GC time increases
almost linearly with the number of cores. Is there a document
somewhere that explains the overall architecture of the GHC RTS and
how the GC in particular acts in concurrent situations?

As for your comment regarding padding the data structures, I'll of
course also need to control the alignment in such a case. Is there
such a thing as explicit dynamic memory allocation in Haskell?

Thanks, Thomas

On Wed, Aug 5, 2009 at 3:16 PM, Sebastian
Sylvan<sebastian.sylvan@gmail.com> wrote:
> GHC doesn't have per-thread allocation, so it's probably a bit tricky to get
> that working. Plus, for parallelism it's not clear that a piece of data is
> necessarily "owned" by one thread, since it could be produced by one spark
> and consumed by another; those two independent sparks may not necessarily
> occupy the same thread, which means that any *other* data accessed by the
> first thread could thrash the cache. So really you'd need per-spark
> allocation areas, which would probably make sparks very heavyweight.
> In other words, I think there's plenty of research that needs to be done
> w.r.t. scheduling things in time and space so as to avoid false sharing. You
> could, of course, always chunk your work manually, and make sure that each
> "chunk" works on a big block that won't share cache lines with anything else
> (e.g. by padding the data structures).
> Also, while GHC does a fair bit of mutation on its own internal data (thunks
> etc.), most of the "user data" is read-only, which should help. I.e. once a
> cache line has been filled in, there won't be any synchronisation needed on
> that data again.
>
> On Wed, Aug 5, 2009 at 8:04 PM, Thomas Witzel <witzel.thomas@gmail.com>
> wrote:
>>
>> I'll try that. I'd like to stick with it. As for the memory, although
>> its probably quite a bit of work, it should be doable to have code
>> generated where the threads have their own, non-overlapping, memory
>> pages, so that the CPUs don't go into a cache-thrashing death-match.
>> I'll spend some more time with Haskell and then go from there.
>>
>> On Wed, Aug 5, 2009 at 3:01 PM, Sebastian
>> Sylvan<sebastian.sylvan@gmail.com> wrote:
>> >
>> >
>> > On Wed, Aug 5, 2009 at 6:59 PM, Thomas Witzel <witzel.thomas@gmail.com>
>> > wrote:
>> >>
>> >> 2. I started with the very simple nfib example given in the manual for
>> >> Control.Parallel (Section 7.18). On my systems using multiple cores
>> >> makes the code actually slower than just using a single core. While
>> >> the manual cautions that this could be the case for certain
>> >> algorithms, I'm wondering whether this is the desired behaviour for
>> >> this example.
>> >>
>> >> I'm using ghc 6.10.4 right now.
>> >
>> > IIRC the development version of GHC has had some major work done to
>> > optimize concurrency, so it may be worth trying that. In particular I
>> > believe it executes sparks in batches, to reduce the overhead (which
>> > hopefully fixes your issue).
>> >
>> > --
>> > Sebastian Sylvan
>> >
>
>
>
> --
> Sebastian Sylvan
>



--
Sebastian Sylvan