[Rejected Paper] Experience Report: Writing NetBSD Sound Drivers in Haskell

Hi jhc-hackers,
I have written a paper, "Experience Report: Writing NetBSD Sound Drivers in Haskell".
http://metasepi.org/papers.html
It explains how Ajhc customizes jhc's GC, and may sometimes be useful for developing jhc.
Thanks, -- Kiwamu Okabe at METASEPI DESIGN

Oooh, and you documented jhc's RTS while you were at it. That is great :)
I have been thinking about a way to extend JGC to seamlessly handle
interfaces that utilize stack allocated C structs, the main target
being GMP. Right now I can do pretty well by having a 'self-pointing',
'self-cleaning' ForeignPtr. The definition of ForeignPtr is
data ForeignPtr = ForeignPtr Addr_
However, it is a little magic in that you can allocate it in a larger
space than it would naturally take up; I can then have Addr_ point to
the word following it directly in memory. The garbage collector treats
it just like normal and needs no finalizer: as long as the ForeignPtr
is live, the area it points to is live, since they are the same space.
As far as the code is concerned, it could be pointing to an external C
structure. This is a very efficient and very fast way to throw around
C structures without having to worry about whether they were allocated
in Haskell or in C.
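
A minimal C sketch of the self-pointing layout described above. It is only an illustration under assumptions: the names foreign_ptr, alloc_self_pointing and is_self_pointing are hypothetical, malloc stands in for the GC allocator, and none of this is jhc's actual runtime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap object holding `ForeignPtr Addr_`. */
typedef struct {
    void *addr;   /* the Addr_ field */
} foreign_ptr;

/* Allocate the one-word object with payload_bytes of extra space directly
 * behind it, and point addr at that space. malloc stands in for the GC
 * allocator; the payload is only aligned to pointer size here. */
static foreign_ptr *alloc_self_pointing(size_t payload_bytes)
{
    foreign_ptr *fp = malloc(sizeof *fp + payload_bytes);
    if (fp == NULL)
        return NULL;
    fp->addr = (void *)(fp + 1);   /* the word following the object */
    memset(fp->addr, 0, payload_bytes);
    return fp;
}

/* True if addr targets the object's own trailing space rather than
 * external C memory. Code that only reads addr cannot tell the difference. */
static int is_self_pointing(const foreign_ptr *fp)
{
    return fp->addr == (const void *)(fp + 1);
}
```

Because the payload lives in the same allocation as the object, freeing (or collecting) the object frees the payload too, which is why no finalizer is needed in this scheme.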
However, things like GMP require the memory region to be initialized
and freed, since it may have internal pointers. I could just continue
with standard foreign pointers, attaching a destructor, but this has a
few problems:
- I have to initialize the memory area. This is hard to do without
invoking the IO monad to ensure proper sequencing; for the result of
an addition, that seems heavy, and it is hard for the compiler to "see
through" an unsafePerformIO when optimizing.
- Every Integer will have to carry around two extra words: a self
pointer that always points to its own memory location, and a pointer
to a destructor that is always going to be the same.
- Memory will be destructed when it is likely to immediately be
re-used as an Integer. It would be good to deforest this
destruct-construct pair.
To solve these, I was thinking of associating a constructor/destructor
with a type rather than a value. By creating an entire block of the
same type, it can initialize the entire block at once, then delay the
destructor until the entire block is freed. Since GMP ints can be
re-used in place, it would get rid of almost all initialization and
destruction overhead. A 'delayed destructor', if you will, that only
needs to be called if the memory location is going to be used for a
different type. Allocations in jhc are already tagged by type, so this
isn't difficult to keep track of. I was thinking something like
data
{-# CCONSTRUCTOR "init_integer" #-}
{-# CDESTRUCTOR delayed "fini_integer" #-}
Integer_ :: #
data Integer = Integer Integer_
Since Integer_ is unboxed, we can ensure its values are only created in
the right heap by using a primitive to do so; there is no way for a user
to conjure up a value of an unboxed type unless it takes part in unboxed
numeric polymorphism and so can be written as 0#, 1#, etc.
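
The 'delayed destructor' idea can be sketched in C roughly as follows. This is a sketch under assumptions, not jhc code: the block layout, type_hooks and block_claim are hypothetical, and the counting hooks merely stand in for the init_integer / fini_integer functions named in the pragmas above.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_BLOCK 4   /* assumed; real blocks would hold many more */

/* Per-type hooks, mirroring the proposed CCONSTRUCTOR/CDESTRUCTOR pragmas. */
typedef struct {
    void (*init)(void *slot);   /* may be NULL for plain types */
    void (*fini)(void *slot);   /* may be NULL for plain types */
} type_hooks;

typedef struct {
    const type_hooks *type;     /* type the slots are currently set up for */
    long slots[SLOTS_PER_BLOCK];
} block;

/* Hand a (possibly recycled) block to `type`. */
static void block_claim(block *b, const type_hooks *type)
{
    if (b->type == type)
        return;   /* same type again: slots are still initialized, reuse in place */
    if (b->type != NULL && b->type->fini != NULL) {
        /* The delayed destructor runs only now, when the type changes. */
        for (size_t i = 0; i < SLOTS_PER_BLOCK; i++)
            b->type->fini(&b->slots[i]);
    }
    if (type != NULL && type->init != NULL) {
        /* Initialize the entire block at once. */
        for (size_t i = 0; i < SLOTS_PER_BLOCK; i++)
            type->init(&b->slots[i]);
    }
    b->type = type;
}

/* Counting hooks standing in for init_integer / fini_integer. */
static int init_calls, fini_calls;
static void init_integer(void *slot) { *(long *)slot = 0; init_calls++; }
static void fini_integer(void *slot) { (void)slot; fini_calls++; }

static const type_hooks integer_hooks = { init_integer, fini_integer };
static const type_hooks plain_hooks = { NULL, NULL };
```

Claiming a block for the same type twice costs nothing the second time; the destructors only run once the block is handed to a different type.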
John
-- John Meacham - http://notanumber.net/

Hi John,
On Tue, Jun 10, 2014 at 7:55 PM, John Meacham wrote:
> Oooh. and you documented jhc's RTS while you were at it. that is great :)
BTW, what do you think about jgc having multiple "arenas" associated with the context? That would make a reentrant GC possible in jhc. Of course, I know you dislike the cost of initializing jgc on every C => Haskell call.
Do you have any ideas for multiple "arenas"?
Best regards, -- Kiwamu Okabe at METASEPI DESIGN

Hmm.. what were you thinking of in terms of how it would change the API?
By reentrant, do you mean you want C functions that were called by
Haskell to be able to call back into Haskell again? What is the issue
with the context that is stowed? Or are you talking about
SMP/lightweight threads?

Ah. I think I see what you mean by reentrant in your paper. Can you
point me to your context switching code in ajhc?
Is SMP a concern for you or are you mainly concerned about hardware interrupts?

Hi John,
On Tue, Jun 10, 2014 at 8:28 PM, John Meacham wrote:
> Ah. I think I see what you mean by reentrant in your paper. Can you point me to your context switching code in ajhc?
Here it is:
https://github.com/ajhc/ajhc/blob/arafura/rts/rts/conc.c#L33
It's a sample using pthreads, but CLHs can work with any threading style by calling whatever C code performs the context switch.
> Is SMP a concern for you or are you mainly concerned about hardware interrupts?
Both: threads on SMP and interrupts. The former uses an active context switch, and the example is the one above. The latter uses a passive context switch; however, the interrupt context begins in a C context. The C context creates a new "arena" when calling C => Haskell.
Thanks, -- Kiwamu Okabe at METASEPI DESIGN

Hmm... if we allocate the gc_stack on an aligned boundary, can we
recover the arena by keeping a pointer at its base? Sort of like how I
recover the cache block pointer from an arbitrary heap location by
rounding down to the block boundary.
The main issue would be how it affects allocation speed. It's okay to
make the GC slower as long as allocation is still fast. Previously,
pre-populating the cache pointers sped things up considerably; how
would it make sure to use one from the current arena without slowing
down allocation in general?
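
The rounding-down trick mentioned above can be sketched in C like this. It is an illustration only: ARENA_ALIGN, the arena struct and the function names are assumptions, and here the arena header sits directly at the aligned base rather than being reached through a stored pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed region size and alignment; must be a power of two. */
#define ARENA_ALIGN ((uintptr_t)1 << 16)

typedef struct arena {
    void *raw;   /* pointer returned by malloc, kept so the region can be freed */
    int id;      /* placeholder for real per-context GC state */
} arena;

/* Over-allocate, then round up to the next ARENA_ALIGN boundary so that a
 * full ARENA_ALIGN-sized region starting at the header fits inside. */
static arena *arena_new(int id)
{
    void *raw = malloc(2 * ARENA_ALIGN);
    if (raw == NULL)
        return NULL;
    uintptr_t base = ((uintptr_t)raw + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    arena *a = (arena *)base;
    a->raw = raw;
    a->id = id;
    return a;
}

/* Recover the owning arena from any address inside its region by
 * masking off the low bits, i.e. rounding down to the boundary. */
static arena *arena_of(const void *p)
{
    return (arena *)((uintptr_t)p & ~(ARENA_ALIGN - 1));
}

static void arena_free(arena *a)
{
    free(a->raw);
}
```

The lookup is a single mask operation, so it adds no memory access of its own; whether that is cheap enough on the allocation fast path is exactly the open question in this message.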
John

Hi John,
On Tue, Jun 10, 2014 at 8:49 PM, John Meacham wrote:
> The main issue would be how it affects allocation speed. It's okay to make the GC slower as long as allocation is still fast. Previously, pre-populating the cache pointers sped things up considerably; how would it make sure to use one from the current arena without slowing down allocation in general?
I don't have any benchmarks for it yet. I worry about the cost of initializing the arena on C => Haskell calls. The current jgc has no such cost, but my jgc initializes the arena on every C => Haskell call. Imagine the cost of calling find_cache() for every cache each time.
Regards, -- Kiwamu Okabe at METASEPI DESIGN

Yeah, find_cache is fairly slow. In fact, just checking if it is NULL
noticeably slows things down.
So, something that could be done is to generate a struct with each cache
used as an offset in it, basically putting the entire generated s_cache
table in a struct, then initializing them all when the arena is
allocated. That would add a single redirect through the arena to the
caches, which might not be too bad...
What would be better is to use a thread or processor local register.
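
A rough C sketch of the cache-table-in-a-struct idea above. Everything here is assumed for illustration: the cache fields, the CACHE_PAIR / CACHE_CONS slots and the sizes are hypothetical and do not reflect jhc's real s_cache layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache descriptor; jhc's real s_cache has more fields. */
typedef struct {
    size_t object_size;   /* size in bytes of objects served by this cache */
    size_t allocated;     /* objects handed out so far */
} cache;

/* One slot per cache the compiler would generate (names assumed). */
enum { CACHE_PAIR, CACHE_CONS, CACHE_COUNT };

typedef struct {
    cache caches[CACHE_COUNT];   /* the whole cache table lives in the arena */
} arena;

/* Assumed per-cache object sizes, fixed at compile time. */
static const size_t cache_sizes[CACHE_COUNT] = { 16, 24 };

/* Initialize every cache once, when the arena is created. */
static void arena_init(arena *a)
{
    for (int i = 0; i < CACHE_COUNT; i++) {
        a->caches[i].object_size = cache_sizes[i];
        a->caches[i].allocated = 0;
    }
}

/* Current arena. A plain global in this sketch; a thread-local variable
 * or a reserved register would keep contexts from sharing it. */
static arena *current_arena;

/* Allocation path: a fixed offset from the arena pointer,
 * with no find_cache lookup and no NULL check. */
static cache *cache_for(int which)
{
    return &current_arena->caches[which];
}
```

Switching context then amounts to changing current_arena; each cache is reached through a fixed offset from that one pointer.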
John

Hmm.. well in any case, collecting the whole context into a handy
struct is a good cleanup anyway, even if there is just a single global
one. So I should backport that as well as the pthreads code.

Hi John,
Thanks for your advice.
On Tue, Jun 10, 2014 at 11:16 PM, John Meacham wrote:
> Hmm.. well in any case, collecting the whole context into a handy struct is a good cleanup anyway, even if there is just a single global one. So I should backport that as well as the pthreads code.
I think depending strongly on pthreads is a bad idea, because it would destroy jhc's minimalism. Ajhc is the result of choosing a design with a selectable thread architecture, but it is slower than jhc... I need to think more about how to merge CLHs into jhc...
Thanks, -- Kiwamu Okabe at METASEPI DESIGN

On 10.06.2014 12:13, Kiwamu Okabe wrote:
> Hi jhc-hackers,
> I have written a paper "Experience Report: Writing NetBSD Sound Drivers in Haskell".
Since you are concerned with low-level Haskell programming, what do you think about the Reduceron project? I also wondered whether it would be possible to teach functional programming to processors with customizable machine code like the Transmeta processors. I don't know whether comparable projects are still alive.

Hi Henning,
On Wed, Jun 11, 2014 at 6:33 AM, Henning Thielemann wrote:
> Since you are concerned with low-level Haskell programming, what do you think about the Reduceron project? I also wondered whether it would be possible to teach functional programming to processors with customizable machine code like the Transmeta processors. I don't know whether comparable projects are still alive.
http://www.cs.york.ac.uk/fp/reduceron/
Oh, I didn't know about it. Thanks. I am not well placed to talk about HDL. However, part of jhc's runtime could perhaps be designed in HDL. The main part of jhc's runtime is the GC. The GC clears the mark bit array before marking, and that clearing could be executed in parallel by hardware described in HDL.
Regards, -- Kiwamu Okabe at METASEPI DESIGN
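
For readers unfamiliar with the mark bit array mentioned above, here is a small software sketch in C. The sizes and names (BLOCK_OBJECTS, mark_bitmap and the mark_* functions) are assumptions for illustration, not jhc's actual definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_OBJECTS 256   /* assumed number of object slots per block */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MARK_WORDS ((BLOCK_OBJECTS + WORD_BITS - 1) / WORD_BITS)

typedef struct {
    unsigned long marks[MARK_WORDS];   /* one mark bit per object slot */
} mark_bitmap;

/* Set the mark bit for object slot i. */
static void mark_set(mark_bitmap *m, size_t i)
{
    m->marks[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
}

/* Return nonzero if object slot i is marked. */
static int mark_test(const mark_bitmap *m, size_t i)
{
    return (int)((m->marks[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL);
}

/* Clear all mark bits before a marking pass. No word depends on any
 * other, which is what makes the clearing a candidate for parallel
 * execution. */
static void mark_clear_all(mark_bitmap *m)
{
    memset(m->marks, 0, sizeof m->marks);
}
```

In software the clear is a single memset; the suggestion in the message is that hardware could zero all words at the same time.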