Bringing Erlang to Haskell

Folks, I love the Erlang multi-processing experience and think that a lot of the mistakes that I made could be avoided. What I want to have is 1) Processes, aka threads with single-slot in/out mailboxes 2) A facility to keep a list of such processes and send events to them using their process id 3) A socket reader/writer abstraction that communicates with the outside world using using its mailboxes Probably some other things but I would start with the above. I also want to use STM for this. One particular thing that bugs me is that I cannot really use TChan for thread mailboxes. I don't think I experienced this problem with Erlang but using a TChan with a logger thread quickly overwhelms the logger and fills the TChan and a lot (hundreds? thousands) of other threads are logging to it. Someone said it's because the scheduler would give ther other threads proportionally more attention. I found single-slot mailboxes (TMVar) to work much better as they pace the overall message flow. Using them means that asynchronous messages cannot be implemented, though. Please correct me if I'm wrong. I'll update my blog as I move forward. Thanks, Joel -- http://wagerlabs.com/

Just to elaborate a bit... On Dec 12, 2005, at 4:00 PM, Joel Reymont wrote:
1) Processes, aka threads with single-slot in/out mailboxes
It's obvious that the thread id and the mailboxes (two TMVar's) would need to be kept together, i.e. a) data Process = ThreadId (TMVar Dynamic) (TMVar Dynamic) or b) data Process a = ThreadId (TMVar a) (TMVar a) but I'm not sure if I should create a process class, for example. I think a class is not really needed since a message "receive" function can just use myThreadId to obtain a key into the map of processes and retrieve the message from its own mailbox. receive could take an action that's parameterized on the event and could pattern-match on it to process the event. This is why I would much rather prefer (b) above. I foresee myself having a single Event type throughout my app with various constructors. Processes can be supervised by rethrowing exceptions back to the parent thread I think.
2) A facility to keep a list of such processes and send events to them using their process id
I'm looking to use a global variable for this but then events would either need to be Dynamic or great care would need to be excercised with unsafeIO (Tomasz? ;-)) I still think this is the way to tackle the issue (note Process a). {-# NOINLINE children #-} children :: MVar [Process a] children = unsafePerformIO $ newMVar [] So long as everything is parameterized throughout, my experience is that the type checker complains if you try to use two types of 'a' throughout the app.
3) A socket reader/writer abstraction that communicates with the outside world using using its mailboxes
What I would like to do here is to use STM to read either from the socket loop mailbox or from the socket itself and then dispatch accordingly. If the message came from the socket then it's posted to the outbox and if a message was retrieved from the inbox then it gets posted to the socket. Some sort of a timeout would be needed for blocking I/O or a combination of hWaitForInput and -threaded. Thanks, Joel P.S. Einar apparently has distributed transactions almost ready. Putting them together with a polished version of the above should make distributed concurrent programming in Haskell far less tricky. -- http://wagerlabs.com/

BTW, there has already been some work in this area. http://www-i2.informatik.rwth-aachen.de/~stolz/dhs/ http://www.informatik.uni-kiel.de/~fhu/PUBLICATIONS/1999/ifl.html

On 12/12/05, Joel Reymont
Folks,
I love the Erlang multi-processing experience and think that a lot of the mistakes that I made could be avoided. What I want to have is
1) Processes, aka threads with single-slot in/out mailboxes 2) A facility to keep a list of such processes and send events to them using their process id 3) A socket reader/writer abstraction that communicates with the outside world using using its mailboxes
Probably some other things but I would start with the above. I also want to use STM for this.
One particular thing that bugs me is that I cannot really use TChan for thread mailboxes. I don't think I experienced this problem with Erlang but using a TChan with a logger thread quickly overwhelms the logger and fills the TChan and a lot (hundreds? thousands) of other threads are logging to it. Someone said it's because the scheduler would give ther other threads proportionally more attention.
You could use a bounded TChan. Chans are good because the smooth out "noise".. Ie a sudden surge of messages can be handled without stalling the senders, but sustained heavy traffic will cause a stall (preventing the TChan from growing too big). Here's an untested, off-the-top-of-my-head, implementation.. I may have gotten some names wrong but it should be quite straightforward to write a "real" implementation.. -- may want a newtype here? type BoundedTChan a = (TVar Int, Int, TChan a) -- (current size, max, chan) newBoundedTChan n = do sz <- newTVar 0 ch <- newTChan return (sz,n,ch) writeBoundedTChan (sz,mx,ch) x = do s <- readTVar sz when (s >= mx) retry writeTVar sz (s+1) writeChan ch x readBoundedTChan (sz,mx,ch) = do modifyTVar sz (-1) readTChan ch (s+1) /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Hello Joel, Monday, December 12, 2005, 7:00:46 PM, you wrote: JR> 1) Processes, aka threads with single-slot in/out mailboxes are you read dewscription of my own Process library in haskell maillist? JR> One particular thing that bugs me is that I cannot really use TChan JR> for thread mailboxes. i use. but i limit number of messages in this channel by additional tools. you can easily do the same. but first ask yourself - what you will gain by this? imho, it will only help to smooth temporary speed changes. if you just want to test whether this can speed up your program - implement such limited Channel and test whether it works btw, i suggested you to try not using logging thread entirely, making all logging actions synchronously JR> I found single-slot mailboxes (TMVar) to work much better as they JR> pace the overall message flow. Using them means that asynchronous JR> messages cannot be implemented, though. not exactly. they can hold at most one message i think that your aspiration to make things asynchronous is just sort of fashion. what you really want to get? -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Dec 13, 2005, at 1:13 AM, Bulat Ziganshin wrote:
are you read dewscription of my own Process library in haskell maillist?
No. Can you give me a pointer?
btw, i suggested you to try not using logging thread entirely, making all logging actions synchronously
I cannot. Only one thread can use stdout, otherwise the output is garbled. Plus, combing through a few thousand individual files produced by the threads would be a pain.
i think that your aspiration to make things asynchronous is just sort of fashion. what you really want to get?
I have no such aspirations. I was just making a point. Joel -- http://wagerlabs.com/

Hello Joel, Tuesday, December 13, 2005, 1:05:10 PM, you wrote:
are you read dewscription of my own Process library in haskell maillist?
JR> No. Can you give me a pointer? i will forward it to you. it have meaning to be subcribed there, just to see interesting announcements
btw, i suggested you to try not using logging thread entirely, making all logging actions synchronously
JR> I cannot. Only one thread can use stdout, otherwise the output is JR> garbled. Plus, combing through a few thousand individual files JR> produced by the threads would be a pain. :))) import Control.Concurrent import Control.Monad import System.IO import System.IO.Unsafe main = do h <- openBinaryFile "test" WriteMode for [1..100] $ \n -> forkIO $ for [1..] $ \i -> logger h ("thread "++show n++" msg "++show i) getLine hClose h lock = unsafePerformIO$ newMVar () logger h msg = withMVar lock $ const$ do hPutStrLn h msg putStrLn msg for = flip mapM_ -- Best regards, Bulat mailto:bulatz@HotPOP.com

Thank you Bulat, makes total sense. This list is a treasure trove of a resource. I guess this is what happens when you go from Erlang to Haskell :-). I'm conditioned to think of everything as a process and uses processes for everything. On Dec 13, 2005, at 11:17 AM, Bulat Ziganshin wrote:
import Control.Concurrent import Control.Monad import System.IO import System.IO.Unsafe
main = do h <- openBinaryFile "test" WriteMode for [1..100] $ \n -> forkIO $ for [1..] $ \i -> logger h ("thread "++show n++" msg "++show i) getLine hClose h
lock = unsafePerformIO$ newMVar ()
logger h msg = withMVar lock $ const$ do hPutStrLn h msg putStrLn msg
for = flip mapM_

On Mon, Dec 12, 2005 at 04:00:46PM +0000, Joel Reymont wrote:
One particular thing that bugs me is that I cannot really use TChan for thread mailboxes. I don't think I experienced this problem with Erlang but using a TChan with a logger thread quickly overwhelms the logger and fills the TChan and a lot (hundreds? thousands) of other threads are logging to it.
I wonder what Erlang does to solve this problem? Perhaps we should track the number of unprocessed messages in TChans and the bigger it is the more favor consumers over producers. Best regards Tomasz -- I am searching for a programmer who is good at least in some of [Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Hello Tomasz, Tuesday, December 13, 2005, 12:49:04 PM, you wrote: TZ> On Mon, Dec 12, 2005 at 04:00:46PM +0000, Joel Reymont wrote:
One particular thing that bugs me is that I cannot really use TChan for thread mailboxes. I don't think I experienced this problem with Erlang but using a TChan with a logger thread quickly overwhelms the logger and fills the TChan and a lot (hundreds? thousands) of other threads are logging to it.
TZ> I wonder what Erlang does to solve this problem? Perhaps we should track TZ> the number of unprocessed messages in TChans and the bigger it is TZ> the more favor consumers over producers. even best - always prefer consumer to producer :) may be have two lists - one of threads waiting to consume, and one of threads waiting to produce? -- Best regards, Bulat mailto:bulatz@HotPOP.com

On Dec 13, 2005, at 9:49 AM, Tomasz Zielonka wrote:
On Mon, Dec 12, 2005 at 04:00:46PM +0000, Joel Reymont wrote:
One particular thing that bugs me is that I cannot really use TChan for thread mailboxes. I don't think I experienced this problem with Erlang but using a TChan with a logger thread quickly overwhelms the logger and fills the TChan and a lot (hundreds? thousands) of other threads are logging to it.
I wonder what Erlang does to solve this problem? Perhaps we should track the number of unprocessed messages in TChans and the bigger it is the more favor consumers over producers.
It sounds to me that introducing thread priorities would be key. Here's a reply from Ulf Wiger (and Erlang expert): Erlang has four process priorities: - 'max' and 'high' are strict priorities (you stay at that level while there are processes ready to run) - normal and low are scheduled "fairly", I believe with 8000 "reductions" (roughly function calls) at normal priority and one low-priority job (if such a job exists) The scheduler works on reduction count. A context switch happens if the current process would block (e.g. if it's in a receive statement and there is no matching message in the queue), or when it's executed 1000 reductions. Since not all operations are equal in cost, there is an internal function called erlang:bump_reductions(N). File operations are usually followed by a call to erlang:bump_reductions(100) (see prim_file:get_drv_response/1) which means that processes writing to disk run out of their time slice in fewer function calls. This is of course to keep them from getting an unfair amount of CPU time. A logging process would probably therefore do well to fetch all messages in the message queue and write them using disk_log:alog_terms/2 (logging multiple messages each time). One could also possibly run the logger process at high priority. This means that normal priority process will have a hard time starving it. If the disk is slow, the logger process will yield while waiting for the disk (which won't block the runtime system as long as you have the thread pool enabled). In general, I think that processes that mainly dispatch messages, and don't generate any work on their own, should usually run on high priority. Otherwise, they tend to just contribute to delays in the system. /Uffe -- http://wagerlabs.com/
participants (5)
-
Bulat Ziganshin
-
Joel Reymont
-
Robert Dockins
-
Sebastian Sylvan
-
Tomasz Zielonka