Re: Questions on the RTS C API regarding threads and tasks

On 5 Nov 2015, at 05:27, Edward Z. Yang wrote:
Excerpts from Nicola Gigante's message of 2015-11-04 11:12:39 -0800:
I’ve started delving into the GHC runtime API to understand whether everything we need is exposed to the world or whether we have to modify the runtime itself (which I would avoid if possible).
I agree with this goal in principle, but:
(1) Depending on what you want to do, it may be quite difficult to do this, and
(2) The RTS is actually quite hackable (as long as you are not changing any of the interfaces with generated Haskell code, you could even build your own RTS and have users link against that.)
Something to keep in mind.
Edward, thanks for the quick reply! Yes, I see that the source code is clear and well documented! :) However, this project is something we would eventually want to publish on Hackage, not something for internal use. This means we would have to contribute the runtime changes upstream. That would be cool, but I assume GHC as a project doesn't accept every experimental feature that comes along, and we don't have the energy to guarantee maintenance of such a contribution. An external solution would be better.
Not easily. If you make a safe foreign call, the capability is given up before the C code executes, so you actually lose ownership of the capability and TSO by the time you're running C code. The only way to write a foreign call to properly suspend the thread that called it is a primop function (e.g. stg_putMVar), and the operation is quite delicate and you will probably want to use some existing (internal) code in the RTS to do it.
But if you just need to put threads to sleep and wake them up, why not just use an MVar?
We don't have measurements, but we ruled out this possibility for performance reasons. Our idea is to make a thin Haskell wrapper around a tiny bit of highly optimized C code. What's the performance of locking on MVars?
While we are at it: are primops callable directly from C? I suppose the calling conventions are different.
A question comes to mind: you mentioned "safe" calls. Are unsafe calls different with regard to detaching the capability?
Also: would a patch to make this possible be accepted? In particular:
- add a way to make an "ultraunsafe" foreign call that does not lose ownership of the calling thread;
- add a BlockedOnExplicitSleep flag for tso->why_blocked (and in turn this means managing a different blocking queue, right?);
- export something to reliably put threads to sleep and resume them in this way.
Is this feasible? Would it be a good idea?
Edward
Thank you again, Greetings, Nicola

Ping?
Please tell me if my questions were unclear.
Thank you,
Nicola

Excerpts from Nicola Gigante's message of 2015-11-04 23:13:51 -0800:
We don't have measurements, but we ruled out this possibility for performance reasons. Our idea is to make a thin Haskell wrapper around a tiny bit of highly optimized C code. What's the performance of locking on MVars?
I still don't know what it is you're trying to do. If you have a tiny bit of optimized C code that runs quickly, then you should just make an unsafe FFI call to it (as far as Haskell's runtime is concerned, it's just a "fat instruction").
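To make the "fat instruction" idea concrete, here is a minimal sketch. The thread never says what the optimized C routine is, so the C library's `sin` stands in for it; `c_sin` and `fastSin` are illustrative names, not part of anything proposed in the thread.

```haskell
import Foreign.C.Types (CDouble (..))

-- 'unsafe' import: the calling Haskell thread keeps its capability during
-- the call, so the call is close to a plain C function call. The price:
-- the C code must return quickly, must not block, and must not call back
-- into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Thin Haskell wrapper with a plain Double interface.
fastSin :: Double -> Double
fastSin = realToFrac . c_sin . realToFrac
```

Because the import is pure and `unsafe`, `fastSin` can be used like any ordinary Haskell function; the whole burden of "runs quickly" falls on the C side.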
While we are at it: are primops callable directly from C? I suppose calling conventions are different.
Anything is "callable" from C, but yes, you have to do the right calling convention. Primops are not easily callable from C.
A question comes to mind: you mentioned "safe" calls. Are unsafe calls different regarding the detaching of the capability?
An unsafe call does not detach the capability.
Also: would a patch to make this possible be accepted? In particular:
- add a way to make an "ultraunsafe" foreign call that does not lose ownership of the calling thread;
I don't see what the difference between this and an unsafe foreign call is.
- add a BlockedOnExplicitSleep flag for tso->why_blocked (and in turn this means managing a different blocking queue, right?);
- export something to reliably put threads to sleep and resume them in this way.
Is this feasible? Would it be a good idea?
I still don't see why you can't just block the thread on an MVar (removing it from the main run queues), and then write to the MVar when you want to resume it. As an added bonus, masking and async exceptions will automatically be handled correctly.
If you find the MVar implementation is too slow, then maybe you can think about making an optimized implementation which doesn't use any synchronization / is inline in the TSO so no allocation is necessary. But in my opinion this is putting the cart before the horse.
Edward
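Edward's suggestion can be sketched as follows. This is a minimal sketch with our own naming: `Parker`, `newParker`, `park`, `unpark`, `demo` and `demoKill` are hypothetical helpers, not an RTS or `base` API. A thread blocked in `takeMVar` on an empty MVar stays off the run queues until another thread fills the MVar, and it remains interruptible by asynchronous exceptions, which is the bonus Edward mentions.

```haskell
import Control.Concurrent (forkFinally, forkIO, killThread)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar, tryPutMVar)

-- Hypothetical "parking spot" for one thread: an MVar () that is empty
-- while the thread should sleep.
newtype Parker = Parker (MVar ())

newParker :: IO Parker
newParker = Parker <$> newEmptyMVar

-- Block the calling Haskell thread until 'unpark' is called. While blocked
-- in takeMVar the thread is not runnable and uses no CPU.
park :: Parker -> IO ()
park (Parker m) = takeMVar m

-- Wake the parked thread. tryPutMVar never blocks: if a wake-up is already
-- pending, the extra one is dropped (binary-semaphore semantics).
unpark :: Parker -> IO ()
unpark (Parker m) = () <$ tryPutMVar m ()

-- A worker parks until the main thread wakes it. Order does not matter:
-- an unpark that arrives first is remembered by the full MVar.
demo :: IO String
demo = do
  p <- newParker
  done <- newEmptyMVar
  _ <- forkIO (park p >> putMVar done "worker resumed")
  unpark p
  takeMVar done

-- A parked thread is still interruptible: killThread delivers an async
-- exception to the thread blocked in takeMVar, and forkFinally reports it.
demoKill :: IO String
demoKill = do
  p <- newParker
  done <- newEmptyMVar
  tid <- forkFinally (park p) (putMVar done . either (const "interrupted") (const "woken"))
  killThread tid
  takeMVar done
```

Load the file in GHCi and evaluate `demo` or `demoKill`. Using `tryPutMVar` rather than `putMVar` in `unpark` is a design choice: waking a thread that already has a pending wake-up becomes a no-op instead of blocking the waker.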

On 9 Nov 2015, at 00:17, Edward Z. Yang wrote:
Excerpts from Nicola Gigante's message of 2015-11-04 23:13:51 -0800:
We don't have measurements, but we ruled out this possibility for performance reasons. Our idea is to make a thin Haskell wrapper around a tiny bit of highly optimized C code. What's the performance of locking on MVars?
I still don't know what it is you're trying to do. If you have a tiny bit of optimized C code that runs quickly, then you should just make an unsafe FFI call to it (as far as Haskell's runtime is concerned, it's just a "fat instruction").
I’m sorry, I know my description was too vague. The reason is that we have a paper under blind review and I was instructed not to talk about what we are doing. Anyway, your reply is very clear nonetheless.
While we are at it: are primops callable directly from C? I suppose calling conventions are different.
Anything is "callable" from C, but yes, you have to do the right calling convention. Primops are not easily callable from C.
Ok, that’s what I meant.
A question comes to mind: you mentioned "safe" calls. Are unsafe calls different regarding the detaching of the capability?
An unsafe call does not detach the capability.
Ok, thanks for the confirmation of this fact.
Also: would a patch to make this possible be accepted? In particular:
- add a way to make an "ultraunsafe" foreign call that does not lose ownership of the calling thread;
I don't see what the difference between this and an unsafe foreign call is.
Nothing, actually; I was confused about what an “unsafe” call means.
- add a BlockedOnExplicitSleep flag for tso->why_blocked (and in turn this means managing a different blocking queue, right?);
- export something to reliably put threads to sleep and resume them in this way.
Is this feasible? Would it be a good idea?
I still don't see why you can't just block the thread on an MVar (removing it from the main run queues), and then write to the MVar when you want to resume it. As an added bonus, masking and async exceptions will automatically be handled correctly.
If you find the MVar implementation is too slow, then maybe you can think about making an optimized implementation which doesn't use any synchronization / is inline in the TSO so no allocation is necessary. But in my opinion this is putting the cart before the horse.
Yes, that seems simple and fast enough. You’re right that we should measure before making assumptions about hypothetical poor performance. Thank you for your help.
Edward
Greetings, Nicola
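In the spirit of measuring before assuming, one rough way to get a number for the cost of an MVar hand-off is a ping-pong between two threads. This is a sketch, not a rigorous benchmark: `pingPong` and `roundTripNanos` are hypothetical names, `GHC.Clock` assumes base 4.11 (GHC 8.4) or later, and no figures were reported in the thread. Results will depend on hardware, optimization level, and RTS options such as `-threaded` and `+RTS -N`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTime)

-- Bounce a counter between two threads n times through a pair of MVars.
-- Each round trip blocks and wakes each thread once. Returns n on success.
pingPong :: Int -> IO Int
pingPong n = do
  ping <- newEmptyMVar
  pong <- newEmptyMVar
  -- Echo thread: take a value from 'ping' and hand it back on 'pong'.
  _ <- forkIO $ replicateM_ n (takeMVar ping >>= putMVar pong)
  let loop :: Int -> Int -> IO Int
      loop 0 acc = pure acc
      loop k acc = do
        putMVar ping acc
        echoed <- takeMVar pong
        loop (k - 1) $! echoed + 1 -- force the counter to avoid a thunk chain
  loop n 0

-- Average wall-clock cost of one round trip, in nanoseconds (n must be > 0).
roundTripNanos :: Int -> IO Double
roundTripNanos n = do
  t0 <- getMonotonicTime
  r <- pingPong n
  t1 <- r `seq` getMonotonicTime
  pure ((t1 - t0) * 1e9 / fromIntegral n)
```

Running `roundTripNanos 100000` in a compiled program (with and without `-threaded`) gives a first-order figure to compare against whatever budget the C-side design was meant to meet.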

On Mon, Nov 9, 2015 at 6:02 AM, Nicola Gigante wrote:
Nothing, actually; I was confused about what an “unsafe” call means.
In fact, as you've probably realized, the reason these calls are labeled
unsafe is precisely because they don't yield the capability. An unsafe FFI
call that blocks will block the Haskell capability -- this not only starves
that CPU for actual work, it will eventually block the whole program once
the RTS tries to do a round of stop-the-world GC.
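Greg's point can be illustrated with two imports of the same blocking C function. This sketch assumes a POSIX system that provides `usleep` in `unistd.h` with a `useconds_t` that fits in `CUInt`; the Haskell names `c_usleepSafe` and `c_usleepUnsafe` are hypothetical.

```haskell
import Foreign.C.Types (CInt (..), CUInt (..))

-- C signature: int usleep(useconds_t usec);
-- Assumption: useconds_t is a 32-bit unsigned type, so CUInt matches it.

-- 'safe' import: with the threaded RTS the capability is released before the
-- call, so other Haskell threads and the GC can run while this OS thread sleeps.
foreign import ccall safe "unistd.h usleep"
  c_usleepSafe :: CUInt -> IO CInt

-- 'unsafe' import of the same function: the capability is held for the whole
-- call, so a long sleep stalls every Haskell thread scheduled on it and, with
-- several capabilities, eventually any stop-the-world GC.
foreign import ccall unsafe "unistd.h usleep"
  c_usleepUnsafe :: CUInt -> IO CInt
```

With the non-threaded RTS even the safe import stalls all Haskell threads until the call returns, since there is only one OS thread running Haskell code; the difference Greg describes shows up when the program is built with `-threaded`.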
As Edward pointed out, you're better off just relying on the concurrency
primitives Haskell already gives you unless you find they're too slow.
Another thing you can try is the unagi-chan library on Hackage (
https://hackage.haskell.org/package/unagi-chan), which offers versions of
blocking and non-blocking producer-consumer queues that claim to be much
faster than the ones that come with the standard library.
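Before reaching for unagi-chan, it may help to have a baseline built on the standard library's `Control.Concurrent.Chan` to compare against. This is a minimal sketch, not a benchmark; `producerConsumer` is a hypothetical name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- Producer-consumer over base's unbounded Chan: a forked producer writes
-- 1..n, and the calling thread (the consumer) reads n items and sums them.
producerConsumer :: Int -> IO Int
producerConsumer n = do
  ch <- newChan
  _ <- forkIO $ forM_ [1 .. n] (writeChan ch)
  xs <- replicateM n (readChan ch)
  pure (sum xs)
```

unagi-chan's `Control.Concurrent.Chan.Unagi` module splits the queue into an `InChan` and an `OutChan`, returned together by its `newChan`, so porting the sketch should mostly mean writing to one end and reading from the other; check the package documentation for the exact API of the version you use.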
Greg
--
Gregory Collins