Proposal: add forkOnIO and friends to Control.Concurrent:

newer
[commit: base] master: Generalize...

Simon Marlow

22 Dec 2010 22 Dec '10

11:41 a.m.

Ticket: http://hackage.haskell.org/trac/ghc/ticket/4859 I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API: {{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads. The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'. -} forkOnIO :: Int -> IO () -> IO ThreadId -- | Like 'forkIOWithUnmask', but the child thread is pinned to the -- given CPU, as with 'forkOnIO'. forkOnIOWithUnmask :: Int -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId {- | Returns the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time. The CPU number passed to `forkOnIO` is interpreted modulo this value. An implementation in which Haskell threads are mapped directly to OS threads might return the number of physical processor cores in the machine, and 'forkOnIO' would be implemented using the OS's affinity facilities. An implementation that schedules Haskell threads onto a smaller number of OS threads (like GHC) would return the number of such OS threads that can be running simultaneously. GHC notes: this returns the number passed as the argument to the @+RTS -N@ flag. In current implementations, the value is fixed when the program starts and never changes, but it is possible that in the future the number of capabilities might vary at runtime. -} getNumCapabilities :: IO Int -- | returns @Just x@ if the given thread was created with either -- @forkOnIO x@ or @forkOnIOWithUnmask@, or @Nothing@ otherwise. threadIsPinned :: ThreadId -> IO (Maybe Int) }}} In base 4.3.0.0 (GHC 7.0.1) we currently have `forkOnIO` and `forkOnIOUnmasked`, available from `GHC.Conc`. I am about to add the other functions to `GHC.Conc`, and deprecate `forkOnIOUnmasked` (see http://hackage.haskell.org/trac/ghc/ticket/4457). This proposal is to export the above functions from`Control.Concurrent` too. A feature request for `threadIsPinned` was submitted as http://hackage.haskell.org/trac/ghc/ticket/4457. Naming is up for grabs: I'm not at all sure that "capabilities" is a good word here, but I can't think of any better ideas. "processors" or "CPUs" don't seem quite right. Discussion period: 4 weeks (until 19 Jan 2011) Cheers, Simon

Show replies by date

Duncan Coutts

22 Dec 22 Dec

3:55 p.m.

On 22 December 2010 11:41, Simon Marlow wrote:

...

Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'.

While Int is clearly enough for a multi-processors single node system, I wonder if there is something slightly more general that would give us a common API with the Eden / distributed Haskell people. For example in MPI the identification of a node is rather more complex than a single integer. Perhaps some more abstract identifier for a capability would be useful. Closely related of course is how one discovers the available capabilities. Duncan

Simon Marlow

4:08 p.m.

On 22/12/2010 15:55, Duncan Coutts wrote:

...

On 22 December 2010 11:41, Simon Marlow wrote:

...
Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'.

While Int is clearly enough for a multi-processors single node system, I wonder if there is something slightly more general that would give us a common API with the Eden / distributed Haskell people. For example in MPI the identification of a node is rather more complex than a single integer. Perhaps some more abstract identifier for a capability would be useful. Closely related of course is how one discovers the available capabilities.

There aren't any systems that support remote forkIO, and it's not clear that forkIO is the right abstraction for a distributed system. See for example the design at http://hackage.haskell.org/trac/ghc/wiki/ErlangInHaskell which uses a different primitive for remote forks. I think we can safely consider forkIO (and forkOnIO) to be node-local forks, so we don't have to worry about a more complex node structure here. Cheers, Simon

Iavor Diatchki

10:03 p.m.

Hello, Just a minor comment about the types: it seems a lot more natural to use Word instead of Int to identify CPUs and to report the number of capabilities. I can't think of any meaningful use of negative numbers in these functions. -Iavor On Wed, Dec 22, 2010 at 8:08 AM, Simon Marlow wrote:

...

On 22/12/2010 15:55, Duncan Coutts wrote:

...
On 22 December 2010 11:41, Simon Marlow wrote:

...
Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'.

While Int is clearly enough for a multi-processors single node system, I wonder if there is something slightly more general that would give us a common API with the Eden / distributed Haskell people. For example in MPI the identification of a node is rather more complex than a single integer. Perhaps some more abstract identifier for a capability would be useful. Closely related of course is how one discovers the available capabilities.

There aren't any systems that support remote forkIO, and it's not clear that forkIO is the right abstraction for a distributed system. See for example the design at

http://hackage.haskell.org/trac/ghc/wiki/ErlangInHaskell

which uses a different primitive for remote forks.

I think we can safely consider forkIO (and forkOnIO) to be node-local forks, so we don't have to worry about a more complex node structure here.

Cheers, Simon

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Daniel Peebles

10:26 p.m.

Same with just about every other use of Int in the standard libraries and even GHC.Prim :) On Wed, Dec 22, 2010 at 5:03 PM, Iavor Diatchki wrote:

...

Hello, Just a minor comment about the types: it seems a lot more natural to use Word instead of Int to identify CPUs and to report the number of capabilities. I can't think of any meaningful use of negative numbers in these functions. -Iavor

...
On 22/12/2010 15:55, Duncan Coutts wrote:

...
On 22 December 2010 11:41, Simon Marlow wrote:

...
Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'.

While Int is clearly enough for a multi-processors single node system, I wonder if there is something slightly more general that would give us a common API with the Eden / distributed Haskell people. For example in MPI the identification of a node is rather more complex than a single integer. Perhaps some more abstract identifier for a capability would be useful. Closely related of course is how one discovers the available capabilities.

There aren't any systems that support remote forkIO, and it's not clear

On Wed, Dec 22, 2010 at 8:08 AM, Simon Marlow wrote: that

...
forkIO is the right abstraction for a distributed system. See for example the design at

http://hackage.haskell.org/trac/ghc/wiki/ErlangInHaskell

which uses a different primitive for remote forks.

I think we can safely consider forkIO (and forkOnIO) to be node-local forks, so we don't have to worry about a more complex node structure here.

Cheers, Simon

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Bas van Dijk

23 Dec 23 Dec

12:12 a.m.

On Wed, Dec 22, 2010 at 4:55 PM, Duncan Coutts wrote:

...

On 22 December 2010 11:41, Simon Marlow wrote:

...
Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'.

While Int is clearly enough for a multi-processors single node system, I wonder if there is something slightly more general that would give us a common API with the Eden / distributed Haskell people. For example in MPI the identification of a node is rather more complex than a single integer. Perhaps some more abstract identifier for a capability would be useful. Closely related of course is how one discovers the available capabilities.

Such a design would probably look something like this: newtype Capability = Capability Int -- abstract getCapabilities :: IO [Capability] -- or should this be: getCapabilities :: IO (Capability, [Capability]) -- to make it clear there's always at least one capability. forkOnIO :: Capability -> IO () -> IO ThreadId forkOnIOWithUnmask :: Capability -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId threadIsPinned :: ThreadId -> IO (Maybe Capability) I don't really like the name "Capability". "Processing Unit" is more descriptive but a bit long. Maybe shorten that to "PU". However I'm not sure yet if such a design provides much benefits over the current one. Bas

Gábor Lehel

12:22 a.m.

On Wed, Dec 22, 2010 at 12:41 PM, Simon Marlow wrote:

...

Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'. -} forkOnIO :: Int -> IO () -> IO ThreadId

-- | Like 'forkIOWithUnmask', but the child thread is pinned to the -- given CPU, as with 'forkOnIO'. forkOnIOWithUnmask :: Int -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId

{- | Returns the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time. The CPU number passed to `forkOnIO` is interpreted modulo this value.

An implementation in which Haskell threads are mapped directly to OS threads might return the number of physical processor cores in the machine, and 'forkOnIO' would be implemented using the OS's affinity facilities. An implementation that schedules Haskell threads onto a smaller number of OS threads (like GHC) would return the number of such OS threads that can be running simultaneously.

GHC notes: this returns the number passed as the argument to the @+RTS -N@ flag. In current implementations, the value is fixed when the program starts and never changes, but it is possible that in the future the number of capabilities might vary at runtime. -} getNumCapabilities :: IO Int

-- | returns @Just x@ if the given thread was created with either -- @forkOnIO x@ or @forkOnIOWithUnmask@, or @Nothing@ otherwise. threadIsPinned :: ThreadId -> IO (Maybe Int) }}}

In base 4.3.0.0 (GHC 7.0.1) we currently have `forkOnIO` and `forkOnIOUnmasked`, available from `GHC.Conc`. I am about to add the other functions to `GHC.Conc`, and deprecate `forkOnIOUnmasked` (see http://hackage.haskell.org/trac/ghc/ticket/4457). This proposal is to export the above functions from`Control.Concurrent` too.

A feature request for `threadIsPinned` was submitted as http://hackage.haskell.org/trac/ghc/ticket/4457.

Naming is up for grabs: I'm not at all sure that "capabilities" is a good word here, but I can't think of any better ideas. "processors" or "CPUs" don't seem quite right.

What about, er, "Core"?

...

Discussion period: 4 weeks (until 19 Jan 2011)

Cheers, Simon

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

-- Work is punishment for failing to procrastinate effectively.

Thomas Schilling

24 Dec 24 Dec

1:57 a.m.

I agree with that a more descriptive type than Int would be better (even if it's just a newtype). I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork). Other than that, I agree that the implementation seems likely portable (and can easily be made to work in single-threaded systems). On 22 December 2010 11:41, Simon Marlow wrote:

...

Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'. -} forkOnIO :: Int -> IO () -> IO ThreadId

-- | Like 'forkIOWithUnmask', but the child thread is pinned to the -- given CPU, as with 'forkOnIO'. forkOnIOWithUnmask :: Int -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId

{- | Returns the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time. The CPU number passed to `forkOnIO` is interpreted modulo this value.

An implementation in which Haskell threads are mapped directly to OS threads might return the number of physical processor cores in the machine, and 'forkOnIO' would be implemented using the OS's affinity facilities. An implementation that schedules Haskell threads onto a smaller number of OS threads (like GHC) would return the number of such OS threads that can be running simultaneously.

GHC notes: this returns the number passed as the argument to the @+RTS -N@ flag. In current implementations, the value is fixed when the program starts and never changes, but it is possible that in the future the number of capabilities might vary at runtime. -} getNumCapabilities :: IO Int

-- | returns @Just x@ if the given thread was created with either -- @forkOnIO x@ or @forkOnIOWithUnmask@, or @Nothing@ otherwise. threadIsPinned :: ThreadId -> IO (Maybe Int) }}}

In base 4.3.0.0 (GHC 7.0.1) we currently have `forkOnIO` and `forkOnIOUnmasked`, available from `GHC.Conc`. I am about to add the other functions to `GHC.Conc`, and deprecate `forkOnIOUnmasked` (see http://hackage.haskell.org/trac/ghc/ticket/4457). This proposal is to export the above functions from`Control.Concurrent` too.

A feature request for `threadIsPinned` was submitted as http://hackage.haskell.org/trac/ghc/ticket/4457.

Naming is up for grabs: I'm not at all sure that "capabilities" is a good word here, but I can't think of any better ideas. "processors" or "CPUs" don't seem quite right.

Discussion period: 4 weeks (until 19 Jan 2011)

Cheers, Simon

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

-- Push the envelope. Watch it bend.

Brandon S Allbery KF8NH

1 Jan 1 Jan

6:59 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/23/10 20:57 , Thomas Schilling wrote:

...

I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway? - -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0feW4ACgkQIn7hlCsL25VwlwCgguYST7bQcgfCQyvFBZUGuccw q+4An2xxGURPI7rMjHiuejEqfS/pc7Q5 =egAU -----END PGP SIGNATURE-----

Maciej Piechotka

2 Jan 2 Jan

1:21 a.m.

On Sat, 2011-01-01 at 13:59 -0500, Brandon S Allbery KF8NH wrote:

...

On 12/23/10 20:57 , Thomas Schilling wrote:

...
I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway?

Depends on what you understand by thread:

...

import Control.DeepSeq import Control.Parallel import Control.Seq

mergeSort :: (NFData a, Ord a) => [a] -> [a] mergeSort [] = [] mergeSort [x] = [x] mergeSort xs = let (ys, zs) = foldr (\e (a, b) -> (e:b, a)) ([], []) xs merge xs [] = xs merge [] ys = ys merge s@(x:xs) t@(y:ys) | x < y = x : merge xs t | otherwise = y : merge s ys ys' = mergeSort ys `using` rdeepseq zs' = mergeSort zs `using` rdeepseq in ys' `par` zs' `par` merge ys' zs'

main = print $ mergeSort ([1..100] :: [Int])

(Code was not written to be nice, efficient or even correct but to illustrate the point.) Regards

Brandon S Allbery KF8NH

1:30 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 1/1/11 20:21 , Maciej Piechotka wrote:

...

On Sat, 2011-01-01 at 13:59 -0500, Brandon S Allbery KF8NH wrote:

...
On 12/23/10 20:57 , Thomas Schilling wrote:

...
I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway?

Depends on what you understand by thread:

Implicit threading in the runtime doesn't actually count, no; if that isn't referentially transparent then by definition it is broken. - -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0f1T0ACgkQIn7hlCsL25UhNQCfXLH8xJp20yA0YAxOCHmJ67Yw aMUAmwXm6NVdDT1Gme8Tpjb0LQy7qyxI =nkmG -----END PGP SIGNATURE-----

Isaac Dupree

7:16 a.m.

On 01/01/11 20:30, Brandon S Allbery KF8NH wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 1/1/11 20:21 , Maciej Piechotka wrote:

...
On Sat, 2011-01-01 at 13:59 -0500, Brandon S Allbery KF8NH wrote:

...
On 12/23/10 20:57 , Thomas Schilling wrote:

...
I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway?

Depends on what you understand by thread:

Implicit threading in the runtime doesn't actually count, no; if that isn't referentially transparent then by definition it is broken.

Kernel threads are an implementation detail, unless you need one for Thread-Local State or the like (using the poorly-named forkOS). GHC threads are created by both forkIO and (usually) `par`. `par` is referentially transparent because you can't observe GHC threads outside of IO, not because GHC threads aren't created by `par`. (The "usually" is because the compiler and runtime is free to make a thread or not, since it *is* referentially transparent regardless of this choice.) For a programmer, it's useful to understand the difference between concurrency and parallelism. But for an amateur understanding of the implementation, both programming-concepts use the same sort of GHC-threads. I like the name "forkIO" because it says pretty much what it does: it creates ("forks") a second running IO. It doesn't fork a process (Unix fork()), and it doesn't necessarily fork a kernel thread. It does fork a Haskell thread (although it is not the only basic function that forks Haskell threads). (Correct me if I'm wrong about the implementation.) IMHO whether "threads [are] pretty much constrained to IO" does "depend on what you understand by thread". I don't mind either name "forkIO" or "forkThread". -Isaac

Mitar

3:31 p.m.

Hi! On Sun, Jan 2, 2011 at 8:16 AM, Isaac Dupree wrote:

...

I don't mind either name "forkIO" or "forkThread".

I am not sure if it is really important how we name things. Or if it is so important that it would justify breaking backwards compliance (and having both new and old names is really a path to chaos where (at least) beginners will be confused by old documentation and discrepancy). But, to join you in this bikesheed color, I propose "igniteSpark". ;-) Mitar

Felipe Almeida Lessa

3:48 p.m.

On Sun, Jan 2, 2011 at 1:31 PM, Mitar wrote:

...

(and having both new and old names is really a path to chaos where (at least) beginners will be confused by old documentation and discrepancy).

We just need to {-# DEPRECATED forkIO "Use fooBar instead #-} and point out that the name changed in the docs. GHC will then warn everyone using the old name that it is deprecated and that the new name is fooBar. Where is the chaos? Cheers! -- Felipe.

Brandon S Allbery KF8NH

5:17 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 1/2/11 02:16 , Isaac Dupree wrote:

...

On 01/01/11 20:30, Brandon S Allbery KF8NH wrote:

...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 1/1/11 20:21 , Maciej Piechotka wrote:

...
On Sat, 2011-01-01 at 13:59 -0500, Brandon S Allbery KF8NH wrote:

...
On 12/23/10 20:57 , Thomas Schilling wrote:

...
I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway?

Depends on what you understand by thread:

Implicit threading in the runtime doesn't actually count, no; if that isn't referentially transparent then by definition it is broken.

Kernel threads are an implementation detail, unless you need one for Thread-Local State or the like (using the poorly-named forkOS).

GHC threads are created by both forkIO and (usually) `par`. `par` is referentially transparent because you can't observe GHC threads outside of IO, not because GHC threads aren't created by `par`. (The "usually" is because the compiler and runtime is free to make a thread or not, since it *is* referentially transparent regardless of this choice.)

Exactly. I look at this as: `par` is a hint, the implementation might or might not thread. Maybe I need to rephrase my point: *explicit* threading, almost by definition, must be in IO. *Implicit* threading is an implementation detail; `par` might or might not create a thread, since it's a hint, and (at least in theory if not in actual practice as yet) the runtime could autothread things without a hint if it can determine that there would be a benefit. In any case, all of this is kind of beside the point: "forkIO" implies a "forkNotIO" of some kind (explicit pure threads) which doesn't exist and probably can't exist usefully. `par` is more of an optimization hint, in my mind, directed at an optional implicit threader that is free to ignore the hint and even potentially fire on its own in the absence of `par`. - -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0gsygACgkQIn7hlCsL25WN4ACggqDky1PteVc/8QFKVD4SPTSB aBkAnRDPhqzMIp34pdr5CrQeueNlP0bg =KWMK -----END PGP SIGNATURE-----

Maciej Piechotka

1:35 a.m.

On Sat, 2011-01-01 at 13:59 -0500, Brandon S Allbery KF8NH wrote:

...

On 12/23/10 20:57 , Thomas Schilling wrote:

...
I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

+1 The name "forkIO" always seemed a bit odd to me; aren't threads pretty much constrained to IO anyway?

Depends on what you understand by thread:

...

import Control.DeepSeq import Control.Parallel import Control.Seq

mergeSort :: (NFData a, Ord a) => [a] -> [a] mergeSort [] = [] mergeSort [x] = [x] mergeSort xs = let (ys, zs) = foldr (\e (a, b) -> (e:b, a)) ([], []) xs merge xs [] = xs merge [] ys = ys merge s@(x:xs) t@(y:ys) | x < y = x : merge xs t | otherwise = y : merge s ys ys' = mergeSort ys `using` rdeepseq zs' = mergeSort zs `using` rdeepseq in ys' `par` zs' `par` merge ys' zs'

main = print $ mergeSort ([1..100] :: [Int])

(Code was not written to be nice, efficient or even correct but to illustrate the point.) Regards

Simon Marlow

5 Jan 5 Jan

10:51 a.m.

On 24/12/2010 01:57, Thomas Schilling wrote:

...

I agree with that a more descriptive type than Int would be better (even if it's just a newtype).

The Int argument has three purposes: - you can ask that two threads be placed on the same processor (to optimise for communication overhead) - you can ask that two threads be placed on different processors, if enough are available (to optimise for parallelism) - values are independent of the number of processors actually available The third point is important, because the number of actual processors might change over time, so it's important that the scheduler knows what to do with pinned threads when it adds or removes processors. This is why something like getCapabilities :: IO [Capability] isn't good - it might give you different results depending on when you call it, and you have to make fixed scheduling decisions based on the number of processors at the current time. if the Int were instead a newtype, it would at least need to be an instance of Enum, and I'm not sure there's any advantage in that. Can you see any?

...

I also find the name 'forkOnIO' extremely confusing. Without reading the docs it seems to imply that a thread is created on IO, i.e., if I/O happens. This makes no sense, and is of course not what's happening. However, I assume you chose it because forkIOOn looks a bit weird. In that case, why not use forkThreadOn, and in a separate proposal change forkIO to forkThread (or just fork).

How about just forkOn? I agree that we should rename forkIO too, but let's leave that for another proposal. Cheers, Simon

...

Other than that, I agree that the implementation seems likely portable (and can easily be made to work in single-threaded systems).

On 22 December 2010 11:41, Simon Marlow wrote:

...
Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

I think these functions are implementation-independent enough to add to the main `Control.Concurrent` API:

{{{ {- | Like 'forkIO', but lets you specify on which CPU the thread is created. Unlike a `forkIO` thread, a thread created by `forkOnIO` will stay on the same CPU for its entire lifetime (`forkIO` threads can migrate between CPUs according to the scheduling policy). `forkOnIO` is useful for overriding the scheduling policy when you know in advance how best to distribute the threads.

The `Int` argument specifies the CPU number; it is interpreted modulo the value returned by 'getNumCapabilities'. -} forkOnIO :: Int -> IO () -> IO ThreadId

-- | Like 'forkIOWithUnmask', but the child thread is pinned to the -- given CPU, as with 'forkOnIO'. forkOnIOWithUnmask :: Int -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId

{- | Returns the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time. The CPU number passed to `forkOnIO` is interpreted modulo this value.

An implementation in which Haskell threads are mapped directly to OS threads might return the number of physical processor cores in the machine, and 'forkOnIO' would be implemented using the OS's affinity facilities. An implementation that schedules Haskell threads onto a smaller number of OS threads (like GHC) would return the number of such OS threads that can be running simultaneously.

GHC notes: this returns the number passed as the argument to the @+RTS -N@ flag. In current implementations, the value is fixed when the program starts and never changes, but it is possible that in the future the number of capabilities might vary at runtime. -} getNumCapabilities :: IO Int

-- | returns @Just x@ if the given thread was created with either -- @forkOnIO x@ or @forkOnIOWithUnmask@, or @Nothing@ otherwise. threadIsPinned :: ThreadId -> IO (Maybe Int) }}}

In base 4.3.0.0 (GHC 7.0.1) we currently have `forkOnIO` and `forkOnIOUnmasked`, available from `GHC.Conc`. I am about to add the other functions to `GHC.Conc`, and deprecate `forkOnIOUnmasked` (see http://hackage.haskell.org/trac/ghc/ticket/4457). This proposal is to export the above functions from`Control.Concurrent` too.

A feature request for `threadIsPinned` was submitted as http://hackage.haskell.org/trac/ghc/ticket/4457.

Naming is up for grabs: I'm not at all sure that "capabilities" is a good word here, but I can't think of any better ideas. "processors" or "CPUs" don't seem quite right.

Discussion period: 4 weeks (until 19 Jan 2011)

Cheers, Simon

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Brandon S Allbery KF8NH

1 Jan 1 Jan

6:35 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/22/10 06:41 , Simon Marlow wrote:

...

Naming is up for grabs: I'm not at all sure that "capabilities" is a good word here, but I can't think of any better ideas. "processors" or "CPUs" don't seem quite right.

Normally called kernel threads (as distinct from user threads), no? Thread multiplexing is far from unusual. - -- brandon s. allbery [linux,solaris,freebsd,perl] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0fc8sACgkQIn7hlCsL25VGBwCgtYYHd0Htff2qyTB5PQcnAFpM V9oAn2E1AL8WW25NgNcuqL9EnjOgTcyy =7bDy -----END PGP SIGNATURE-----

Simon Marlow

30 Mar 30 Mar

11:50 a.m.

On 22/12/2010 11:41, Simon Marlow wrote:

...

Ticket:

http://hackage.haskell.org/trac/ghc/ticket/4859

FYI, I just pushed the following: Wed Mar 30 03:05:04 PDT 2011 Simon Marlow * Export the affinity API from Control.Concurrent: forkOn and friends. forkOn :: Int -> IO () -> IO ThreadId forkOnWithUnmask :: Int -> ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId getNumCapabilities :: IO Int threadCapability :: ThreadId -> IO (Int, Bool) Following discussion on the libraries list, I renamed forkOnIO to forkOn. In due course we might want to also rename forkIO to fork. I left the Int argument as it is, it's quite useful to be able to specify a number to be interpreted modulo the actual number of processors. I also used the term "capability" consistently. It might not be the best choice, but we have to pick something. Cheers, Simon

5207

Age (days ago)

5305

Last active (days ago)

List overview

Download

18 comments

12 participants

participants (12)

Bas van Dijk
Brandon S Allbery KF8NH
Daniel Peebles
Duncan Coutts
Felipe Almeida Lessa
Gábor Lehel
Iavor Diatchki
Isaac Dupree
Maciej Piechotka
Mitar
Simon Marlow
Thomas Schilling