
Would this be an appropriate place to propose adding mapM_ (and then possibly mapM) to bytestring library? Was it suggested before? If yes, why was it rejected?

Artyom Kazak wrote:
Would this be an appropriate place to propose adding mapM_ (and then possibly mapM) to bytestring library?
Err, mapM is defined as:

    mapM :: Monad m => (a -> m b) -> [a] -> m [b]

in that the second parameter is a list. ByteStrings are not lists, they're chunks of bytes. Besides, you can just do:

    let bs = "This is a bytestring"
    mapM someFunctions $ BS.unpack bs

or define a function:

    mapBSM :: Monad m => (Char -> m a) -> ByteString -> m [a]
    mapBSM f bs = mapM f $ BS.unpack bs

and similar for mapBSM_.

Erik

--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

On Sun, 01 Sep 2013 15:18:32 +0400, Erik de Castro Lopo
Err, mapM is defined as:
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
Yeah, the signatures for mapM_ and mapM would respectively be

    mapM_ :: Monad m => (Word8 -> m b) -> ByteString -> m ()
    mapM  :: Monad m => (Word8 -> m Word8) -> ByteString -> m ByteString

After all, the signature for ByteString-y `map` too requires a function of type `Word8 -> Word8`, so it seems perfectly reasonable for mapM to exist.
in that the second parameter is a list. ByteStrings are not lists, they're chunks of bytes. Besides, you can just do:
    let bs = "This is a bytestring"
    mapM someFunctions $ BS.unpack bs
or define a function:
    mapBSM :: Monad m => (Char -> m a) -> ByteString -> m [a]
    mapBSM f bs = mapM f $ BS.unpack bs
and similar for mapBSM_.
There’s even a RULE which would optimise this. However, it still would be *four times slower* than necessary if the monad in question is IO or ST.

On Sun, 1 Sep 2013, Artyom Kazak wrote:
On Sun, 01 Sep 2013 15:18:32 +0400, Erik de Castro Lopo
wrote: Err, mapM is defined as:
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
Yeah, the signatures for mapM_ and mapM would respectively be
mapM_ :: Monad m => (Word8 -> m b) -> ByteString -> m ()
This one should not be a big problem, but could be done by calling mapM_ on the result of ByteString.unpack.
mapM :: Monad m => (Word8 -> m Word8) -> ByteString -> m ByteString
This one can become very inefficient. Think of m = [], then many ByteStrings have to be constructed by repeated ByteString.cons, which requires a lot of copying. For lists, "cons" aka (:) is efficient because the tails are shared.
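
For concreteness, the unpack-based versions under discussion might look roughly like this (a sketch; neither function is part of the bytestring API):

    import qualified Data.ByteString as BS
    import Data.Word (Word8)

    -- mapM_ via a plain list traversal of the unpacked bytes
    bsMapM_ :: Monad m => (Word8 -> m b) -> BS.ByteString -> m ()
    bsMapM_ f = mapM_ f . BS.unpack

    -- mapM via unpack/pack; in a non-deterministic monad such as [] every
    -- result string gets materialised, which is where the copying cost
    -- mentioned above comes from
    bsMapM :: Monad m => (Word8 -> m Word8) -> BS.ByteString -> m BS.ByteString
    bsMapM f s = do
        ws <- mapM f (BS.unpack s)
        return (BS.pack ws)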

On Sun, Sep 1, 2013 at 6:16 AM, Henning Thielemann
mapM :: Monad m => (Word8 -> m Word8) -> ByteString -> m ByteString
This one can become very inefficient. Think of m = [], then many ByteStrings have to be constructed by repeated ByteString.cons, which requires a lot of copying. For lists, "cons" aka (:) is efficient because the tails are shared.
You don't need to cons -- you can allocate one ByteString and write into it unsafely (just like map does). See http://hackage.haskell.org/packages/archive/lens/3.9.0.3/doc/html/src/Contro.... Shachaf
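
A minimal sketch of that approach, specialised to IO and written against Data.ByteString.Internal and Data.ByteString.Unsafe (not part of the bytestring API itself):

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Internal as BI
    import qualified Data.ByteString.Unsafe as BU
    import Data.Word (Word8)
    import Foreign.Storable (pokeByteOff)

    -- Allocate the result once and fill it in place, byte by byte,
    -- in the spirit of Data.ByteString.map.
    mapMIO :: (Word8 -> IO Word8) -> BS.ByteString -> IO BS.ByteString
    mapMIO f s = BI.create len (\dst -> go dst 0)
      where
        len = BS.length s
        go dst i
            | i >= len  = return ()
            | otherwise = do
                b <- f (BU.unsafeIndex s i)
                pokeByteOff dst i b
                go dst (i + 1)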

On Sun, 01 Sep 2013 17:16:49 +0400, Henning Thielemann
Yeah, the signatures for mapM_ and mapM would respectively be
mapM_ :: Monad m => (Word8 -> m b) -> ByteString -> m ()
This one should not be a big problem, but could be done by calling mapM_ on the result of ByteString.unpack.
As I’ve said, mapM_ through ByteString.unpack is four times slower than the hand-written version. I find it unacceptable that a simple counting sort *can’t* be written efficiently for ByteString without importing Data.ByteString.Unsafe, Foreign.Ptr and friends.
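
For illustration, here is roughly the counting sort in question, written against a hypothetical bsMapM_ primitive (passed in as an argument, since bytestring exports no such function):

    import qualified Data.ByteString as BS
    import Data.Word (Word8)
    import Data.Array.IO (IOUArray, newArray, readArray, writeArray, getElems)

    -- Counting sort over a strict ByteString. The histogram loop over the
    -- input is the part that wants a mapM_-like primitive; the rest is cheap.
    countingSort :: ((Word8 -> IO ()) -> BS.ByteString -> IO ())  -- the hypothetical bsMapM_
                 -> BS.ByteString -> IO BS.ByteString
    countingSort bsMapM_ s = do
        counts <- newArray (0, 255) 0 :: IO (IOUArray Word8 Int)
        bsMapM_ (\b -> readArray counts b >>= writeArray counts b . (+ 1)) s
        cs <- getElems counts
        return $ BS.concat [BS.replicate n b | (b, n) <- zip [0 ..] cs]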

On Sun, 1 Sep 2013, Artyom Kazak wrote:
On Sun, 01 Sep 2013 17:16:49 +0400, Henning Thielemann
wrote: Yeah, the signatures for mapM_ and mapM would respectively be
mapM_ :: Monad m => (Word8 -> m b) -> ByteString -> m ()
This one should not be a big problem, but could be done by calling mapM_ on the result of ByteString.unpack.
As I’ve said, mapM_ through ByteString.unpack is four times slower than the hand-written version. I find it unacceptable that a simple counting sort *can’t* be written efficiently for ByteString without importing Data.ByteString.Unsafe, Foreign.Ptr and friends.
A possible solution might be fusion rules for ByteString.unpack and mapM_.
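
Roughly, such a rule might look like the sketch below. Note the caveat Artyom raises next: the right-hand side still needs a hand-written loop (here a simple uncons-based one), and rules that match an overloaded function like Prelude's mapM_ are fragile in practice, because the method tends to be inlined or specialised before the rule can fire.

    import qualified Data.ByteString as B
    import Data.Word (Word8)

    -- A hypothetical hand-written loop; not part of bytestring's API.
    bsMapM_ :: Monad m => (Word8 -> m ()) -> B.ByteString -> m ()
    bsMapM_ f = go
      where
        go s = case B.uncons s of
                 Nothing      -> return ()
                 Just (w, s') -> f w >> go s'

    {-# RULES
    "mapM_/B.unpack" forall f s. mapM_ f (B.unpack s) = bsMapM_ f s
      #-}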

On Sun, 01 Sep 2013 22:55:10 +0400, Henning Thielemann
As I’ve said, mapM_ through ByteString.unpack is four times slower than the hand-written version. I find it unacceptable that a simple counting sort *can’t* be written efficiently for ByteString without importing Data.ByteString.Unsafe, Foreign.Ptr and friends.
A possible solution might be fusion rules for ByteString.unpack and mapM_.
Except that such rules would require a hand-written version of mapM_ anyway. I agree that it can be written, it’s just that I don’t see why it should be in some obscure place instead of Data.ByteString.

On Sun, 1 Sep 2013, Artyom Kazak wrote:
On Sun, 01 Sep 2013 22:55:10 +0400, Henning Thielemann
wrote: A possible solution might be fusion rules for ByteString.unpack and mapM_.
Except that such rules would require a hand-written version of mapM_ anyway.
I agree that it can be written, it’s just that I don’t see why it should be in some obscure place instead of Data.ByteString.
This rule should of course be part of Data.ByteString. RULES are similar to class instances, and like orphan instances, orphan rules are a bad idea.

On Sun, 01 Sep 2013 23:27:20 +0400, Henning Thielemann
On Sun, 1 Sep 2013, Artyom Kazak wrote:
On Sun, 01 Sep 2013 22:55:10 +0400, Henning Thielemann
wrote: A possible solution might be fusion rules for ByteString.unpack and mapM_.
Except that such rules would require a hand-written version of mapM_ anyway.
I agree that it can be written, it’s just that I don’t see why it should be in some obscure place instead of Data.ByteString.
This rule should of course be part of Data.ByteString. RULES are similar to class instances and like orphan instances, orphan rules are a bad idea.
It still doesn’t solve the problem quite the way adding mapM_ does. Rules aren’t documented anywhere; why would the programmer expect `mapM_ . unpack` to fuse? Moreover, it violates the existing structure of the bytestring package. For instance, functions like `last` and `maximum` could be implemented as rules in the same way, but they’re included in Data.ByteString (and for good reason, I’d say).

A simpler subproblem is that there's no ByteString.*.Unsafe.Internal module that
lets bytestring library clients (at their own risk) add new fusion rules.
Isn't this "private fusion framework" situation, where library clients can't
add new things that fuse well, a recurrent problem that makes it difficult for
downstream users to have their own performant extensions?

On Sun, Sep 1, 2013 at 3:58 AM, Artyom Kazak
Would this be an appropriate place to propose adding mapM_ (and then possibly mapM) to bytestring library?
Was it suggested before? If yes, why was it rejected?
For what it's worth, this is exported by lens in Data.ByteString.Lens.bytes (or Data.ByteString.{Lazy,Strict}.Lens.bytes). The version exported by lens by default traverses each chunk in a balanced tree shape rather than left-to-right (which is probably not relevant to most use cases, actually -- the result is the same). It also works around http://ghc.haskell.org/trac/ghc/ticket/7556, which you'll want to watch for if you're writing this yourself. Note that you can construct mapM_ from mapM as discussed in another libraries@ thread recently (or by using e.g. mapMOf_). Shachaf
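
For reference, a small usage sketch of that traversal (assuming the lens package; traverseOf_ is re-exported by Control.Lens):

    import Control.Lens (traverseOf_)
    import Data.ByteString.Lens (bytes)
    import qualified Data.ByteString as BS

    -- A mapM_-style loop over the bytes of a strict ByteString,
    -- using the 'bytes' traversal from lens.
    printBytes :: BS.ByteString -> IO ()
    printBytes = traverseOf_ bytes print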

Personally, I would support its inclusion.
It is an annoying repetition -- but then so is the entire bytestring API --
so there is no real point in just randomly omitting this one combinator.
-Edward
On Sun, Sep 1, 2013 at 6:58 AM, Artyom Kazak
Would this be an appropriate place to propose adding mapM_ (and then possibly mapM) to bytestring library?
Was it suggested before? If yes, why was it rejected?

So, I have written several implementations of mapM_:

  * bsMapM_gen — generic, works for any monad
  * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s)
  * bsMapM_IO — hand-written version specifically for IO

Generic and hand-written versions don’t differ much. The overhead seems to be coming from inlinePerformIO (am I right here? Also, am I using inlinePerformIO legitimately?), which is only needed when we’re not in the IO monad.

    {-# SPECIALISE ... IO #-}
    {-# SPECIALISE ... ST #-}
    bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m ()
    bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp
      where
        mapp (ptr, len) = return $ go 0
          where
            go i | i == len  = return ()
                 | otherwise = let !b = inlinePerformIO $ peekByteOff ptr i
                               in f b >> go (i+1)

The short version relies on fusion of `unpack` and `mapM_`. Its advantage is that even when compiled without optimisations, it’s still fast. (Question: would the same happen to other versions, when put into Data.ByteString module? I suppose packages like bytestring are compiled with optimisations, so it probably would.)

    {-# SPECIALISE ... IO #-}
    {-# SPECIALISE ... ST #-}
    bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO ()
    bsMapM_shortIO f s = mapM_ f (unpack s)

Finally, the IO-specialised version. It’s faster than generic version (and, similarly, an ST-specialised version using unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma involving bsMapM_IO and bsMapM_ST should be present. (Question: are there other monads for which unsafeIOToMonad exists?)

    bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO ()
    bsMapM_IO f s = unsafeUseAsCStringLen s mapp
      where
        mapp (ptr, len) = go 0
          where
            go i | i == len  = return ()
                 | otherwise = peekByteOff ptr i >>= f >> go (i+1)

A-and here’s a table comparing performance of all three functions. All timings are in milliseconds.

             ghci       ghc       ghc -O    ghc -O2
          +----------+----------+----------+----------+
    gen   |   380    |    85    |    4.1   |    4.0   |
    short |    45    |    46    |   17.2   |   16.5   |
    IO    |   434    |    92    |    2.4   |    2.4   |
          +----------+----------+----------+----------+

Here’s the code I used. (Question: have I messed up anything?)

    import qualified Data.ByteString as BS
    import Data.Random
    import System.Random
    import System.IO.Unsafe
    import Control.Monad
    import Data.IORef
    import Criterion.Main
    import BSMaps

    --a bytestring consisting of 65536 random bytes
    testCase = BS.pack $ fst $
               flip sampleState (mkStdGen 8) $
               replicateM (2^16) stdUniform

    --sums elements of a bytestring, using given mapM_
    sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) -> BS.ByteString -> Word8
    sumIO f s = unsafePerformIO $ do
        sm <- newIORef 0
        f (modifyIORef' sm . (+)) s
        readIORef sm

    --runs the tests
    main = defaultMain [
        bench "IO"    $ whnf (sumIO bsMapM_IO)    testCase,
        bench "short" $ whnf (sumIO bsMapM_short) testCase,
        bench "gen"   $ whnf (sumIO bsMapM_gen)   testCase]

Finally, if there isn’t anything wrong, what are my next steps to see this included into next version of bytestring?

The maintainers for bytestring are still listed as Don Stewart and Duncan
Coutts on the package, and it doesn't seem to fall under the list of core
packages per http://www.haskell.org/haskellwiki/Library_submissions, so I
suppose it would come down to talking one of them into taking the patch.
It seems odd that a fundamental package like this is omitted from the
Library_submissions page though, as the older
http://trac.haskell.org/haskell-platform/wiki/PackageMaintainers page on
the trac shows it as maintained by GHC Central.
-Edward

I think the important issue here is whether we want to add monadic
versions of functions in bytestring (where it makes sense)? It would
yield better performance than using `unpack`, but at the cost of lots
of code duplication.
We have the same issue in containers where you could for example want
an `updateM` function, that would let you decide whether to update a
value by performing some side effect. You could simulate this using a
combination of `lookup` and `insert`, but that requires twice the
number of traversals of the data structure.
Right now I'm of the mind that the extra traversals (and using unpack
in the case of ByteString) is better than the code duplication.
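
For concreteness, the lookup-plus-insert simulation described above might look like this for Data.Map (updateM itself remains hypothetical):

    import qualified Data.Map as M

    -- Simulate a monadic update with a lookup followed by an insert or
    -- delete: at least two traversals of the map instead of one.
    updateViaLookupInsert :: (Monad m, Ord k)
                          => (v -> m (Maybe v)) -> k -> M.Map k v -> m (M.Map k v)
    updateViaLookupInsert f k m =
        case M.lookup k m of
            Nothing -> return m
            Just v  -> do
                mv <- f v
                return $ case mv of
                    Nothing -> M.delete k m
                    Just v' -> M.insert k v' m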

I personally find myself on the other side of the divide, for two reasons,
especially if someone else is willing to do the work to write and test the
method.
1.) The bytestring API is already fairly noisy in that it really needs to
be imported qualified. Finding such a common function *missing* from
an API that is so all-inclusive is pretty jarring from a user perspective,
so it would seem to follow the principle of least surprise to include it if
possible.
2.) Bytestring was built from the ground up for speed. A factor of 4-8x
speed difference even before dipping into inlinePerformIO tricks isn't
negligible here.
-Edward
On Thu, Sep 5, 2013 at 7:07 PM, Johan Tibell
I think the important issue here is whether we want to add monadic versions of functions in bytestring (where it makes sense)? It would yield better performance than using `unpack`, but at the cost of lots of code duplication.
We have the same issue in containers where you could for example want an `updateM` function, that would let you decide whether to update a value by performing some side effect. You could simulate this using a combination of `lookup` and `insert`, but that requires twice the number of traversals of the data structure.
Right now I'm of the mind that the extra traversals (and using unpack in the case of ByteString) is better than the code duplication.

I can see that argument. I did the opposite in unordered-containers,
which has many near-identical traversals in the name of speed. I must
admit it's a pain to maintain. Changes typically have to happen in at
least 2 places and the code feels cut-n-pasty.
On Thu, Sep 5, 2013 at 4:36 PM, Edward Kmett
I personally find myself on the other side of the divide, especially if someone else is willing to do the work to write and test the method for two reasons.
1.) The bytestring API is already fairly noisy in that it really needs to be imported qualified already. Finding such a common function missing from an API that is so all-inclusive is pretty jarring from a user perspective, so it would seem to follow the principle of least surprise to include it if possible.
2.) Bytestring was built from the ground up for speed. A factor of 4-8x speed difference even before dipping into inlinePerformIO tricks isn't negligible here.
-Edward

On Fri, 06 Sep 2013 03:07:23 +0400, Johan Tibell
I think the important issue here is whether we want to add monadic versions of functions in bytestring (where it makes sense)? It would yield better performance than using `unpack`, but at the cost of lots of code duplication.
We have the same issue in containers where you could for example want an `updateM` function, that would let you decide whether to update a value by performing some side effect. You could simulate this using a combination of `lookup` and `insert`, but that requires twice the number of traversals of the data structure.
Right now I'm of the mind that the extra traversals (and using unpack in the case of ByteString) is better than the code duplication.
This code already is in bytestring — as part of `sort`. So no code would actually be duplicated, just moved. (And we'd get sortBy practically for free!)

Then again, bytestring already *is* code duplication: raw loops everywhere; the same raw loops with minor changes; peekByteOff and pokeByteOff; calls into C for 30% speed improvements (with the loop-y Haskell versions left commented out nearby); and much more. I find myself wondering whether, by carefully choosing a set of basic primitives, the code of bytestring could be made ten times shorter.

On Thu, 2013-09-05 at 18:51 -0400, Edward Kmett wrote:
The maintainers for bytestring are still listed as Don Stewart and Duncan Coutts on the package, and it doesn't seem to fall the list of core packages per http://www.haskell.org/haskellwiki/Library_submissions so I suppose it would come down to talking one of them into taking the patch.
Yes, I'm happy to consider pull requests.

https://github.com/haskell/bytestring/

I should say that I start off sceptical about any additions to this API. As Raymond Chen says, each new feature starts off with -100 points.

For mapM etc, personally I think a better solution would be if ByteString and Text and other specialised containers could be an instance of Foldable/Traversable. Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded. I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.

Duncan

On Wed, 2013-09-11 at 14:35 +0100, Duncan Coutts wrote:
For mapM etc, personally I think a better solution would be if ByteString and Text and other specialised containers could be an instance of Foldable/Traversable.
Yes!
Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded.
Indeed :-( Hence my question a couple of days ago.
I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.
I looked into this when this thread started, but bumped into at least one 'issue': when defining (OTOH) something like

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Char8 as BS8

    class Foldable t where
        type Elem t :: *
        foldr :: (Elem t -> b -> b) -> b -> t -> b

    instance Foldable BS.ByteString where
        type Elem BS.ByteString = Word8
        foldr = BS.foldr

    instance Foldable BS8.ByteString where
        type Elem BS8.ByteString = Char
        foldr = BS8.foldr

which fails because BS.ByteString and BS8.ByteString are the same type.

Nicolas

On Wed, 2013-09-11 at 15:56 +0200, Nicolas Trangez wrote:
On Wed, 2013-09-11 at 14:35 +0100, Duncan Coutts wrote:
For mapM etc, personally I think a better solution would be if ByteString and Text and other specialised containers could be an instance of Foldable/Traversable.
Yes!
Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded.
Indeed :-( Hence my question a couple of days ago.
I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.
I looked into this when this thread started, but bumped into at least one 'issue': when defining (OTOH) something like

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Char8 as BS8

    class Foldable t where
        type Elem t :: *
        foldr :: (Elem t -> b -> b) -> b -> t -> b

    instance Foldable BS.ByteString where
        type Elem BS.ByteString = Word8
        foldr = BS.foldr

    instance Foldable BS8.ByteString where
        type Elem BS8.ByteString = Char
        foldr = BS8.foldr

which fails because BS.ByteString and BS8.ByteString are the same type.
Right. I fear there's very little one could do about that. We do have one single type that is a specialised container for two types. That's fairly unusual. I'd probably just use Word8, especially since the Char stuff is a bit iffy: it's really "Char8", not a full Char.

Duncan

On Wed, Sep 11, 2013 at 7:15 AM, Duncan Coutts wrote:

Right. I fear there's very little one could do about that. We do have one single type that is a specialised container for two types. That's fairly unusual. I'd probably just use Word8, especially since the Char stuff is a bit iffy: it's really "Char8", not a full Char.

It happens in containers and unordered-containers as well, and I suspect we will see it any time we want both a lazy and a strict (in the container sense) version of a data type.

-- Johan

We can't just upgrade those classes to be type-family based, as this
destroys their support for polymorphic recursion.

We could write another class, though.

The question is whether that class belongs in base or in some other package,
as it takes us pretty far afield of anything purporting to be portable.

On 09/11/13 10:28, Edward Kmett wrote:
We can't just upgrade those classes to be type family based, as this destroys their support for polymorphic recursion.
We could write another class though.
The question is whether that class belongs in base or in some other package, as it takes us pretty far afield of anything purporting to be portable.
Not necessarily. If you get rid of the item type, the resulting class is standard Haskell:

    class Monoid m => FactorialMonoid m where
        foldMap :: Monoid n => (m -> n) -> m -> n

With this approach you're folding over the atomic factors of the ByteString, i.e., ByteStrings containing one byte each, rather than over the bytes themselves. The same interface covers strict and lazy ByteString, and ByteString.Char8 as well.

Yes, abstracting away the item type would have some performance impact, as foldMap would need to construct the singleton ByteStrings. I'm hoping they could be deforested away.
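
As a rough illustration of that last point (a sketch, not the real class), a foldMap over the one-byte factors of a strict ByteString could be written as follows; the BS.singleton calls are exactly the construction cost mentioned above:

    import qualified Data.ByteString as BS
    import Data.Monoid (Monoid, mappend, mempty)

    -- Fold over the one-byte factors of a strict ByteString.
    foldMapBytes :: Monoid n => (BS.ByteString -> n) -> BS.ByteString -> n
    foldMapBytes f = BS.foldr (\w acc -> f (BS.singleton w) `mappend` acc) mempty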

On Wed, 11 Sep 2013, Duncan Coutts wrote:
For mapM etc, personally I think a better solution would be if ByteString and Text and other specialised containers could be an instance of Foldable/Traversable. Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded. I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.
I'd prefer to keep bytestring simple with respect to the number of type extensions. Since you must implement ByteString.mapM anyway, you can plug this into an instance definition of Traversable ByteString.

mapM_ is actually implemented in terms of Foldable, not Traversable, and its implementation in terms of folding a ByteString is, in my experience doing so inside lens, actually rather slow, and isn't much faster than the naive version that was suggested at the start of this discussion.

But as we're not monomorphizing Foldable/Traversable, this isn't a thing that is able to happen anyway.

-Edward

On Wed, Sep 11, 2013 at 2:25 PM, Henning Thielemann <lemming@henning-thielemann.de> wrote:
On Wed, 11 Sep 2013, Duncan Coutts wrote:
For mapM etc, personally I think a better solution would be if
ByteString and Text and other specialised containers could be an instance of Foldable/Traversable. Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded. I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.
I'd prefer to keep bytestring simple with respect to the number of type extensions. Since you must implement ByteString.mapM anyway, you can plug this into an instance definition of Traversable ByteString.

I agree with everything Edward has said already. I went through a similar
chain of reasoning a few years ago when I started using ListLike, which
provides a FoldableLL class (although it uses fundeps as ListLike predates
type families). ByteString can't be a Foldable instance, nor do I think
most people would want it to be.
Even though I would also like to see mapM_ in bytestring, it's probably
faster to have a library with a separate monomorphic Foldable class. So I
just wrote one:
https://github.com/JohnLato/mono-foldable
http://hackage.haskell.org/package/mono-foldable
Petr Pudlak has done some work in this area. A big problem is that
foldM/mapM_ are typically implemented in terms of Foldable.foldr (or
FoldableLL), but this isn't always optimal for performance. They really
need to be part of the type class so that different container types can
have specialized implementations. I did that in mono-foldable, using
Artyom's map implementation (Artyom, please let me know if you object to
this!)
pull requests, forks, etc all welcome.
John L.
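
A minimal sketch of that shape of class, with mapM_ as a class method so that instances can give a specialised definition (names here are illustrative, not mono-foldable's actual API):

    {-# LANGUAGE TypeFamilies #-}

    import qualified Data.ByteString as BS
    import Data.Word (Word8)

    class MonoFoldable t where
        type Elem t :: *
        ofoldr :: (Elem t -> b -> b) -> b -> t -> b

        -- part of the class so instances can override it with a faster loop
        omapM_ :: Monad m => (Elem t -> m ()) -> t -> m ()
        omapM_ f = ofoldr (\x rest -> f x >> rest) (return ())

    instance MonoFoldable BS.ByteString where
        type Elem BS.ByteString = Word8
        ofoldr = BS.foldr
        -- omapM_ could be overridden here with a pointer-based loop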

Actually you can implement them in terms of foldMap, and they suffer
somewhat less and become slightly more defined for infinite traversals, but
there is still some suffering to be had performance-wise.
Snoyman has also been pursuing a monomorphic Foldable class, and there is
one (more or less) in both my monoids and reducers packages, so this is an
oft reinvented idea. ;)
On Wed, Sep 11, 2013 at 4:05 PM, John Lato
I agree with everything Edward has said already. I went through a similar chain of reasoning a few years ago when I started using ListLike, which provides a FoldableLL class (although it uses fundeps as ListLike predates type families). ByteString can't be a Foldable instance, nor do I think most people would want it to be.
Even though I would also like to see mapM_ in bytestring, it's probably faster to have a library with a separate monomorphic Foldable class. So I just wrote one:
https://github.com/JohnLato/mono-foldable http://hackage.haskell.org/package/mono-foldable
Petr Pudlak has done some work in this area. A big problem is that foldM/mapM_ are typically implemented in terms of Foldable.foldr (or FoldableLL), but this isn't always optimal for performance. They really need to be part of the type class so that different container types can have specialized implementations. I did that in mono-foldable, using Artyom's map implementation (Artyom, please let me know if you object to this!)
pull requests, forks, etc all welcome.
John L.
On Wed, Sep 11, 2013 at 1:29 PM, Edward Kmett
wrote: mapM_ is actually implemented in terms of Foldable, not Traversable, and its implementation in terms of folding a ByteString is actually rather slow in my experience doing so inside lens and isn't much faster than the naive version that was suggested at the start of this discussion.
But as we're not monomorphizing Foldable/Traversable, this isn't a think that is able to happen anyways.
-Edward
On Wed, Sep 11, 2013 at 2:25 PM, Henning Thielemann < lemming@henning-thielemann.de> wrote:
On Wed, 11 Sep 2013, Duncan Coutts wrote:
For mapM etc, personally I think a better solution would be if
ByteString and Text and other specialised containers could be an instance of Foldable/Traversable. Those classes define mapM etc but currently they only work for containers that are polymorphic in their elements, so all specialised containers are excluded. I'm sure there must be a solution to that (I'd guess with type families) and that would be much nicer than adding mapM etc to bytestring itself. We would then just provide efficient instances for Foldable/Traversable.
I'd prefer to keep bytestring simple with respect to the number of type extensions. Since you must implement ByteString.mapM anyway, you can plug this into an instance definition of Traversable ByteString.
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

| The overhead
| seems to be coming from inlinePerformIO (am I right here? Also, am

Why do you need all this 'unsafe' stuff? I think because bytestrings work internally using side effects. But why can't you use runST?

If for some reason you really need IO, it's presumably not because you are really doing I/O. It's a kind of specialised application. The overheads from unsafePerformIO and friends are partly related (I think) to the bad IO-ish things that might happen inside.

So perhaps you (plural) can look carefully at what you are really trying to do, and propose new primops or whatever to support it at low cost.

Don't give *me* the answers! I'm just pointing out that there is no fundamental reason for this to run slower than it would in C, and if it does it might be worth digging a bit.

Simon

ByteString itself works through a ForeignPtr rather than a ByteArray#, and
it does a lot of 'inlinePerformIO' tricks to avoid the MVar implicit in a
full unsafePerformIO, as well as to eke out some extra performance benefits
over and above unsafeDupablePerformIO that I forget the details of.

Text, which came later, on the other hand does work with ByteArray# and
MutableByteArray#s in a much more principled fashion.
There is, however, a case for keeping some ability to work with
ForeignPtr's in a ByteString style, because there are a lot of tricks that
live on the ByteString stack for doing things like mmap'ing that can't work
with the ByteArray# approach.
-Edward
On Fri, Sep 6, 2013 at 3:11 AM, Simon Peyton-Jones
| The overhead | seems to be coming from inlinePerformIO (am I right here? Also, am
Why do you need all this 'unsafe' stuff? I think because bytestrings work internally using side effects. But why can't you use runST?
If for some reason you really need IO, it's presumably not because you are really doing I/O. It's a kind of specialised application. The overheads from unsafePerformIO and friends are partly related (I think) to the bad IO-ish things that might happen inside.
So perhaps you (plural) can look carefully at what you are really trying to do, and propose new primops or whatever to support it at low cost.
Don't give *me* the answers! I'm just pointing out that there is no fundamental reason for this to run slower than it would in C, and if it does it might be worth digging a bit.
Simon
participants (12)
- Artyom Kazak
- Carter Schonwald
- Duncan Coutts
- Edward Kmett
- Erik de Castro Lopo
- Henning Thielemann
- Johan Tibell
- John Lato
- Mario Blažević
- Nicolas Trangez
- Shachaf Ben-Kiki
- Simon Peyton-Jones