poor performance when generating random text

Hello anyone I've written a snippet which generates a file full of random strings. When compiled with -O2 on ghc-7.6, the generation speed is about 2Mb per second which is on par with interpreted php. That's the fact I find rather disappointing. Maybe I've missed something trivial? Any suggestions and explanations are welcome. :) % cat ext_sort.hs import qualified Data.Text as T import System.Random import Control.Exception import Control.Monad import System.IO import qualified Data.Text.IO as TI gen_string g = let (len, g') = randomR (50, 450) g in T.unfoldrN len rand_text (len, g') where rand_text (0,_) = Nothing rand_text (k,g) = let (c, g') = randomR ('a','z') g in Just (c, ((k-1), g')) write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do let size = 100000 sequence $ replicate size $ do g <- newStdGen let text = gen_string g TI.hPutStrLn h text main = do putStrLn "generating text corpus" write_corpus "test.txt" % cat ext_sort.prof Wed Oct 17 10:59 2012 Time and Allocation Profiling Report (Final) ext_sort +RTS -p -RTS total time = 32.56 secs (32558 ticks @ 1000 us, 1 processor) total alloc = 12,742,917,332 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc gen_string.rand_text.(...) Main 70.7 69.8 gen_string Main 17.6 15.8 gen_string.rand_text Main 5.4 13.3 write_corpus.\ Main 4.3 0.8 individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 67 0 0.0 0.0 100.0 100.0 main Main 135 0 0.0 0.0 100.0 100.0 write_corpus Main 137 0 0.0 0.0 100.0 100.0 write_corpus.\ Main 138 1 4.3 0.8 100.0 100.0 write_corpus.\.text Main 140 100000 0.0 0.0 95.7 99.2 gen_string Main 141 100000 17.6 15.8 95.7 99.2 gen_string.g' Main 147 100000 0.0 0.0 0.0 0.0 gen_string.rand_text Main 144 25109743 5.4 13.3 77.5 83.2 gen_string.rand_text.g' Main 148 24909743 0.6 0.0 0.6 0.0 gen_string.rand_text.(...) Main 146 25009743 70.7 69.8 70.7 69.8 gen_string.rand_text.c Main 145 25009743 0.8 0.0 0.8 0.0 gen_string.len Main 143 100000 0.0 0.0 0.0 0.0 gen_string.(...) Main 142 100000 0.6 0.3 0.6 0.3

System.Random is very slow. Try the mwc-random package from Hackage.
On Wed, Oct 17, 2012 at 9:07 AM, Dmitry Vyal
Hello anyone
I've written a snippet which generates a file full of random strings. When compiled with -O2 on ghc-7.6, the generation speed is about 2Mb per second which is on par with interpreted php. That's the fact I find rather disappointing. Maybe I've missed something trivial? Any suggestions and explanations are welcome. :)
% cat ext_sort.hs import qualified Data.Text as T import System.Random import Control.Exception import Control.Monad
import System.IO import qualified Data.Text.IO as TI
gen_string g = let (len, g') = randomR (50, 450) g in T.unfoldrN len rand_text (len, g') where rand_text (0,_) = Nothing rand_text (k,g) = let (c, g') = randomR ('a','z') g in Just (c, ((k-1), g'))
write_corpus file = bracket (openFile file WriteMode) hClose $ \h -> do let size = 100000 sequence $ replicate size $ do g <- newStdGen let text = gen_string g TI.hPutStrLn h text
main = do putStrLn "generating text corpus" write_corpus "test.txt"
% cat ext_sort.prof Wed Oct 17 10:59 2012 Time and Allocation Profiling Report (Final)
ext_sort +RTS -p -RTS
total time = 32.56 secs (32558 ticks @ 1000 us, 1 processor) total alloc = 12,742,917,332 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
gen_string.rand_text.(...) Main 70.7 69.8 gen_string Main 17.6 15.8 gen_string.rand_text Main 5.4 13.3 write_corpus.\ Main 4.3 0.8
individual inherited COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 67 0 0.0 0.0 100.0 100.0 main Main 135 0 0.0 0.0 100.0 100.0 write_corpus Main 137 0 0.0 0.0 100.0 100.0 write_corpus.\ Main 138 1 4.3 0.8 100.0 100.0 write_corpus.\.text Main 140 100000 0.0 0.0 95.7 99.2 gen_string Main 141 100000 17.6 15.8 95.7 99.2 gen_string.g' Main 147 100000 0.0 0.0 0.0 0.0 gen_string.rand_text Main 144 25109743 5.4 13.3 77.5 83.2 gen_string.rand_text.g' Main 148 24909743 0.6 0.0 0.6 0.0 gen_string.rand_text.(...) Main 146 25009743 70.7 69.8 70.7 69.8 gen_string.rand_text.c Main 145 25009743 0.8 0.0 0.8 0.0 gen_string.len Main 143 100000 0.0 0.0 0.0 0.0 gen_string.(...) Main 142 100000 0.6 0.3 0.6 0.3
______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafehttp://www.haskell.org/mailman/listinfo/haskell-cafe
--
Gregory Collins

What about this? I've tested on my pc and seems pretty fast. The trick is to generate the gen only once. Not sure if the inlines helps, though: import qualified Data.Text as T import System.Random.MWC import Control.Monad import System.IO import Data.ByteString as B import Data.Word (Word8) import Data.ByteString.Char8 as CB {- | Converts a Char to a Word8. Took from MissingH -} c2w8 :: Char -> Word8 c2w8 = fromIntegral . fromEnum charRangeStart :: Word8 charRangeStart = c2w8 'a' {-# INLINE charRangeStart #-} charRangeEnd :: Word8 charRangeEnd = c2w8 'z' {-# INLINE charRangeEnd #-} --genString :: Gen RealWorld -> IO B.ByteString genString g = do randomLen <- uniformR (50 :: Int, 450 :: Int) g str <- replicateM randomLen $ uniformR (charRangeStart, charRangeEnd) g return $ B.pack str writeCorpus :: FilePath -> IO [()] writeCorpus file = withFile file WriteMode $ \h -> do let size = 100000 _ <- withSystemRandom $ \gen -> replicateM size $ do text <- genString gen :: IO B.ByteString CB.hPutStrLn h text return [()] main :: IO [()] main = writeCorpus "test.txt" A.

On 10/17/2012 12:45 PM, Alfredo Di Napoli wrote:
What about this? I've tested on my pc and seems pretty fast. The trick is to generate the gen only once. Not sure if the inlines helps, though:
What about this? I've tested on my pc and seems pretty fast. The
trick is to generate the gen only once. Not sure if the inlines helps, though ... Wow, haskell-cafe is a wonderful place! In just a two hours program run time automagically improved 20x ;) Thanks Alfredo, code works wonderful. Compared to mine implementation it's 2.5 sec vs 50 sec on my laptop. Interesting, how it compares to C now. Inlining makes about 50x difference when code compiled without optimization. A nice example. Best wishes, Dmitry

Glad to have been helpful :)
Bests,
Alfredo
Sent from my iPad
On 17/ott/2012, at 21:10, Dmitry Vyal
On 10/17/2012 12:45 PM, Alfredo Di Napoli wrote:
What about this? I've tested on my pc and seems pretty fast. The trick is to generate the gen only once. Not sure if the inlines helps, though:
What about this? I've tested on my pc and seems pretty fast. The trick is to generate the gen only once. Not sure if the inlines helps, though ...
Wow, haskell-cafe is a wonderful place! In just a two hours program run time automagically improved 20x ;) Thanks Alfredo, code works wonderful. Compared to mine implementation it's 2.5 sec vs 50 sec on my laptop. Interesting, how it compares to C now.
Inlining makes about 50x difference when code compiled without optimization. A nice example.
Best wishes, Dmitry
participants (3)
-
Alfredo Di Napoli
-
Dmitry Vyal
-
Gregory Collins