
Hi, I played a bit around with the nice bytestring package. At some point I implemented a simple sorting program, because I needed line-sorting a file with a custom line-compare function. I was a bit surprised, that the resulting code is very fast. A lot of faster than sorting via GNU sort (with the standard line-compare function). Then I got suspicious and tested standard GNU sort against a trivial standard sort implementation in Haskell using bytestring. The Haskell implementation looks like:
module Main where
import qualified Data.ByteString.Lazy.Char8 as B import Data.List
main = do c <- B.getContents let l = B.lines c let r = sort l B.putStr $ B.unlines r
Sorting a file of '[]'-strings I get: time ./sort < brackets > /dev/null # ~ 0.01 s time sort < brackets > /dev/null # ~ 14 s Ok, strane ... Well, let's test with some 'normal' text: time ./sort < bible > /dev/null # ~ 0.4 s time sort < bible > /dev/null # ~ 0.56 s Ok, not that different. But with Haskell you often expect to get very slow code compared to an implementation in C. And I am surprised, that the Haskell is fast _and_ nice to read - because for example the ultra fast 'wc -l' Haskell implementation from the Haskell-Wiki uses some insider-knowledge about bytestring, and looks to a beginner not that intuitive, I guess. ./sort is the shown Haskell implementation. I used ghc 6.8.2, installed bytestring 0.9.1.0 as a user-local package (don't now I this superseeds the global bytestring package), compiled via ghc -O2. The tests are run at a Pentium M 1.3 GHz computer. As GNU sort I tested the version from coreutils 6.9. You can get the test files from: http://www.techfak.uni-bielefeld.de/~gsauthof/stuff/brackets.gz http://www.techfak.uni-bielefeld.de/~gsauthof/stuff/bible.gz (obviously, you have to gunzip them after downloading ...) Of course, the naive Haskell implementation doesn't care about locales (i.e. collating locale sequences), but this shouldn't explain the observed differences sorting the brackets file. Best regards Georg Sauthoff -- Fortune : 'Real programmers don't comment their code. It was hard to write, it should be hard to understand.' ;)