
BOS:
Well, your benchmarks are highly suspect.
Attached is another benchmark with similar results. This is no criterion benchmark but I did try to adjust a wee bit for cache issues. Suffice to say I am not yet impressed with Data.Text performance wise. In the broader scope I feel there is a deeper point here: Data.ByteString is how most data is available in a program (be it from a file, network, or other library/device). With all this data we really should have some sort of zero-copy "safeCoerce" that can expose the bytestring as an O(1) unboxed array of some safe length. I know this would have been nice for pureMD5, which did use an ugly unsafePerformIO hack just to get Word16s but even those got boxed up at horrible cost (it now uses 'cereal' to get the Word16 - even worse performance but it lets me ignore endianess of the architecture). ---- OT ---- Beating the dead horse: I once wrote and deleted a blog post ranting about how to get Word16 in C vs in Haskell: word = ((uint16_t *)p)[index]; or word = unsafePerformIO $ withForeignPtr ptr $ \ptr' -> let p = castPtr (plusPtr ptr' off) in peekElemOff p index That Haskell snippet takes significantly longer to both write and run. ---- END OT ---- Code and benchmark numbers included below, all complaints about the accuracy of the benchmark are welcome. NOTE: tLog is a daily #haskell log file repeated many times ~ 59MB. ------- [tommd@Mavlo Test]$ ./read tLog Normal String + System.IO "61443120": 1.467924s Data.ByteString.Lazy "61450365": 0.023285s Data.ByteString.Lazy.UTF8 "61443120": 3.305154s Data.Text.Lazy "61443120": 3.99178s ----- CODE ----- import Data.Time import qualified Data.ByteString.Lazy as L import qualified Data.ByteString.Lazy.UTF8 as U import qualified Data.Text.Lazy as T import qualified Data.Text.Lazy.IO as TO import System.IO import System.Environment (getArgs) import Control.Monad (sequence_, when) main = do [file] <- getArgs let time (f, desc) = do s <- getCurrentTime r <- f let !r' = r t <- getCurrentTime let d = diffUTCTime t s when (length desc > 0) (putStrLn $ desc ++ " " ++ show r' ++ ": " ++ show d) ops = [ (readFile file >>= return . show . length, "") , (readFile file >>= return . show . length, "Normal String + System.IO") , (L.readFile file >>= return . show . L.length, "") , (L.readFile file >>= return . show . L.length, "Data.ByteString.Lazy") , (L.readFile file >>= return . show . U.length, "") , (L.readFile file >>= return . show . U.length, "Data.ByteString.Lazy.UTF8") , (TO.readFile file >>= return . show . T.length, "") , (TO.readFile file >>= return . show . T.length, "Data.Text.Lazy") ] mapM_ time ops