
Am Dienstag 15 September 2009 22:25:31 schrieb Cristiano Paris:
On Tue, Sep 15, 2009 at 10:11 PM, Cristiano Paris
wrote: On Tue, Sep 15, 2009 at 10:08 PM, Cristiano Paris
wrote: ... So, it seems that "seq b" is completely ineffective and program is not correct.
Correction: removing "seq b" results in nothing being displayed :)
So, it's not "completely" effective. I suspect this is related to the fact that a String in Haskell is just a list of Char so we should use seq on every element of b. Let me try...
Now it works as expected:
--- module Main where
import System.IO import System.IO.Unsafe import Control.Applicative import Data.List import Data.Ord
import Debug.Trace
data Bit = Bit { index :: Integer, body :: String }
readBit fn = withFile fn ReadMode $ \h -> Bit <$> (hGetLine h >>= return . read) <*> readBody where readBody = trace "In readBody" $ withFile fn ReadMode $ \h -> do b <- hGetContents h let b' = foldr (\e a -> seq e (a ++ [e])) [] b
Aaawww. let b' = length b or let b' = foldl' seq () b or let b' = b `using` rnf if you want to force the whole file to be read. But then you should definitely be using ByteStrings.
seq b' $ return $ trace ("Read body from: " ++ fn) b'
main = do bl <- mapM readBit ["file1.txt","file2.txt"] mapM_ (putStrLn . show . index) $ sortBy (comparing index) bl putStrLn $ body $ head bl ----
Two points:
1 - I had to cut off file1.txt to be just above 1024 bytes otherwise the program becomes extremely slow even on a 100KB file with a line being output every 5 seconds and with my CPU being completely busy (I'm using a modern MacBook Pro).
2 - Omitting the last line in my program actually causes the body to be completely read even if it's not used: this is consistent with my hypotesis on seq which now works properly.
:)
Cristiano