
Folks,

I'm looking to use the following code to process a multi-GB text file.
I am using ByteStrings, but there was a discussion today on IRC about
tail recursion, laziness and accumulators that made me wonder: is
fixLines below lazy enough? Can it be made lazier?

	Thanks, Joel

---
module Main where

import IO
import System
import Numeric
import Data.Char
import Data.Word
import qualified Data.Map as M
import qualified Data.ByteString.Char8 as B
import Prelude hiding (lines)

grabTableInfo x = (tableId', (tableType, tableStakes))
    where (tableId:tableType:_:tableStakes:_) = B.split ',' x
          Just (tableId', _) = B.readInt tableId

lines = B.split '\n'

--- My Oracle ascii dump is 80 characters wide so some lines
--- are split. I need to skip empty lines and join lines
--- containing less than the required number of commas.

fixLines 0 lines = lines
fixLines _ [] = []
fixLines n (line:lines) = fixLines' lines line []
    where fixLines' [] str acc
              | B.count ',' str == n = acc ++ [str]
              | otherwise = acc
          fixLines' (x:xs) str acc
              | B.null str -- skip
                  = fixLines' xs x acc
              | B.count ',' str < n -- join with next line
                  = fixLines' xs (B.append str x) acc
              | otherwise
                  = fixLines' xs x (acc ++ [str])

mkMap = M.fromList . map grabTableInfo . fixLines 20

loadTableInfo = do
    bracket (openFile "game_info_tbl.csv" ReadMode)
            (hClose)
            (\h -> do c <- B.hGetContents h
                      return $ mkMap $ lines c)

-- http://wagerlabs.com/
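Not from the original post, but as a rough sketch of what "lazier" could
look like here: dropping the accumulator and consing each finished line
onto the result makes the head of the list available before the rest of
the input has been walked. The name fixLinesLazy is made up for the
example; it assumes the same qualified B (Data.ByteString.Char8) import
as above and only approximates the original logic.

---
-- Sketch of a lazier variant: emit each completed line with (:)
-- as soon as it is ready instead of appending it to an accumulator.
fixLinesLazy :: Int -> [B.ByteString] -> [B.ByteString]
fixLinesLazy 0 ls = ls
fixLinesLazy _ [] = []
fixLinesLazy n (l:ls) = go l ls
    where go str []
              | B.count ',' str == n = [str]
              | otherwise = []
          go str (x:xs)
              | B.null str -- skip empty line
                  = go x xs
              | B.count ',' str < n -- join with next line
                  = go (B.append str x) xs
              | otherwise -- emit the finished line, then continue
                  = str : go x xs
---

The difference is that acc ++ [str] in the original only surfaces when
the base case is reached, so the entire input list is traversed before a
caller sees a single line, whereas the cons version lets a consumer such
as mkMap start on earlier lines right away.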