
On Mon, Jan 30, 2012 at 6:21 AM, Herbert Valerio Riedel
On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote:
So maybe also the JSON parsing library kept too many unevaluated things in memory. So I could start either writing my own JSON parsing library (being more strict)
Jfyi, aeson has been added strict parser variants json' and value' some time ago, so you shouldn't have to write your own stricter JSON parsing library...
Also, besides using those variants, you may also use the attoparsec-conduit library [1]. If you have processJson :: Value -> IO X then you'd need just something like import Data.Aeson (Value, json') import Data.Attoparsec.Char8 (isSpace_w8) import qualified Data.ByteString as B import qualified Data.Conduit as C import qualified Data.Conduit.Attoparsec as CA import qualified Data.Conduit.Binary as CB import qualified Data.Conduit.List as CL main = do ... ret <- forM_ fileList $ \fp -> do C.runResourceT $ CB.sourceFile fp C.$= jsonLines C.$= CL.mapM processJson C.$$ CL.consume print ret jsonLines :: C.Resource m => C.Conduit B.ByteString m Value jsonLines = C.sequenceSink () $ do val <- CA.sinkParser json' CB.dropWhile isSpace_w8 return $ C.Emit () [val] This code is extremely resource-friendly, since (a) you can't leak file descriptors and (b) you just have to make sure that processJson function isn't too lazy. It should be quite fast as well. Cheers! =) [1] http://hackage.haskell.org/package/attoparsec-conduit -- Felipe.