
jeff:
I switched to Data.Binary, which dropped me from 2.6GB to 1.5GB, and then this afternoon I switched from lists of floats to unboxed arrays, which dropped me again from 1.5GB to 475MB. All told, I think I'm in an acceptable range now, and thank you for pointing out the library mistake. I'm also down from 1.5 minutes of load time to under 10 seconds, which is very, very nice. Incidentally, the code I'm now using is:
binaryLoadDocumentCoordinates :: String -> IO (Ptr Float, Array.UArray Int Int)
binaryLoadDocumentCoordinates path = do
    putStrLn "File opened"
    coordinates <- decodeFile (path ++ "/Clusters.bin") :: IO (Array.UArray Int Float)
    print . Array.bounds $ coordinates
    putStrLn "Got coordinates"
    galaxies <- decodeFile (path ++ "/Galaxies.bin") :: IO (Array.UArray Int Int)
    putStrLn "Got galaxies"
    -- bounds are inclusive, so an array with bounds (0, n) holds n + 1 elements
    coordinatesArr <- mallocArray . (+ 1) . snd . Array.bounds $ coordinates
    putStrLn "Allocated array"
    pokeArray coordinatesArr . Array.elems $ coordinates
    return (coordinatesArr, galaxies)
binarySaveDocumentCoordinates :: String -> [Point] -> IO ()
binarySaveDocumentCoordinates path points = do
    let len = length points
    -- listArray bounds are inclusive: (0, len * 3 - 1) covers len * 3 coordinates
    encodeFile (path ++ "/Clusters.bin")
        . (Array.listArray (0, len * 3 - 1) :: [Float] -> Array.UArray Int Float)
        . coordinateList . solve $ points
    encodeFile (path ++ "/Galaxies.bin")
        . (Array.listArray (0, len - 1) :: [Int] -> Array.UArray Int Int)
        . galaxyList $ points
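[The serialisation step above reduces to a small self-contained round trip: encode an unboxed array of Floats with Data.Binary's encodeFile, read it back with decodeFile. The file name and sample values here are made up for illustration; only the encodeFile/decodeFile/UArray machinery is from the thread.]

```haskell
import Data.Binary (encodeFile, decodeFile)
import qualified Data.Array.Unboxed as Array

main :: IO ()
main = do
    let xs  = [0.5, 1.5, 2.5] :: [Float]
        -- inclusive bounds: (0, length xs - 1) for length xs elements
        arr = Array.listArray (0, length xs - 1) xs :: Array.UArray Int Float
    encodeFile "Clusters.bin" arr
    arr' <- decodeFile "Clusters.bin" :: IO (Array.UArray Int Float)
    print (Array.elems arr')  -- prints the original list back
```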
I've just pushed a patch to Data.Binary in the darcs version that should help with serialising arrays by avoiding forcing an intermediate list. You can get it here:

    darcs get http://darcs.haskell.org/binary

I'd still avoid that 'listArray' call, though; you may as well just write the list out, rather than packing it into an array and then serialising the array back as a list. -- Don
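[Don's suggestion in code form, as a minimal sketch: Data.Binary already has an instance for [Float], so the coordinate list can be written directly, with no listArray packing and no array-to-list conversion on the way back. File name and sample values are illustrative only.]

```haskell
import Data.Binary (encodeFile, decodeFile)

main :: IO ()
main = do
    let coords = [0.5, 1.5, 2.5] :: [Float]
    encodeFile "Clusters.bin" coords          -- serialise the list directly
    coords' <- decodeFile "Clusters.bin" :: IO [Float]
    print coords'                             -- prints the original list back
```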