
That helps to make things clearer, I think. One issue is the nature of Maps (strict in keys, non-strict in values).
- neither singleton nor unionWith are strict in the Map values, so nothing here forces the evaluation of rate or construction of UArr
But, as I have written, in one of my tests I also tried rnf to force evaluation: rnf v `seq` rnf m `seq` return m Isn't this sufficient?
It will force the Map that results from the repeated unioning, but does not ensure that this is done in an efficient way.
A standard trick to keep Map values evaluated by construction is to make the availability of keys dependent on their values, eg (singleton key) $! value. That won't help with unionWith and the appendUs, but it should allow the source string references to be dropped early, as the singletons are constructed.
Tried; but, even using union instead of unionWith, the memory grows fast as before.
Strange. I built myself a small wrapper to make your code fragment compilable, and just replacing (unionWith appendU) with (union) makes a drastic difference - as it should. It is rather annoying that Data.IntMap doesn't provide a strict form of unionWith or insertWith (Data.Map does at least provide insertWith'). But we can define our own, at the cost of an extra lookup. We can then foldl' that insertWith' directly over the ratings list, bypassing the non-strict parts of the Data.IntMap API (see code below). Claus (who still thinks that all Maps should be parameterized over their key-value pair type constructor, so that the default non-strict Maps would result from using non-strict pairs type IntMap = IntMapP (,) while the often desired element-strict Maps would result from using strict pairs, with no other change in API type IntMapStrict = IntMapP (:*:) ) ------------------------------------------------------- {-# LANGUAGE TypeOperators #-} import qualified Data.ByteString.Lazy as L import Data.Array.Vector import qualified Data.IntMap as IM import Data.List import Data.Word import qualified Data.ByteString.Lazy.Char8 as L8 import Data.Maybe import System.IO -- don't use this for real, test wrapper only ratings :: L.ByteString -> [(Word32,Word8)] ratings = map (\[i,r]->(fromIntegral $ fst $ fromJust $ L8.readInt i ,fromIntegral $ fst $ fromJust $ L8.readInt r)) . map L8.words . L8.lines parse handle = do contents <- L.hGetContents handle let v = map singleton' $ ratings contents let m = foldl' (\m (kw,v)->insertWith' appendU (fromIntegral kw,v) m) IM.empty v -- let m = foldl1' (IM.unionWith appendU) v -- let m = foldl1' (IM.union) v return $! m where -- Build a Map with a single movie rating singleton' :: (Word32, Word8) -> (Int,UArr Rating) singleton' (id, rate) = ((fromIntegral $ id), (singletonU $ pairS (id, rate))) -- (IM.singleton (fromIntegral $ id)) $ (singletonU $ pairS (id, rate)) insertWith' op (k,v) m = maybe (IM.insert k v m) (\old->((IM.insert k) $! (v `op` old)) m) (IM.lookup k m) type Rating = Word32 :*: Word8 type MovieRatings = IM.IntMap (UArr Rating) -- UArr from uvector -- more test wrapper, some trivial input data generate = withFile "in.data" WriteMode $ \h-> mapM_ (\(i,r)->hPutStrLn h $ show i++" "++show r) $ take 1000000 $ cycle [(i,i)|i<-[0..100]] main = withFile "in.data" ReadMode parse >>= print