Printing the entries of a Data.Map m - Beginners - Haskell.org

newer
Generic exception available?

Printing the entries of a Data.Map m

older
More direct binding of IO result

Manfred Lotz

15 May 2011 15 May '11

3:32 a.m.

Hi all, I have a map with some two thousand entries. Now I want to print that map. I could do: print m but I like to use my own print function: mapM_ myprint (Map.toList m) I may be wrong but having to use Map.toList looks pretty inefficient. Question: I'd like to know if there is a more efficient way to do it? -- Manfred

Reply

Sign in to reply online Use email software

Show replies by date

Matthias Guedemann

15 May 15 May

4:15 a.m.

Hi Manfred,

I may be wrong but having to use Map.toList looks pretty inefficient.

Question: I'd like to know if there is a more efficient way to do it?

I do not know if it is really more efficient (it has to consider each entry, just like converting to list does), but Data.Map is an instance of Data.Traversable, which has the traverse function, see: http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.3.1.0/Data-Trav... e.g. Prelude> :m Data.Map Data.Traversable Prelude Data.Map Data.Traversable> let mymap = insert 1 "bubu" $ insert 2 "baba" $ empty Prelude Data.Map Data.Traversable> traverse print mymap "bubu" "baba" fromList [(1,()),(2,())] Prelude Data.Map Data.Traversable> Is this what you meant? You probably want to ingnore the result value of type Prelude Data.Map Data.Traversable> :t (traverse print mymap) (traverse print mymap) :: (Num t, Ord t) => IO (Map t ()) regards Matthias

Reply

Sign in to reply online Use email software

Manfred Lotz

4:51 a.m.

Hi Matthias, On Sun, 15 May 2011 10:15:42 +0200 Matthias Guedemann wrote:

Hi Manfred,

...
I may be wrong but having to use Map.toList looks pretty inefficient.

Question: I'd like to know if there is a more efficient way to do it?

I do not know if it is really more efficient (it has to consider each entry, just like converting to list does), but Data.Map is an instance of Data.Traversable, which has the traverse function, see:

http://www.haskell.org/ghc/docs/latest/html/libraries/base-4.3.1.0/Data-Trav...

e.g.

Prelude> :m Data.Map Data.Traversable Prelude Data.Map Data.Traversable> let mymap = insert 1 "bubu" $ insert 2 "baba" $ empty Prelude Data.Map Data.Traversable> traverse print mymap "bubu" "baba" fromList [(1,()),(2,())] Prelude Data.Map Data.Traversable>

Thanks for pointing me to Traversable. Now I used Traversable.mapM which worked fine and seems to be the "traversable" counterpart of mapM_ from Control.Monad. It didn't make much of a difference in runtime doing it without toList. I'm not quite sure if it is really more efficient or not. Perhaps I would have to create dictonary with some millions entries in order to see a noticable difference? -- Manfred

Reply

Sign in to reply online Use email software

Matthias Guedemann

5:34 a.m.

Hi Manfred

It didn't make much of a difference in runtime doing it without toList. I'm not quite sure if it is really more efficient or not. Perhaps I would have to create dictonary with some millions entries in order to see a noticable difference?

Probably, Haskell is not very deterministic for memory / runtime forecasts :-) In theory there should be no difference in complexity, both need one sweep over all entries of the Map. There could perhaps be a constant cost for constructing the list. Nevertheless, laziness should make it possible to run in constant space, as neither the list, nor the transformed Map is used. So differences are probably small. But is possible, take some measurements and report, not just runtime, but also heap space usage. regards Matthias

Reply

Sign in to reply online Use email software

Daniel Fischer

8:23 a.m.

On Sunday 15 May 2011 11:34:33, Matthias Guedemann wrote:

Hi Manfred

...
It didn't make much of a difference in runtime doing it without toList. I'm not quite sure if it is really more efficient or not. Perhaps I would have to create dictonary with some millions entries in order to see a noticable difference?

Probably, Haskell is not very deterministic for memory / runtime forecasts :-)

In theory there should be no difference in complexity, both need one sweep over all entries of the Map. There could perhaps be a constant cost for constructing the list.

The call-tree for the list construction is similar to the call-tree that traverse builds. However, traverse assembles a new Map. While it's trivial to discard consumed list-cells (if the list is consumed sequentially), that is not so for Maps. Unless the compiler manages to completely eliminate the new Map, that is going to cost considerably more than the throwaway list-cells.

Nevertheless, laziness should make it possible to run in constant space, as neither the list, nor the transformed Map is used.

Well, you need the space for the initial Map, that is O(size). I assume you mean that the additional memory needed is O(1). That is almost true for going through the list, since that is swiftly collected. The call-tree to get at the elements, however, is O(log size), but that's practically constant size. For traverse, that is only true if the new Map is completely discarded (Maps are spine-strict, so if it's not completely ignored, its full spine is built, requiring O(size) space).

So differences are probably small.

But is possible, take some measurements and report, not just runtime, but also heap space usage.

Note that the "mapM_ print . toAscList" prints the (key, value) pairs while the traverse (or Data.Traversable.mapM) only prints the values. That makes a difference. I put together a small (trivial) test: viaList :: (Ord k, Show k, Show v) => Map k v -> IO () viaList = mapM_ print . toAscList -- and the same using elems instead of toAscList to only print the values perTraverse :: (Ord k, Show v) => Map k v -> IO () perTraverse mp = traverse print mp >> return () main :: IO () main = do args <- getArgs let sz = case args of (a:_) -> read a _ -> 100000 mp :: Map Int Int mp = fromDistinctAscList [(i,2*i+1) | i <- [0 .. sz]] print $ size mp -- force the spine perTraverse mp -- or via list (pairs/values only) Everything compiled with -O2 (output redirected to /dev/null), the results for input 1000000 are: ./useElems 1000000 +RTS -s 1,026,031,460 bytes allocated in the heap 247,759,432 bytes copied during GC 40,026,204 bytes maximum residency (8 sample(s)) 353,148 bytes maximum slop 103 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 1869 collections, 0 parallel, 0.53s, 0.53s elapsed Generation 1: 8 collections, 0 parallel, 0.48s, 0.49s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 1.54s ( 1.55s elapsed) GC time 1.01s ( 1.02s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 2.56s ( 2.56s elapsed) %GC time 39.6% (39.7% elapsed) Alloc rate 664,729,202 bytes per MUT second Productivity 60.3% of total user, 60.1% of total elapsed ---------------------------------------- ./useAscList 1000000 +RTS -s 1,337,743,996 bytes allocated in the heap 247,972,620 bytes copied during GC 40,026,204 bytes maximum residency (8 sample(s)) 413,304 bytes maximum slop 103 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 2467 collections, 0 parallel, 0.53s, 0.53s elapsed Generation 1: 8 collections, 0 parallel, 0.45s, 0.45s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 2.23s ( 2.23s elapsed) GC time 0.98s ( 0.99s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 3.22s ( 3.22s elapsed) %GC time 30.6% (30.7% elapsed) Alloc rate 599,228,735 bytes per MUT second Productivity 69.3% of total user, 69.3% of total elapsed ---------------------------------------- ./useTraverse 1000000 +RTS -s 1,266,460,972 bytes allocated in the heap 568,553,592 bytes copied during GC 61,013,024 bytes maximum residency (10 sample(s)) 614,608 bytes maximum slop 159 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 2328 collections, 0 parallel, 1.32s, 1.32s elapsed Generation 1: 10 collections, 0 parallel, 1.18s, 1.18s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 1.82s ( 1.83s elapsed) GC time 2.50s ( 2.50s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 4.32s ( 4.33s elapsed) %GC time 57.8% (57.8% elapsed) Alloc rate 695,540,479 bytes per MUT second Productivity 42.1% of total user, 42.0% of total elapsed ---------------------------------------- Data.Traversable.traverse uses more memory. Not twice as much, since the original Map is disassembled while traversing it. If I force the original Map to stay intact, it's about twice. I conclude that the new Map assembled by traverse is not discarded. So, overall, going through the list is better. Cheers, Daniel

Reply

Sign in to reply online Use email software

Matthias Guedemann

9:48 a.m.

Hi Daniel,

However, traverse assembles a new Map. While it's trivial to discard consumed list-cells (if the list is consumed sequentially), that is not so for Maps. Unless the compiler manages to completely eliminate the new Map, that is going to cost considerably more than the throwaway list-cells.

good to know.

I conclude that the new Map assembled by traverse is not discarded. So, overall, going through the list is better.

Ok, now if I think about it, this seems obvious :-) Thanks for the clarification and the tests. regards Matthias

Reply

Sign in to reply online Use email software

Manfred Lotz

10:39 a.m.

On Sun, 15 May 2011 14:23:36 +0200 Daniel Fischer wrote:

On Sunday 15 May 2011 11:34:33, Matthias Guedemann wrote:

...
Hi Manfred

...
It didn't make much of a difference in runtime doing it without toList. I'm not quite sure if it is really more efficient or not. Perhaps I would have to create dictonary with some millions entries in order to see a noticable difference?

Probably, Haskell is not very deterministic for memory / runtime forecasts :-)

In theory there should be no difference in complexity, both need one sweep over all entries of the Map. There could perhaps be a constant cost for constructing the list.

The call-tree for the list construction is similar to the call-tree that traverse builds. However, traverse assembles a new Map. While it's trivial to discard consumed list-cells (if the list is consumed sequentially), that is not so for Maps. Unless the compiler manages to completely eliminate the new Map, that is going to cost considerably more than the throwaway list-cells.

...
Nevertheless, laziness should make it possible to run in constant space, as neither the list, nor the transformed Map is used.

Well, you need the space for the initial Map, that is O(size). I assume you mean that the additional memory needed is O(1). That is almost true for going through the list, since that is swiftly collected. The call-tree to get at the elements, however, is O(log size), but that's practically constant size. For traverse, that is only true if the new Map is completely discarded (Maps are spine-strict, so if it's not completely ignored, its full spine is built, requiring O(size) space).

...
So differences are probably small.

But is possible, take some measurements and report, not just runtime, but also heap space usage.

Note that the "mapM_ print . toAscList" prints the (key, value) pairs while the traverse (or Data.Traversable.mapM) only prints the values. That makes a difference.

I put together a small (trivial) test:

viaList :: (Ord k, Show k, Show v) => Map k v -> IO () viaList = mapM_ print . toAscList

I also tried something similar, and indeed you are right. mapM_ in conjunction with toList is better in terms of memory and runtime (mainly because GC is less busy) than using functions from Traversable. Really interesting. -- Manfred

Reply

Sign in to reply online Use email software

Felipe Almeida Lessa

12:49 p.m.

On Sun, May 15, 2011 at 11:39 AM, Manfred Lotz wrote:

I also tried something similar, and indeed you are right. mapM_ in conjunction with toList is better in terms of memory and runtime (mainly because GC is less busy) than using functions from Traversable.

If you want just the side effects you shouldn't be using Traversable, but Foldable. In particular [1], mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m () Doing the same test as Daniel Fischer's, but with an additional definition import qualified Data.Foldable as F perFoldable :: (Ord k, Show v) => Map k v -> IO () perFoldable = F.mapM_ print which is also the shortest definition, I get the following results: viaList: 208 MiB total memory, 2.24s MUT time, 1.11s GC time, 3.35s total time viaElems: 208 MiB total memory, 1.40s MUT time, 1.13s GC time, 2.53s total time perTraverse: 322 MiB total memory, 1.77s MUT time, 2.84s GC time, 4.61s total time perFoldable: 215 MiB total memory, 1.53s MUT time, 1.73s GC time, 3.26s total time Cheers, [1] http://hackage.haskell.org/packages/archive/base/4.3.1.0/doc/html/Data-Folda... -- Felipe.

Reply

Sign in to reply online Use email software

Daniel Fischer

2:10 p.m.

On Sunday 15 May 2011 18:49:26, Felipe Almeida Lessa wrote:

On Sun, May 15, 2011 at 11:39 AM, Manfred Lotz wrote:

...
I also tried something similar, and indeed you are right. mapM_ in conjunction with toList is better in terms of memory and runtime (mainly because GC is less busy) than using functions from Traversable.

If you want just the side effects you shouldn't be using Traversable, but Foldable. In particular [1],

mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()

Doing the same test as Daniel Fischer's, but with an additional definition

import qualified Data.Foldable as F

perFoldable :: (Ord k, Show v) => Map k v -> IO () perFoldable = F.mapM_ print

which is also the shortest definition, I get the following results:

viaList: 208 MiB total memory, 2.24s MUT time, 1.11s GC time, 3.35s total time viaElems: 208 MiB total memory, 1.40s MUT time, 1.13s GC time, 2.53s total time perTraverse: 322 MiB total memory, 1.77s MUT time, 2.84s GC time, 4.61s total time perFoldable: 215 MiB total memory, 1.53s MUT time, 1.73s GC time, 3.26s total time

I get MUT time 1.63s ( 1.63s elapsed) GC time 1.49s ( 1.49s elapsed) for that, presumably because a) your computer is faster than mine and b) my pointers are half as large as yours, so I need fewer GCs. Probably all GC times could be reduced significantly by supplying a suitable -A (with -A64M, I get 1.76 MUT, 1.03 GC). But, Data.Foldable.mapM_ is defined in terms of Data.Foldable.foldr, while Data.Map only directly defines foldMap. The default definition of Data.Foldable.foldr is defined in terms of foldMap, but of course needs some plumbing. We can eliminate that plumbing to make it a bit more memory- friendly and faster (warning: evil special-purpose code ahead): {-# LANGUAGE FlexibleInstances #-} import Data.Monoid import qualified Data.Foldable as F -- This is the evil part: instance Monoid (IO ()) where mempty = return () mappend = (>>) mconcat = sequence_ perFoldMap :: (Ord k, Show v) => Map k v -> IO () perFoldMap = F.foldMap print That clocks in at MUT time 1.57s ( 1.57s elapsed) GC time 1.24s ( 1.24s elapsed) and is clear second (1.68 + 0.80 with -A64M). So, simply getting the list of elems an printing those out is still the fastest (and note that it's trivial to print keys and values by converting the Map to a list, while doing that with Data.Traversable and Data.Foldable requires some contortions). Moral: simple is good.

Cheers,

[1] http://hackage.haskell.org/packages/archive/base/4.3.1.0/doc/html/Data- Foldable.html#v:mapM_

Reply

Sign in to reply online Use email software

Daniel Fischer

6:52 a.m.

On Sunday 15 May 2011 09:32:32, Manfred Lotz wrote:

Hi all, I have a map with some two thousand entries. Now I want to print that map.

I could do: print m

but I like to use my own print function: mapM_ myprint (Map.toList m)

I may be wrong but having to use Map.toList looks pretty inefficient.

It's pretty efficient. I'm not sure that the compiler can completely fuse away the intermediate list, but such allocations are cheap. The list is consumed as it's generated, the consumed cells are collected in the next gc - the space and time overhead should be negligible.

Question: I'd like to know if there is a more efficient way to do it?

Reply

Sign in to reply online Use email software

5119

Age (days ago)

5119

Last active (days ago)

Download

9 comments

5 participants

tags

participants (5)

Daniel Fischer
Felipe Almeida Lessa
Manfred Lotz
Matthias Guedemann
Matthias Guedemann