Linear time IntSet / IntMap construction from sorted input

Hi, It's always bugged me that IntSet.fromAscList is an alias for fromList, which is basically (foldr insert), even though a sorted list is (almost) already in tree order. Is there a reason for this? Or was it just that fromList is good enough? (which it is.) Assuming the latter, and somewhat bored, I wrote a function to build an IntSet from sorted input in linear time. This goes in Data.IntSet.hs : data Stack = Push {-# UNPACK #-} !Prefix !IntSet !Stack | Nada fromAsc xs = make Nada xs where make stk [] = finish stk make Nada (z:zs) = make (Push z (Tip z) Nada) zs make stk@(Push _ y Nada) (z:zs) = make (Push z (Tip z) stk) zs make stk@(Push py y stk') zs@(z:zs') = make (Push z (Tip z) (reduce (branchMask py z) stk)) zs' reduce _ stk@(Push _ x Nada) = stk reduce m stk@((Push px x (Push py y stk'))) = let !mxy = branchMask px py !pxy = mask px mxy in if shorter m mxy then reduce m (Push pxy (Bin pxy mxy y x) stk') else stk finish Nada = Nil finish (Push _ x Nada) = x finish (Push px x (Push py y stk)) = finish (Push p (join px x py y) stk) where m = branchMask px py p = mask px m Every trie node is allocated exactly once, and so the burden on the heap is significantly less, even if its only a little bit faster. I tested it by building a list from -n to n for big n, and computing the sum. It alloc's 50% less and the MUT time is like 40% less, but the GC time is more (why would that be?) so it runs in maybe 7/8ths the time, versus fromList. I have another, prettier version that uses [] instead of this hacky Stack type, but its slower, breaking even with fromList on speed, still allocating much less. Is this something you guys feel would be worth including? Should I make a proper patch? One could even use this to write an O(n) mapKeysMonotonic, bringing the IntMap and Map APIs further into alignment. Or is 8/7x speedup not worth the extra code weight? Scott

I've gone ahead and made a patch, which is attached. I don't have
access to the full test suite, but it passes the QuickCheck test for
just the prop_Ordered property, which is no longer a tautology. Who
wrote that test anyway? :)
Scott
On Mon, May 19, 2008 at 5:43 PM, Scott Dillard
Hi,
It's always bugged me that IntSet.fromAscList is an alias for fromList, which is basically (foldr insert), even though a sorted list is (almost) already in tree order. Is there a reason for this? Or was it just that fromList is good enough? (which it is.)
Assuming the latter, and somewhat bored, I wrote a function to build an IntSet from sorted input in linear time. This goes in Data.IntSet.hs :
data Stack = Push {-# UNPACK #-} !Prefix !IntSet !Stack | Nada
fromAsc xs = make Nada xs where make stk [] = finish stk make Nada (z:zs) = make (Push z (Tip z) Nada) zs make stk@(Push _ y Nada) (z:zs) = make (Push z (Tip z) stk) zs make stk@(Push py y stk') zs@(z:zs') = make (Push z (Tip z) (reduce (branchMask py z) stk)) zs'
reduce _ stk@(Push _ x Nada) = stk reduce m stk@((Push px x (Push py y stk'))) = let !mxy = branchMask px py !pxy = mask px mxy in if shorter m mxy then reduce m (Push pxy (Bin pxy mxy y x) stk') else stk
finish Nada = Nil finish (Push _ x Nada) = x finish (Push px x (Push py y stk)) = finish (Push p (join px x py y) stk) where m = branchMask px py p = mask px m
Every trie node is allocated exactly once, and so the burden on the heap is significantly less, even if its only a little bit faster. I tested it by building a list from -n to n for big n, and computing the sum. It alloc's 50% less and the MUT time is like 40% less, but the GC time is more (why would that be?) so it runs in maybe 7/8ths the time, versus fromList.
I have another, prettier version that uses [] instead of this hacky Stack type, but its slower, breaking even with fromList on speed, still allocating much less.
Is this something you guys feel would be worth including? Should I make a proper patch? One could even use this to write an O(n) mapKeysMonotonic, bringing the IntMap and Map APIs further into alignment. Or is 8/7x speedup not worth the extra code weight?
Scott

I got an off-list response from Bertram Felgenhauer, and he made some
improvements to my code. I've attached a new patch -- disregard the
old one.
Here are the timings on my 2Ghz Core Duo, for the program
main = print . IS.fold (+) 0 . IS.fromList $ [i-n|i<-[1..2*n]] where n = 5000000
original:
2,820,354,288 bytes allocated in the heap
647,827,824 bytes copied during GC (scavenged)
250,183,416 bytes copied during GC (not scavenged)
158,851,072 bytes maximum residency (9 sample(s))
5325 collections in generation 0 ( 1.53s)
9 collections in generation 1 ( 1.14s)
315 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.52s ( 2.50s elapsed)
GC time 2.67s ( 2.97s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 5.19s ( 5.47s elapsed)
%GC time 51.4% (54.3% elapsed)
Alloc rate 1,117,350,788 bytes per MUT second
Productivity 48.6% of total user, 46.1% of total elapsed
new function:
1,168,348,000 bytes allocated in the heap
682,569,432 bytes copied during GC (scavenged)
264,208,568 bytes copied during GC (not scavenged)
184,315,904 bytes maximum residency (9 sample(s))
2174 collections in generation 0 ( 1.57s)
9 collections in generation 1 ( 1.05s)
355 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.01s ( 1.02s elapsed)
GC time 2.62s ( 2.91s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 3.63s ( 3.93s elapsed)
%GC time 72.1% (73.9% elapsed)
Alloc rate 1,154,423,345 bytes per MUT second
Productivity 27.9% of total user, 25.7% of total elapsed
How many times can I reply to myself before I'm banned from the list? :)
On Tue, May 20, 2008 at 5:24 PM, Scott Dillard
I've gone ahead and made a patch, which is attached. I don't have access to the full test suite, but it passes the QuickCheck test for just the prop_Ordered property, which is no longer a tautology. Who wrote that test anyway? :)
Scott
On Mon, May 19, 2008 at 5:43 PM, Scott Dillard
wrote: Hi,
It's always bugged me that IntSet.fromAscList is an alias for fromList, which is basically (foldr insert), even though a sorted list is (almost) already in tree order. Is there a reason for this? Or was it just that fromList is good enough? (which it is.)
Assuming the latter, and somewhat bored, I wrote a function to build an IntSet from sorted input in linear time. This goes in Data.IntSet.hs :

On Wed, 2008-05-21 at 14:20 -0600, Scott Dillard wrote:
I got an off-list response from Bertram Felgenhauer, and he made some improvements to my code. I've attached a new patch -- disregard the old one.
Here are the timings on my 2Ghz Core Duo, for the program
main = print . IS.fold (+) 0 . IS.fromList $ [i-n|i<-[1..2*n]] where n = 5000000
original: 2,820,354,288 bytes allocated in the heap Total time 5.19s ( 5.47s elapsed)
new function: 1,168,348,000 bytes allocated in the heap Total time 3.63s ( 3.93s elapsed)
How many times can I reply to myself before I'm banned from the list? :)
As many as you like if you keep posting improvements like that! :-) Duncan
participants (2)
-
Duncan Coutts
-
Scott Dillard