Haskell-Cafe

Download

haskell-cafe@haskell.org

June 2006

128 participants
112 discussions

Re[2]: [Haskell] My summer of code project: HsJudy
by Bulat Ziganshin 01 Jun '06

01 Jun '06

(moved to haskell-cafe) Hello Jean-Philippe, Tuesday, May 30, 2006, 2:58:01 PM, you wrote: >> you should also see to the Collections package: >> darcs get --partial http://darcs.haskell.org/packages/collections/ >> although it contains only _immutable_ datastructures at this >> moment. may be, you can codevelop with Bernardy interface for mutable >> maps/sets > I'm not sure there is significant overlap between what we're doing. > I'd be delighted to cooperate if you think this is possible though. well, what i think: while all the actual work will be performed by Caio himself, we can help him (and to ourselves as possible future users of this lib) by describing infrastructure in that he need to insert his work and pointing to "adjancent" projects so, Caio, now i write for you, better organizing my previous post: 1. In terms of Haskell, Judy is a library of _mutable_ collections of _unboxed_ elements. i pointed you to the Array wiki page, where differences between boxed and unboxed, mutable and immutable datastructures are described 2. Collections is a library of _immutable_ datastructures, and Jean-Philippe especially omitted any support for mutable datastructures - because it's the whole story of it's own. So, you don't need to write your work as part of the Collections library 3. Nevertheless, Collections has GREAT organization and it will be very helpful to study it. in particular, see at the Data.Collections module, what describes various classes into which all collections are fitted, and hierarchy between them 4. Next, when you will search for the datatypes/classes whose interface you can imitate, you will find only Data.HashTable - other datatypes in standard libraries are either immutable (say, Map) or need contiguous indexes (say, MArray). So you will need to create your own interface and here you can borrow from the Collections lib. Just see at the differences between IArray and MArray classes. They are very close, only MArray class update arrays on-the-place and has additional 'Monad m' on all it's return types. say, class IArray a e where unsafeAt :: (Index i) => a i e -> i -> e becomes class MArray a e m where unsafeRead :: (Index i) => a i e -> i -> m e You can do the same with Collections classes. instead of reinventing the wheel, you can add letter 'M' to it :) say, class Monoid c => Map c k a | c -> k a where insert :: k -> a -> c -> c delete :: k -> c -> c member :: k -> c -> Bool would become class (Monoid c, Monad m) => MapM c k a | c -> k a where insertM :: k -> a -> c -> m () deleteM :: k -> c -> m () memberM :: k -> c -> m Bool 5. as i already said, then you should use StablePtr and any form of hashing/seriazliation in order to map between boxed (Haskell) and unboxed (C) values. and then you can develop series of Diff transformers what emulates immutable collections via mutable ones (say interface of class Map via the interface of class MapM), DiffArray can serve as template here 6. There is no class framework for container types other than this Data.Collections module, so implementing it's close imitation (with monadic operations) will help future users to easily learn and use both interfaces -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

3 4

sum-file shootout test
by Bulat Ziganshin 01 Jun '06

01 Jun '06

Hello Haskell-cafe, i have analyzed performance of various sum-file implementations (see http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=all ) first, one-liner implementation (attached as b.hs) works about 500 seconds on my cpu. it can be made 10x faster just by using it's own 'read' procedure (c.hs). this tells us that current 'read' implementation in GHC is VERY SLOW. next step to make things go faster you can see at http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=ghc… - now it's the fastest GHC entry on this test. speedup is a result of throwing away all the lines/map/read/sum individual procedures and writing entire algorithm over the plain stream of Chars. this allows us to omit all the construction/deconstruction of lazy boxes, so this program works about 10 seconds on my box. now the main problem is getContents itself, which uses about 70% of total time, because it requires to construct/deconstruct lazy box for each Char read. Using Streams library, we can omit this work and use straightforward imperative code (h.hs). this program works 4 times faster (2.8 seconds) than faster Haskell variant, what should be about 1.5 times faster than today's fastest (D Digital Mars) entry in this test i think that close speed can be also obtained by using line-oriented I/O in Streams+ByteString combination and then applying some "readInt :: ByteString -> Int" conversion while compiling under GHC 6.5 -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

1 0