(moved to haskell-cafe)
Hello Jean-Philippe,
Tuesday, May 30, 2006, 2:58:01 PM, you wrote:
>> you should also see to the Collections package:
>> darcs get --partial http://darcs.haskell.org/packages/collections/
>> although it contains only _immutable_ datastructures at this
>> moment. may be, you can codevelop with Bernardy interface for mutable
>> maps/sets
> I'm not sure there is significant overlap between what we're doing.
> I'd be delighted to cooperate if you think this is possible though.
well, what i think: while all the actual work will be performed by
Caio himself, we can help him (and to ourselves as possible future
users of this lib) by describing infrastructure in that he need to
insert his work and pointing to "adjancent" projects
so, Caio, now i write for you, better organizing my previous post:
1. In terms of Haskell, Judy is a library of _mutable_ collections of
_unboxed_ elements. i pointed you to the Array wiki page, where
differences between boxed and unboxed, mutable and immutable
datastructures are described
2. Collections is a library of _immutable_ datastructures, and
Jean-Philippe especially omitted any support for mutable
datastructures - because it's the whole story of it's own. So, you
don't need to write your work as part of the Collections library
3. Nevertheless, Collections has GREAT organization and it will be
very helpful to study it. in particular, see at the Data.Collections
module, what describes various classes into which all collections are
fitted, and hierarchy between them
4. Next, when you will search for the datatypes/classes whose
interface you can imitate, you will find only Data.HashTable - other
datatypes in standard libraries are either immutable (say, Map) or
need contiguous indexes (say, MArray). So you will need to create your
own interface and here you can borrow from the Collections lib. Just
see at the differences between IArray and MArray classes. They are
very close, only MArray class update arrays on-the-place and has
additional 'Monad m' on all it's return types. say,
class IArray a e where
unsafeAt :: (Index i) => a i e -> i -> e
becomes
class MArray a e m where
unsafeRead :: (Index i) => a i e -> i -> m e
You can do the same with Collections classes. instead of reinventing
the wheel, you can add letter 'M' to it :)
say,
class Monoid c => Map c k a | c -> k a where
insert :: k -> a -> c -> c
delete :: k -> c -> c
member :: k -> c -> Bool
would become
class (Monoid c, Monad m) => MapM c k a | c -> k a where
insertM :: k -> a -> c -> m ()
deleteM :: k -> c -> m ()
memberM :: k -> c -> m Bool
5. as i already said, then you should use StablePtr and any form of
hashing/seriazliation in order to map between boxed (Haskell) and
unboxed (C) values. and then you can develop series of Diff
transformers what emulates immutable collections via mutable ones (say
interface of class Map via the interface of class MapM), DiffArray can
serve as template here
6. There is no class framework for container types other than this
Data.Collections module, so implementing it's close imitation (with
monadic operations) will help future users to easily learn and use
both interfaces
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
Hello Haskell-cafe,
i have analyzed performance of various sum-file implementations (see
http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=all )
first, one-liner implementation (attached as b.hs) works about 500
seconds on my cpu. it can be made 10x faster just by using it's own
'read' procedure (c.hs). this tells us that current 'read' implementation
in GHC is VERY SLOW.
next step to make things go faster you can see at
http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=ghc…
- now it's the fastest GHC entry on this test. speedup is a result of
throwing away all the lines/map/read/sum individual procedures and
writing entire algorithm over the plain stream of Chars. this allows
us to omit all the construction/deconstruction of lazy boxes, so this
program works about 10 seconds on my box.
now the main problem is getContents itself, which uses about 70% of
total time, because it requires to construct/deconstruct lazy box for
each Char read. Using Streams library, we can omit this work and use
straightforward imperative code (h.hs). this program works 4 times
faster (2.8 seconds) than faster Haskell variant, what should be about
1.5 times faster than today's fastest (D Digital Mars) entry in this
test
i think that close speed can be also obtained by using line-oriented
I/O in Streams+ByteString combination and then applying some
"readInt :: ByteString -> Int" conversion while compiling under GHC 6.5
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com