
Hi all,

I have a project which currently uses Cabal, and I would like to switch to using a plain Makefile. I have two examples of projects that use Makefiles, darcs and jhc, but they both appear to hand-code the list of dependencies for executables. The "-M" option to ghc should let us do this automatically, but either because of a deficiency in GHC or in GNU Make, that looks to be impossible or difficult. Does anyone have experience with this?

Since I'm not using the multi-compiler features of Cabal, I think it shouldn't be too hard to get the project-specific part of my Makefile down to the size of the .cabal file.

Many thanks,

Frederik

P.S. I want to use 'make' rather than Cabal because:

- I want to be able to rebuild specific targets, rather than building everything every time.

- I want better dependency inference; for instance, I don't want to relink every executable every time I do a build (I think this is fixed in newer versions of ghc, but not the one I use).

- I ultimately want to be able to use a more sophisticated build system than Cabal's. GNU Make is a good first start, because it allows me to run several jobs simultaneously, for instance potentially utilizing multiple processors. In the future, I want to write a build system that does result caching as well. There are a lot of tricks that can be done in build systems, so in general I think it would be good to have a build system which is separable from the domain-specific configuration and compiler logic, rather than coupling them as Cabal and "ghc --make" do.

The compile / fix compiler errors cycle is an important part of the development process for me, and so I want recompilation to go as quickly as possible. By the way, the GHC user manual claims that using --make is much faster than using a Makefile, but in a test on a small program, the difference was not significant (16 seconds vs. 18 seconds). Furthermore, some build systems are able to combine targets, e.g. running "ghc -c Bar.hs -c Foo.hs" instead of "ghc -c Bar.hs; ghc -c Foo.hs", which should eliminate the already small difference.

--
http://ofb.net/~frederik/

On Mon, Apr 17, 2006 at 10:57:32PM +0100, Frederik Eaton wrote:
I have two examples of projects that use Makefiles, darcs and jhc, but they both appear to hand-code the list of dependencies for executables. The "-M" option to ghc should let us do this automatically, but either because of a deficiency in GHC or in GNU Make, that looks to be impossible or difficult. Does anyone have experience with this?
FWIW ginsu and DrIFT both use make and don't use ghc's --make feature. I find this can be faster once your projects grow beyond a certain size, as it takes ghc a while to figure out which files need to be rebuilt with --make, especially when a preprocessor is used. The main reason I have found to use --make is that it makes concurrent profiling and non-profiling builds a lot easier.

http://repetae.net/john/computer/ginsu/
http://repetae.net/john/computer/haskell/DrIFT/

Both require an explicit 'make depend' whenever you change the import list. It would be nice if ghc could spit out the dependencies to a file as a side effect whenever using '--make', since it collects that info anyway.

ginsu is particularly tricky: in order to get hierarchical modules to play nice with 'automake' and 'autoconf', it has to create symbolic links from internal files to top level ones. If better solutions exist, I'd love to hear of them.
The compile / fix compiler errors cycle is an important part of the development process for me, and so I want recompilation to go as quickly as possible. By the way, the GHC user manual claims that using --make is much faster than using a Makefile, but in a test on a small program, the difference was not significant (16 seconds vs. 18 seconds). Furthermore, some build systems are able to combine targets, e.g. running "ghc -c Bar.hs -c Foo.hs" instead of "ghc -c Bar.hs; ghc -c Foo.hs", which should eliminate the already small difference.
Indeed, for a project the size of jhc (~150 modules), 'make' is quite significantly faster than '--make'.

John

--
John Meacham - ⑆repetae.net⑆john⑈

Hello John, Tuesday, April 18, 2006, 3:25:35 AM, you wrote:
FWIW ginsu and DrIFT both use make and don't use ghcs --make feature. I find this can be faster once your projects grow beyond a certain size as it takes ghc a while to figure out which files need to be rebuilt with --make
If that is due to the time of reading .hi files, my alternative Binary library should help in some future.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello John,
Tuesday, April 18, 2006, 3:25:35 AM, you wrote:
FWIW ginsu and DrIFT both use make and don't use ghcs --make feature. I find this can be faster once your projects grow beyond a certain size as it takes ghc a while to figure out which files need to be rebuilt with --make
if that is due to the time of reading .hi files, my alternative Binary library should help in some future
I'd be surprised if you could improve on GHC's binary library. Using BinMem (reading/writing directly to memory), GHC's binary library is about as fast as it gets. I'm sure yours wins when dealing with files, though.

Cheers,
Simon

Hello Simon, Tuesday, April 18, 2006, 3:02:20 PM, you wrote:
if that is due to the time of reading .hi files, my alternative Binary library should help in some future
I'd be surprised if you could improve on GHC's binary library. Using BinMem (reading/writing directly to memory), GHC's binary library is about as fast as it gets. I'm sure yours wins when dealing with files, though.
Sorry, Simon, but when I found a 10x difference in speed (6 MB/s vs 60 MB/s) it was on membufs :) although I can't say that the difference will always be 10 times; for such a fast lib the time required to traverse lists is significant, and there are a lot of other possible problems. I was just asking whether the time required for reading these (or any other binary) files is significant for compilation speed. (Although I haven't tested the code in ghc 6.5 - maybe it is faster.)

If you are interested, I can comment on some bits of the Binary module. I optimized essentially the same code, so I know its potential bottlenecks :)

    getWord8 (BinMem _ ix_r sz_r arr_r) = do
        ix <- readFastMutInt ix_r
        sz <- readFastMutInt sz_r
        when (ix >= sz) $
            ioError (mkIOError eofErrorType "Data.Binary.getWord8" Nothing Nothing)
        arr <- readIORef arr_r
        w <- unsafeRead arr ix
        writeFastMutInt ix_r (ix+1)
        return w

Here you use a boxed reference for storing the array; I use an unboxed Ptr reference, which is of course faster. Also, using ioError here means that the whole function result can't be "unboxed" - at least I saw a substantial slowdown when I added error processing here. Of course, it's hard to omit. Also, the following may be faster:

    writeFastMutInt ix_r (ix+1)
    unsafeRead arr ix

I also use one less variable here (only curptr and bufend).

    instance Binary Word16 where
      get h = do
        w1 <- getWord8 h
        w2 <- getWord8 h
        return $! ((fromIntegral w1 `shiftL` 8) .|. fromIntegral w2)

The first possible problem here is that getWord8 is not inlined. Second - the use of checked arithmetic (operations on Int may have additional checks, unlike operations on Int#). Third - it's better to do all operations on Int and only then pack the data into the Int16 constructor.

Btw, my lib uses 32/64-bit independent binary files (i.e. you can write the file with a 64-bit program and then read it with a 32-bit one). To achieve this, I use a variable-size representation for Int and Word. I also use a compiler-independent representation for Integer (the same as for Int). Maybe it will be interesting for ghc, and will allow using both 32-bit and 64-bit compilers on the same project?

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
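The variable-size Int/Word representation mentioned above is not spelled out in the thread; a minimal sketch of the usual approach (7 bits per byte, with the high bit as a continuation flag - illustrative only, not the actual AltBinary code) might look like this:

    import Data.Bits ((.&.), (.|.), shiftL, shiftR, testBit)
    import Data.Word (Word8)

    -- Encode a non-negative integer 7 bits at a time, least significant
    -- group first; the high bit of each byte says "more bytes follow".
    encodeVarWord :: Integer -> [Word8]
    encodeVarWord n
      | n < 0x80  = [fromIntegral n]
      | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80)
                  : encodeVarWord (n `shiftR` 7)

    -- Decode one such number, returning it together with the remaining bytes.
    decodeVarWord :: [Word8] -> (Integer, [Word8])
    decodeVarWord = go 0 0
      where
        go sh acc (b:bs)
          | testBit b 7 = go (sh + 7) (acc .|. (fromIntegral (b .&. 0x7f) `shiftL` sh)) bs
          | otherwise   = (acc .|. (fromIntegral b `shiftL` sh), bs)
        go _  acc []    = (acc, [])

With a scheme like this, the on-disk width of an Int does not depend on whether the writer was a 32-bit or a 64-bit program, at the cost of a byte-at-a-time loop.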

Bulat Ziganshin wrote:
Hello Simon,
Tuesday, April 18, 2006, 3:02:20 PM, you wrote:
if that is due to the time of reading .hi files, my alternative Binary library should help in some future
I'd be surprised if you could improve on GHC's binary library. Using BinMem (reading/writing directly to memory), GHC's binary library is about as fast as it gets. I'm sure yours wins when dealing with files, though.
Sorry, Simon, but when I found a 10x difference in speed (6 MB/s vs 60 MB/s) it was on membufs :) although I can't say that the difference will always be 10 times; for such a fast lib the time required to traverse lists is significant, and there are a lot of other possible problems. I was just asking whether the time required for reading these (or any other binary) files is significant for compilation speed?
It's been on my todo list for a while to benchmark the various Binary libraries, since there's a consensus that we need some kind of Binary functionality in Haskell'. I said I'd be surprised if GHC's could be improved on. And indeed, I am now surprised :-) You do point out some places where it could be improved.

The first thing I would do is replace the IOUArray with a ForeignPtr now, since that lets you unbox the Ptr without losing garbage collection of the memory, and retains the ability to re-allocate the storage. How does your library handle memory allocation, do you have to explicitly free the memory used for the buffer?

To answer your question, reading interface files isn't a bottleneck in GHC (although improving its speed is definitely worthwhile).
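For concreteness, the ForeignPtr-based buffer being suggested might look roughly like the simplified sketch below; a plain IORef stands in for GHC's FastMutInt, and the names are hypothetical rather than taken from the real Binary module.

    import Data.IORef
    import Data.Word (Word8)
    import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
    import Foreign.Storable (peekByteOff)

    data BinMem = BinMem
      { bmBuf :: ForeignPtr Word8   -- owns the storage; freed by the GC
      , bmIx  :: IORef Int          -- current read position
      , bmSz  :: Int                -- number of valid bytes
      }

    getWord8 :: BinMem -> IO Word8
    getWord8 (BinMem fp ixRef sz) = do
      ix <- readIORef ixRef
      if ix >= sz
        then ioError (userError "getWord8: end of buffer")
        else withForeignPtr fp $ \p -> do
               w <- peekByteOff p ix        -- read through the raw Ptr
               writeIORef ixRef (ix + 1)
               return w

The ForeignPtr keeps the block alive for the garbage collector, while the hot path reads through the raw Ptr obtained from withForeignPtr.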
    instance Binary Word16 where
      get h = do
        w1 <- getWord8 h
        w2 <- getWord8 h
        return $! ((fromIntegral w1 `shiftL` 8) .|. fromIntegral w2)
The first possible problem here is that getWord8 is not inlined. Second - the use of checked arithmetic (operations on Int may have additional checks, unlike operations on Int#).
yes, using uncheckedShiftL would be better.
Third - it's better to do all operations on Int and only then pack the data into the Int16 constructor.
An Int16 is represented using an Int with sign-extension, so it should be the same. I agree that making the format independent of word size would also be good for a general purpose Binary library, although it wouldn't be helpful for GHC. Cheers, Simon

Hello Simon, Wednesday, April 19, 2006, 12:15:55 PM, you wrote:
It's been on my todo list for a while to benchmark the various Binary
my lib will be the fastest anyway :)))
libraries, since there's a consensus that we need some kind of Binary functionality in Haskell'.
The Streams library includes the AltBinary functionality and also emulation of two versions of the NewBinary library - the byte-aligned and the bit-aligned one. As long as you are looking for an nhc-style serialization library (i.e., with `get` and `put_` functions), my lib should be the best beast around. To be exact, it was started as a fast serialization library when it became obvious that Joel Reymont's program was bound by serialization speed, and once the fast serialization routines were written, I added the Streams functionality to make this lib more useful :)

It would be great to add all these libs I've written over the last months to GHC and then to Haskell' - AltBinary, Streams, unboxed references and a refreshed implementation of the Arrays library (in particular, it supports resizable arrays, i.e. implements one of GHC's tickets). My problem is that I still haven't documented most of these libs, and I'm not too enthusiastic about cabalizing them and all other forms of maintenance. Also, for the Streams lib I implemented most of the things that I know how to implement, but such things as I/O multiplexing and network support are beyond my knowledge.

Btw, what do you think about the network-alt lib by Einar? Is it better than ghc's bundled libs for networking? If so, I will work with him to add his lib to the Streams framework.
I said I'd be surprised if GHC's could be improved on. And indeed, I am now surprised :-) You do point out some places where it could be improved. The first thing I would do is replace the IOUArray with a ForeignPtr now,
this will be faster only for ghc 6.6
since that lets you unbox the Ptr without losing garbage collection of the memory, and retains the ability to re-allocate the storage. How does your library handle memory allocation, do you have to explicitly free the memory used for the buffer?
Buffer handling is part of the buffering stream transformer. Yes, memory is freed explicitly on vClose. The AltBinary lib (and NewBinary, which is emulated via AltBinary) just implements get/put_ for many types via vGetByte/vPutByte, so the serialization part of the library doesn't know anything about memory handling. Btw, my lib even implements pure (de)serialization functions - encode :: a -> String and decode. They just use Streams working in the ST monad - StringBuffer/StringReader.
    instance Binary Word16 where
      get h = do
        w1 <- getWord8 h
        w2 <- getWord8 h
        return $! ((fromIntegral w1 `shiftL` 8) .|. fromIntegral w2)
third - it's better to make all operations on Int and only then pack all data to Int16 constructor
An Int16 is represented using an Int with sign-extension, so it should be the same.
Yes, Word8->Word16 and other conversions are handled by special rules for `fromIntegral`; I hadn't looked in GHC/Word.hs when I wrote the previous letter.

Btw, one more drawback of ghc's Binary is that on each getWord8 operation the type of the handle has to be tested (whether it is a file or a memory buffer). In my lib, when all operations are inlined, such a test is not required.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Streams library includes AltBinary functionality and also emulation of two versions of NewBinary library - byte-aligned and bit-aligned one. as long as you search for nhc-style serialization library (i.e., with `get` and `put_` functions) my lib should be the best beast around. to be exact, it was started as fast serialization library when it become obvious that Joel Reymont's program is bound by serialization speed. and when fast serialization routines was written, i added Streams functionality to make this lib more useful :)
it will be great to add all these libs i written last months to GHC and then Haskell' - AltBinary, Streams, unboxed references and refreshed implementation of Arrays library (in particular, it supports resizable arrays, i.e. implements one of GHC's tickets)
Believe me, I've looked in detail at your streams library. Performance-wise it is great, but the design needs to be reworked IMO.

The main problem is that it doesn't have enough type structure. There are many combinations of stream transformers that don't make sense, and should therefore be ruled out by the type system. There are operations that don't work on some streams. There should at the least be a type distinction between directly accessible memory streams, byte streams, and text streams. Additionally I would add separate classes for seekable and buffered streams. I believe these changes would improve performance by reducing the size of dictionaries.

There are problems with memory management: as far as I can tell, the buffers are never freed if you just release a stream. You should be using ForeignPtrs instead of explicitly malloc'd buffers.

Text encoding/decoding is inefficient. Not a design problem, of course, but having good text coding support is one of the main reasons for replacing the IO library.

I have a sketched design if you'd like to see it sometime, but I have to extract it from the partially-written code.

Cheers,
Simon
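For illustration, the kind of split being asked for might be sketched as below. The method names follow the vGet*/vPut* convention used elsewhere in the thread, but the particular class structure is only a guess at the intent, not Simon's actual design.

    {-# LANGUAGE MultiParamTypeClasses #-}
    import Data.Word (Word8)
    import Foreign.Ptr (Ptr)

    -- one capability per class, so a transformer can demand exactly what it needs
    class Monad m => ByteStream m h where
      vGetByte :: h -> m Word8
      vPutByte :: h -> Word8 -> m ()

    class Monad m => TextStream m h where
      vGetChar :: h -> m Char
      vPutChar :: h -> Char -> m ()

    class Monad m => BlockStream m h where
      vGetBuf :: h -> Ptr Word8 -> Int -> m Int
      vPutBuf :: h -> Ptr Word8 -> Int -> m ()

    class Monad m => Seekable m h where
      vSeek :: h -> Integer -> m ()
      vTell :: h -> m Integer

    -- only streams that really are in-memory buffers expose direct access
    class Monad m => MemoryStream m h where
      withBuffer :: h -> (Ptr Word8 -> Int -> m a) -> m a

A buffering or seeking transformer could then require exactly the capabilities it needs via these constraints, instead of one monolithic Stream class.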

Hello Simon, Wednesday, April 19, 2006, 4:45:19 PM, you wrote:
Believe me I've looked in detail at your streams library. Performance-wise it is great but the design needs to be reworked IMO.
The main problem is that it doesn't have enough type structure. There are many combinations of stream transformers that don't make sense, and should therefore be ruled out by the type system. There are operations that don't work on some streams. There should at the least be a type distinction between directly accessible memory streams, byte streams, and text streams. Additionally I would add separate classes for seekable and buffered streams. I believe these changes would improve performance by reducing the size of dictionaries.
You wrote this in February, but the discussion was not finished due to my laziness. Now I have tried to split the Stream interface into several parts. So:

1) What do you think - should Stream be the base for all other stream classes, or should each stream class be independent? I.e.

    class (Stream m h) => InByteStream m h where
        vGetByte :: h -> m Word8

or

    class InByteStream m h where
        vGetByte :: h -> m Word8

?

2) Separating the Stream classes makes some automatic definitions impossible. For example, the released version contains a vGetBuf implementation that is defined via vGetChar and works OK for streams that provide only vGetChar as their base function. I tried to implement this via instances (TextStream => BlockStream in this case), but this immediately leads to the "incoherent instances" problem and so can't really be used. Now that I have implemented (in my internal version) your suggestion about splitting the TextStream, BlockStream and ByteStream classes, I just repeat such definitions across each stream that needs them. Not very good, but it seems to be the required sacrifice.

Well, I can explain it better. In the released version there are definitions:

    class Stream ... where
        ...
        vGetBuf h buf n = {- repeat vGetChar operation -}

    instance Stream StringBuffer where
        vGetChar = ....
        -- vGetBuf defined automatically

    instance Stream StringReader where
        vGetChar = ....
        -- vGetBuf defined automatically

Now, I have to use the following:

    instance TextStream StringBuffer where
        vGetChar = ....
    instance BlockStream StringBuffer where
        vGetBuf h buf n = {- repeat vGetChar operation -}

    instance TextStream StringReader where
        vGetChar = ....
    instance BlockStream StringReader where
        vGetBuf h buf n = {- repeat vGetChar operation -}

As you see, the same vGetBuf implementation is repeated for each stream that has vGetChar as its base operation. The following makes the compiler not very happy:

    instance TextStream m h => BlockStream m h where
        vGetBuf h buf n = {- repeat vGetChar operation -}

3) The problems grew substantially when I tried to separate input and output streams (the same will apply to detaching seekable streams into a separate class). The problem is that I need either to provide 2 or 3 separate implementations of buffering for read-only, write-only and read-write streams, or to have some universal definition that works even when the base stream doesn't provide part of the operations. The latter seems to be impossible - maybe I don't understand Haskell's class system well enough? Let's see:

    data BufferedStream h = Buf h ....

    vClose (Buf h ...) = vPutBuf ...   -- flush buffer's contents

How can I implement this if `h` may not support the vPutBuf operation, especially in a way that allows read/write streams to work?

    instance InBlockStream m h  => SomeClass m (BufferedStream h) where ...
    instance OutBlockStream m h => SomeClass m (BufferedStream h) where ...

GHC will be unhappy if some `h` supports both the InBlockStream and OutBlockStream interfaces. (If you don't understand the problem, I will write more; I'm not even sure that I can formulate this problem clearly!)

4) What do you mean by "There are many combinations of stream transformers that don't make sense"? Splitting the Stream class into BlockStream/TextStream/ByteStream, or something else?

5) Why do you think we need a separate class for buffered streams? The classes define the INTERFACE of a stream, i.e. the supported operations, not its internal implementation.
There are problems with memory management: as far as I can tell, the buffers are never freed if you just release a stream. You should be using ForeignPtrs instead of explicitly malloc'd buffers.
I can use a ForeignPtr just to "hold" the buffer, and a Ptr inside the speed-critical code (and I propose to do the same in your Binary module). I.e. something like this:

    data Buffer = Buf !ForeignPtr !FastMutPtr !FastMutPtr
                     -- buffer, current ptr, end of buffer
Text encoding/decoding is inefficient. Not a design problem, of course, but having good text coding support is one of the main reasons for replacing the IO library.
I will write a separate letter about this.
I have a sketched design if you'd like to see it sometime, but I have to extract it from the partially-written code.
I can just read this code, to save your time. Do you mean the "new i/o" library? Btw, I still don't understand the purpose of the following code:

    -- -----------------------------------------------------------------------------
    -- Connecting streams

    -- | An input stream created by 'streamOutputToInput'
    data StreamInputStream = forall s . OutputStream s => StreamInputStream s

    -- | Takes an output stream, and returns an input stream that will yield
    -- all the data that is written to the output stream.
    streamOutputToInput :: (OutputStream s) => s -> IO StreamInputStream
    streamOutputToInput = error "unimplemented: streamOutputToInput"

    -- | Takes an input stream and an output stream, and pipes all the
    -- data from the former into the latter.
    streamConnect :: (InputStream i, OutputStream o) => i -> o -> IO ()
    streamConnect = error "unimplemented: streamInputToOutput"

Can you say how it should work for the library user?

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

On Tue, Apr 18, 2006 at 10:26:18AM +0400, Bulat Ziganshin wrote:
FWIW ginsu and DrIFT both use make and don't use ghcs --make feature. I find this can be faster once your projects grow beyond a certain size as it takes ghc a while to figure out which files need to be rebuilt with --make
if that is due to the time of reading .hi files, my alternative Binary library should help in some future
Interesting. A big bottleneck in jhc right now is reading the (quite large) binary ho and hl files on startup. A few things I have wanted out of a binary library are:

* the ability to create a hash of the structure of the underlying data type, to verify you are reading data in the right format.
* extensible type-indexed sets (implemented hackily in Info.Binary in jhc)
* being able to jump over unneeded data, as in go directly to the 112th record, or the third field in a data structure, without having to slurp through everything that came before it.
* VSDB[1] style ACID updates as an option.
* VSDB style write-time optimized constant hash table. I don't mind spending extra time when writing library files to speed up their usage.
* mmap based reading.

I was going to get around to writing this sometime, but perhaps there is room for a collaborative project in there. Is your code available somewhere, Bulat?

John

[1] VSDB is my very simple database that ensures full ACID semantics using just the file guarantees of unix, including the weaker guarantees of NFS. http://repetae.net/john/computer/vsdb/ Sort of like STM on the filesystem.

--
John Meacham - ⑆repetae.net⑆john⑈
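The first item on that list - a hash of the structure of the underlying data type - could be approximated by rendering the shape of a type to a string and fingerprinting it. The following is only a toy sketch with hypothetical names, unrelated to jhc's actual Info.Binary:

    {-# LANGUAGE ScopedTypeVariables #-}
    import Data.Char (ord)

    class Structured a where
      structure :: a -> String      -- describes the shape only, never inspects the value

    instance Structured Int  where structure _ = "Int"
    instance Structured Char where structure _ = "Char"
    instance Structured a => Structured [a] where
      structure _ = "[" ++ structure (undefined :: a) ++ "]"
    instance (Structured a, Structured b) => Structured (a, b) where
      structure _ = "(" ++ structure (undefined :: a) ++ "," ++ structure (undefined :: b) ++ ")"

    -- a simple fingerprint of the description, written at the head of the file
    -- and checked before deserialising
    structureHash :: Structured a => a -> Int
    structureHash x = foldl (\h c -> h * 33 + ord c) 5381 (structure x)

In practice such instances would be derived mechanically (e.g. via DrIFT or Template Haskell, as suggested later in the thread) rather than written by hand.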

GHC might well be able to make use of such stuff too. In general, one would like to be able to treat a file much like a database, as you suggest, with binary serialisation of data structures into it.

GHC's serialisation also includes a simple commoning-up mechanism for "leaves", especially strings. We build a kind of dictionary, to avoid repeatedly re-serialising the same string. I guess that any good binary serialisation will want to do something similar. (Or something more dynamic, a la arithmetic coding.)

Simon

Hello Simon, Wednesday, April 19, 2006, 7:28:23 PM, you wrote:
GHC might well be able to make use of such stuff too. In general, one would like to be able to treat a file much like a database, as you suggest, with binary serialisation of data structures into it.
What do you mean by "database"? What operations do you need, in addition to sequential read and write?
GHC's serialisation also includes a simple commoning-up mechanism for "leaves", especially strings. We build a kind of dictionary, to avoid repeatedly re-serialising the same string. I guess that any good binary serialisation will want to do something similar. (Or something more dynamic, a la arithmetic coding.)
Arithmetic coding in Haskell? :) It would be MUCH faster to use the simplest form of serialization and then call a C compression library such as ziplib.

I just scanned ghc's Binary library and can say which features I haven't implemented in my lib:

1) lazyGet/lazyPut. It's no problem to copy your implementation, but I still don't understand how lazyGet should work - it shares the same buffer pointer as the one used in `get`, so `get` and consuming the structure returned by lazyGet should interfere.

2) I don't think that dictionary sharing should be part of a general Binary library, but I tried to implement my lib so that this can be implemented in user code. It seems that I failed, and I think that this is a drawback of Haskell :) Let's see: we want to use a dictionary in the get/put_ functions for FastString, so that a large data structure that includes strings can be serialized with just `put`. But `put` has the following signature:

    class Binary a where
        put :: OutByteStream h => h -> a -> IO ()

where OutByteStream is defined as

    class OutByteStream m h where
        vPutByte :: h -> Word8 -> m ()

So `put` only has access to OutByteStream's functions (i.e. only vPutByte) and can't deal with any data specific to a user-supplied stream, including its dictionary. Well, we can redefine Binary:

    class OutByteStream m h => Binary m h a where
        put :: h -> a -> m ()

    instance Binary IO StreamWithDict FastString where
        put = ...
        -- now `put` can use functions specific to StreamWithDict

but there is again a catch:

    instance (Binary m h a) => Binary m h [a] where
        put h = replicateM_ (put h)

Here the internal call to `put` again receives only the OutByteStream dictionary! The instance for FastString will just not be matched!!!

Btw, Haskell type classes have many non-obvious peculiarities. For example, it was not easy for me to understand that Haskell resolves all overloading at compile time but finds out which overloaded function to call at runtime (well, I can't even describe this behavior). Can you recommend me a paper to read about using Haskell's class system?

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
1) lazyGet/lazyPut. it's no problem to copy your implementation but i still don't understand how lazyGet should work - it share the same buffer pointer as one used in `get`. so `get` and consuming structure returned by lazyGet should interfere
lazyGet can only be used to read something that was written with lazyPut. lazyPut writes the offset of the end of the serialised data at the beginning, so that lazyGet can skip over it, and subsequent gets start from the next item in the stream.
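A self-contained toy model of that trick may make the control flow concrete. Nothing below is GHC's real Binary code - BinHandle, putInt and getInt are mock definitions over an IORef'd Map, and the payload is just a list of Ints.

    import Control.Monad (replicateM)
    import Data.IORef
    import qualified Data.Map as M
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- a seekable "file" of Ints plus a read/write cursor
    data BinHandle = BinHandle (IORef Int) (IORef (M.Map Int Int))

    newBin :: IO BinHandle
    newBin = do { p <- newIORef 0; b <- newIORef M.empty; return (BinHandle p b) }

    tellBin :: BinHandle -> IO Int
    tellBin (BinHandle posR _) = readIORef posR

    seekBin :: BinHandle -> Int -> IO ()
    seekBin (BinHandle posR _) = writeIORef posR

    putInt :: BinHandle -> Int -> IO ()
    putInt (BinHandle posR bufR) n = do
      p <- readIORef posR
      modifyIORef bufR (M.insert p n)
      writeIORef posR (p + 1)

    getInt :: BinHandle -> IO Int
    getInt (BinHandle posR bufR) = do
      p   <- readIORef posR
      buf <- readIORef bufR
      writeIORef posR (p + 1)
      return (M.findWithDefault 0 p buf)

    -- write the end-of-payload offset first, then the payload, then back-patch
    lazyPutList :: BinHandle -> [Int] -> IO ()
    lazyPutList bh xs = do
      slot <- tellBin bh
      putInt bh 0                          -- placeholder for the end offset
      mapM_ (putInt bh) (length xs : xs)   -- the payload itself
      end <- tellBin bh
      seekBin bh slot
      putInt bh end                        -- back-patch the real end offset
      seekBin bh end

    -- read the end offset, defer the payload, leave the cursor after it
    lazyGetList :: BinHandle -> IO [Int]
    lazyGetList bh = do
      end   <- getInt bh
      start <- tellBin bh
      xs <- unsafeInterleaveIO $ do        -- runs only when the list is demanded
              p0 <- tellBin bh             -- save whatever position the forcer was at
              seekBin bh start
              n  <- getInt bh
              ys <- replicateM n (getInt bh)
              seekBin bh p0                -- restore it (cf. the getAt change discussed later)
              return ys
      seekBin bh end                       -- subsequent gets continue after the payload
      return xs

lazyGetList leaves the cursor just after the payload, so subsequent gets proceed normally, while the payload itself is only decoded if the returned list is ever demanded.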
btw, btw. Haskell type classes has many non-obvious peculiarities. for example, it was not easy for to understand that Haskell resolve all overloading at compile time but finds what overloaded function to call at runtime (well, i can't even describe this behavior). can you recommend me paper to read about using Haskell class system?
I recommend compiling a few programs and investigating the Core with -ddump-ds and -ddump-simpl to find out what GHC really does. Overloading isn't all "resolved at compile time", most of the time dictionaries of functions are passed around representing class predicates. Cheers, Simon

Hello Simon, Thursday, April 20, 2006, 11:54:59 AM, you wrote:
Bulat Ziganshin wrote:
1) lazyGet/lazyPut. it's no problem to copy your implementation but i still don't understand how lazyGet should work - it share the same buffer pointer as one used in `get`. so `get` and consuming structure returned by lazyGet should interfere
lazyGet can only be used to read something that was written with lazyPut. lazyPut writes the offset of the end of the serialised data at the beginning, so that lazyGet can skip over it, and subsequent gets start from the next item in the stream.
The problem is that there is ONLY ONE read pointer, so it should be impossible to intersperse reading with `get` and consuming the structure returned by `lazyGet`, either for BinMem or BinIO ...

Hmm, on the other hand they don't interfere, because there is only one call to unsafeInterleaveIO: the closures returned by lazyGet just have to be evaluated STRICTLY after all other `get` operations. And changing the `getAt` implementation to the following:

    getAt bh p = do
        p0 <- tellBin bh
        seekBin bh p
        a <- get bh
        seekBin bh p0
        return a

should remove even this restriction.
can you recommend me paper to read about using Haskell class system?
well, how about this?
Overloading isn't all "resolved at compile time", most of the time dictionaries of functions are passed around representing class predicates.
I was trying to say that there is no such dynamic beast as virtual functions in C++.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
i'm tried to say that there is no such dynamic beast as virtual functions in C++
I think you can use existential types to simulate virtual functions:

    -- 'a' is a stream of b's
    class StreamClass a b where
        get :: a -> IO b

    -- hide the particular 'a' to get any stream of b's
    data Stream b = forall a. StreamClass a b => Stream a

    -- a concrete 'a'
    data TextStream = ...

    instance StreamClass TextStream Char where
        get ...

    -- another concrete 'a'
    instance StreamClass MemoryStream Char where
        get ...

    -- concrete 'a' is hidden just as in C++ virtual dispatch
    useStream :: Stream b -> IO b
    useStream (Stream x) = get x    -- because x satisfies StreamClass x b

Regards,
Brian.

Bulat Ziganshin wrote:
Thursday, April 20, 2006, 11:54:59 AM, you wrote:
lazyGet can only be used to read something that was written with lazyPut. lazyPut writes the offset of the end of the serialised data at the beginning, so that lazyGet can skip over it, and subsequent gets start from the next item in the stream.
the problem is what there is ONLY ONE read pointer, so it should be impossible to intersperse reading with `get` and consuming structure returned by `lazyGet`, either for BinMem or BinIO
... hmm, on the other side they don't interspersed because there is only one call to the unsafeInterleaveIO. closures returned by lazyGet just should be evaluated STRICTLY after all other `get` operation. and changing the `getAt` implementation to the following:
    getAt bh p = do
        p0 <- tellBin bh
        seekBin bh p
        a <- get bh
        seekBin bh p0
        return a
should omit even this restriction
Ah yes, I forgot about this subtle restriction. The underlying principle is that an individual 'get' or 'lazyGet' is atomic: it traverses the whole stream before returning a result. There may be 'lazyGet' components in the result, but each one of those is atomic in the same sense. So the file pointer never needs to be set to two values simultaneously. It's a bit hacky, but it works nicely.
can you recommend me paper to read about using Haskell class system?
well, how about this?
http://www.haskell.org/haskellwiki/Research_papers/Type_systems#Type_classes

There are several papers at the level you're interested in, I think: Lennart's "Implementing Haskell Overloading" is a good one, and you might find "Type classes in Haskell" (Hall/Hammond/Peyton Jones/Wadler) useful, also "Implementing Type Classes" (Peterson/Jones).

Cheers,
Simon

Hello Simon, Friday, April 21, 2006, 12:22:00 PM, you wrote:
There are several papers at the level you're interested in, I think: Lennart's "Implementing Haskell Overloading" is a good one, and you might find "Type classes in Haskell" (Hall/Hammond/Peyton Jones/Wadler) useful, also "Implementing Type Classes" (Peterson/Jones).
To be exact, I'm interested in papers about the SPECIFICATION of the class system that allows it to be implemented in these particular ways.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Hello John, Wednesday, April 19, 2006, 3:27:49 AM, you wrote:
if that is due to the time of reading .hi files, my alternative Binary library should help in some future
Interesting, A big bottleneck
big bottleneck? ;)
in jhc right now is reading the (quite large) binary ho and hl files on startup. a few things I have wanted out of a binary library are:
I was going to get around to writing this sometime, but perhaps there is room for a collaborative project in there. Is your code available somewhere bulat?
http://freearc.narod.ru/Streams.tar.gz
http://haskell.org/haskellwiki/Library/Streams

but this doc doesn't contain info about the Binary part that is now being discussed. I attached to the letter my unfinished docs about this part of the library.

Now about your requirements:
* mmap based reading.
My Streams library mainly consists of two parts - Streams and AltBinary. The Streams part implements a Handle-like interface (including such functions as vGetChar, vGetByte, vPutBuf, vSeek and so on) for various data sources - files, memory buffers, pipes, strings. Memory-mapped file support is planned but currently has only a preliminary implementation.

The AltBinary part works via the Streams part. Basically, it just implements various ways to convert a data structure to a sequence of vPutByte operations (and vice versa), with support for lists, arrays and all the other "simpler" datatypes that Haskell/GHC provides to us. Binary instances for other datatypes can be autogenerated via DrIFT or TH.
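As a rough picture of that layering (made-up class names, not the real AltBinary API), the serialisation layer only ever sees a single byte-output method:

    import Data.Word (Word8)

    -- the only thing the serialisation layer asks of a stream
    class ByteSink h where
      vPutByte :: h -> Word8 -> IO ()

    class Binary a where
      put_ :: ByteSink h => h -> a -> IO ()

    instance Binary Word8 where
      put_ = vPutByte

    -- compound types become plain sequences of byte writes
    instance Binary a => Binary [a] where
      put_ h xs = do
        put_ h (fromIntegral (length xs) :: Word8)   -- toy: length capped at 255
        mapM_ (put_ h) xs

This is also the design property discussed elsewhere in the thread: because put_ sees nothing but the byte-level interface, stream-specific tricks such as a string dictionary are hard to reach from inside it.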
* being able to jump over unneeded data, as in go directly to the 112th record, or the third field in a data structure without having to slurp through everything that came before it.
What should be the user interface? The lib (its Streams part) supports vSeek/vTell operations. Skipping to the 112th record without knowing its exact location will be impossible if each record can have a different size.

The following things imho should not be part of the Binary library itself, but of higher-level client code:
* the ability to create a hash of the structure of the underlying data type, to verify you are reading data in the right format.
You mean that using a signature is not enough - or, to be exact, that the library should generate this signature itself? Interesting. I think that for jhc (and potentially ghc) this should be implemented via DrIFT?
* extensible type-indexed sets (implemented hackily in Info.Binary in jhc)
By creating a hash of the structure, we can reduce this task to just an ordinary hash-like database?
* VSDB[1] style ACID updates as an option. * VSDB style write-time optimized constant hash table. I don't mind spending extra time when writing library files to speed up their usage.
I don't understand the second thing. But anyway, you have already implemented the VSDB database, and you already have a way to autogenerate Binary instances. My lib can help by making serialization faster and providing uniform access to various media (files, buffers, memory-mapped files). I can also work on a hash-of-structure implementation using DrIFT or TH.

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

On 20.04 12:06, Bulat Ziganshin wrote:
my Streams library mainly consists of two parts - Streams and AltBinary. The streams part implements Handle-like interface (including such functions as vGetChar, vGetByte, vPutBuf, vSeek and so on) for various data sources - files, memory buffers, pipes, strings. m/m files support is planned but now has just preliminary implementation
Having these as separate would be very nice. I think that a separately packaged AltBinary would be much easier to use for many people rather than force a dependency on the rest of Streams. - Einar Karttunen

Hi,
The compile / fix compiler errors cycle is an important part of the development process for me, and so I want recompilation to go as quickly as possible.
If you are not using any GHC-specific features, then I find that using Hugs for compiler errors, and then moving on to GHC once Hugs is happy, can massively speed up development time. FWIW, I have been using this strategy for my work on Yhc (~150 modules) and get times of about 4 seconds for Hugs to load, parse and type check, and times of about 1 min for GHC --make to just do nothing; a complete build is about 15 mins. It makes fixing compile-time errors much more fun.

Thanks,
Neil

Hello Neil, Tuesday, April 18, 2006, 3:36:12 AM, you wrote:
If you are not using any GHC specific features, then I find that using Hugs for compiler errors, then once Hugs is happy moving on to GHC can massively speed up development time.
It's also my beloved way of working. Compiler-specific parts can be separated by using a preprocessor (hey, Neil, when will you integrate this in WinHugs? :) )

To Frederik: right now I just use a shell script to preprocess the hs files. It's also possible to use hugs instead of winhugs (hugs supports preprocessing), but I'm a fan of GUIs :)

--
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com

Frederik Eaton wrote:
Hi all,
I have a project which currently uses Cabal, and I would like to switch to using a plain Makefile.
I use something like the attached Makefile. I don't have any experience with big projects though, so it could be that regenerating dependencies all the time is expensive.

/Niklas

## This Makefile compiles all haskell sources files in current
## directory to an executable with the same name as the last level of
## the current directory path. Change the following variables to alter
## this behaviour

EXEC    = $(notdir $(shell pwd))
HSRCS   = $(wildcard *.hs)
HOBJS   = $(addsuffix .o, $(basename $(HSRCS)))
HCFLAGS = -O3 -fglasgow-exts
HLFLAGS =

## Link rule
$(EXEC): $(HOBJS)
	ghc $(HLFLAGS) -o $@ $^

## Build rule
%.o %.hi: %.hs
	ghc $(HCFLAGS) -c -o $(addsuffix .o, $(basename $@)) $<

## Make dependencies
depend.mk: $(HSRCS)
	@echo Making dependencies ...
	ghc $(HCFLAGS) -M -optdep-f -optdepdepend.mk $^

-include depend.mk

Frederik Eaton wrote:
I have a project which currently uses Cabal, and I would like to switch to using a plain Makefile.
I have two examples of projects that use Makefiles, darcs and jhc, but they both appear to hand-code the list of dependencies for executables. The "-M" option to ghc should let us do this automatically, but either because of a deficiency in GHC or in GNU Make, that looks to be impossible or difficult. Does anyone have experience with this?
Yes, GHC's build system works like this. We don't use --make at all when building GHC itself. However, I want to switch to using Cabal to build the libraries. Eventually I envisage using Cabal to build more parts of the GHC tree. There's certainly no reason that you can't use ghc -M with GNU make, and that's something we will continue to support. What problems are you having?
Since I'm not using the multi-compiler features of Cabal, I think it shouldn't be too hard to get the project-specific part of my Makefile down to the size of the .cabal file.
Sure, but you also have quite a lot of build system infrastructure to get right. Incidentally, this is exactly what Cabal was meant to avoid.
- I want to be able to rebuild specific targets, rather than building everything every time
Multiple Cabal packages tied together with a simple Makefile could do this, no?
- I want better dependency inference, for instance I don't want to relink every executable every time I do a build (I think this is fixed in newer versions of ghc, but not the one I use)
Yes, fixed in 6.4.2.

The comments about --make are interesting: certainly we're aware that it has a scalability problem, and it certainly isn't conducive to a frequent edit/compile/test cycle, which is one reason we don't use it in GHC. However, for a straight single-CPU build, from scratch, for a large program, and if you have a lot of memory, it is much faster than make. I don't have figures for GHC to hand, but I believe it is on the order of a factor of 2 or 3. I would expect 'make -j4' to definitely win on a 4-core box.

We do have experimental patches for GHC to make --make work across multiple CPUs too, but you don't get linear speedup (single-threaded GC is one bottleneck). I think it would be prudent at some point to make Cabal build without --make and to add multiprocessor support.

FWIW, I don't think 'ghc -c Foo.hs Bar.hs' goes any faster than separately compiling the two files; it doesn't cache anything between the two.

Cheers,
Simon

Hi! I almost forgot that I never responded to this, sorry. On Tue, Apr 18, 2006 at 11:37:12AM +0100, Simon Marlow wrote:
Frederik Eaton wrote:
I have a project which currently uses Cabal, and I would like to switch to using a plain Makefile. I have two examples of projects that use Makefiles, darcs and jhc, but they both appear to hand-code the list of dependencies for executables. The "-M" option to ghc should let us do this automatically, but either because of a deficiency in GHC or in GNU Make, that looks to be impossible or difficult. Does anyone have experience with this?
Yes, GHC's build system works like this. We don't use --make at all when building GHC itself. However, I want to switch to using Cabal to build the libraries. Eventually I envisage using Cabal to build more parts of the GHC tree.
There's certainly no reason that you can't use ghc -M with GNU make, and that's something we will continue to support. What problems are you having?
OK, here:

http://www.haskell.org/ghc/docs/6.4.2/html/users_guide/separate-compilation....

There is a section

----------------------------------------------------------------
SRCS = Main.lhs Foo.lhs Bar.lhs
OBJS = Main.o   Foo.o   Bar.o
...
cool_pgm : $(OBJS)
        rm -f $@
        $(HC) -o $@ $(HC_OPTS) $(OBJS)
----------------------------------------------------------------

and below that a section:

----------------------------------------------------------------
# Inter-module dependencies
Foo.o Foo.hc Foo.s    : Baz.hi          # Foo imports Baz
Main.o Main.hc Main.s : Foo.hi Baz.hi   # Main imports Foo and Baz
----------------------------------------------------------------

Then, the documentation explains how to get rid of the second section using 'ghc -M'. But IIRC it doesn't explain how to get rid of the first part, which includes information that 'ghc --make' doesn't require. Does that make sense? I may have it wrong/backwards. Also, it would be useful to know about dependencies on libraries.

Basically, the best possible situation would be if 'ghc' gave me a list of everything whose {timestamp,checksum}+dependencies I should check to make sure that an output file is up to date (but not in a flattened form - i.e. for each file, only the *immediate* dependencies should be listed, and those can have their own dependencies, etc.). Even better would be to know which environment variables have been referenced by the compiler, and which directories in a search path mustn't have modules of a certain name (since those modules would have otherwise been used in preference to the ones which were actually used). Then it would be possible to just store every output file of 'ghc' in a cache in my home directory, and for instance switching between normal and profiling builds wouldn't require recompiling anything (or creating a separate project directory for each set of options) as it does in the current system.

I don't think this is a very high priority, but I think that having 'ghc' duplicate more of 'make's functionality is kind of going in the wrong direction - I think compilers and build systems need to be able to evolve separately.
Since I'm not using the multi-compiler features of Cabal, I think it shouldn't be too hard to get the project-specific part of my Makefile down to the size of the .cabal file.
Sure, but you also have a quite a lot of build system infrastructure to get right. Incedentally, this is exactly what Cabal was meant to avoid.
- I want to be able to rebuild specific targets, rather than building everything every time
Multiple Cabal packages tied together with a simple Makefile could do this, no?
Yeah, but it would require me to put things in separate directories.
... I think it would be prudent at some point to make Cabal build without --make and to add multiprocessor support.
It's more than just multiprocessor issues I'm thinking about - for instance I might like to do a distributed compilation, or I might like to cache output in a special place as I suggested above. I think it would be very nice, someday, if Cabal could provide dependency information, or for instance a list of which commands it's going to invoke to produce which files, so that the build system and compiler/packager are not tied together.

Best regards,

Frederik

--
http://ofb.net/~frederik/
participants (9):

- Brian Hulley
- Bulat Ziganshin
- Einar Karttunen
- Frederik Eaton
- John Meacham
- Neil Mitchell
- Niklas Sorensson
- Simon Marlow
- Simon Peyton-Jones