
On 12 October 2005 17:34, Wilhelm B. Kloke wrote:
Simon Marlow
wrote: Just doing 'make -k' in ghc/rts should leave all the .hc files behind, because you already have the -keep-hc-files flag in your command line, so GHC won't delete the intermediate .hc files even when the subsequent compilation stage fails.
This doesn't work. The commands don't leave a .hc file behind, even with make -k.
My apologies: the failure was happening before generation of the .hc file. The error is generated by this in Cmm.h:

#if SIZEOF_mp_limb_t != SIZEOF_VOID_P
#error mp_limb_t != StgWord: assumptions in PrimOps.cmm are now false
#endif

SIZEOF_mp_limb_t comes from DerivedConstants.h, and SIZEOF_VOID_P comes from ghcautoconf.h (both in ghc/includes). Both of these files should be from the target system for a cross-compile; I strongly suspect that one of them has been overwritten by the host version in your tree. Sorry that this process is somewhat flaky; we've never got around to making the build system really do cross-compilation properly, and the fact that it usually only needs to be done once for any given platform means there isn't a lot of motivation to do the work. Cheers, Simon

Hi all, I have a program that uses hash tables to store word counts. It can use few, large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling/-sstderr shows it is GC that is causing it (accounting for up to 99% of the time(!)) Is there any reason to expect this behavior? Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption. Suggestions? -k -- If I haven't seen further, it is by standing in the footprints of giants
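For concreteness, the many-small-tables setup can be sketched with the Data.HashTable API of the time (it has since been removed from base); mkTables and bumpWord are made-up helper names, not anything from the actual program:

```haskell
import qualified Data.HashTable as H

-- Hypothetical reduction of the setup above: many small tables,
-- each holding a handful of word counts.
mkTables :: Int -> IO [H.HashTable String Int]
mkTables n = mapM (const (H.new (==) H.hashString)) [1 .. n]

-- Bump one word's count; H.update replaces any existing entry,
-- unlike H.insert, which shadows it.
bumpWord :: H.HashTable String Int -> String -> IO ()
bumpWord t w = do
  mc <- H.lookup t w
  _  <- H.update t w (maybe 1 (+ 1) mc)
  return ()
```

Each table created by mkTables carries its own bucket arrays, which is where the per-table overhead described below comes from.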

On Friday 14 Oct 2005 3:17 pm, Ketil Malde wrote:
Hi all,
I have a program that uses hash tables to store word counts. It can use few, large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling/-sstderr shows it is GC that is causing it (accounting for up to 99% of the time(!))
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
Suggestions?
Well you could use a StringMap.. http://homepages.nildram.co.uk/~ahey/HLibs/Data.StringMap/ But that lib is a bit lightweight so probably doesn't provide everything you need at the moment. But it's something I mean to get back to when I have some time, so if there's anything in particular you want let me know and I'll give it some priority. You certainly should not need anything like 5k overhead per map, and you don't have to work via the IO monad either (though you can use an MVar StringMap or something if you like). Also, I seem to remember some thread about some problem with the Data.HashTable implementation and space behaviour. Unfortunately I can't remember what the problem was and don't know if it's been fixed :-( Regards -- Adrian Hey
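For the word-count case specifically, a pure map sidesteps both the IO plumbing and the per-table array overhead. A minimal sketch using Data.Map, standing in here for StringMap (whose exact API I haven't checked):

```haskell
import qualified Data.Map as Map

-- Pure word counting: no IO, no preallocated bucket arrays
-- for the GC to scan.
wordCounts :: String -> Map.Map String Int
wordCounts = foldr bump Map.empty . words
  where
    bump w m = Map.insertWith (+) w 1 m
```

Sharing such a map through an MVar, as suggested above, recovers mutable-style use without any large boxed arrays.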

On Oct 14, 2005, at 10:17 AM, Ketil Malde wrote:
Hi all,
I have a program that uses hash tables to store word counts. It can use few, large hash tables, or many small ones. The problem is that it uses an inordinate amount of time in the latter case, and profiling/-sstderr shows it is GC that is causing it (accounting for up to 99% of the time(!))
Is there any reason to expect this behavior?
Heap profiling shows that each hash table seems to incur a memory overhead of approx 5K, but apart from that, I'm not able to find any leaks or unexpected space consumption.
That "5K" number made me immediately suspicious, so I took a look at the source code to Data.HashTable. Sure enough, it's allocating a number of large IOArrays, which are filled with pointers. The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is []. I've seen other implementations of this two-level technique which use a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for less-than-gigantic hash tables. This might be worth doing in the Data.HashTable implementation. [Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.] -Jan-Willem Maessen
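The two-level scheme described above can be caricatured as follows; the constant and the helper names are illustrative, not the values in the actual Data.HashTable source:

```haskell
-- A small segment size bounds how many boxed slots the GC must
-- scan per table (illustrative constant, not the real one).
sEGMENT_SIZE :: Int
sEGMENT_SIZE = 64

-- Split a flat bucket index into (directory slot, offset in segment).
segmentOf, indexOf :: Int -> Int
segmentOf i = i `div` sEGMENT_SIZE
indexOf   i = i `mod` sEGMENT_SIZE
```

With a large segment size, even a 24-entry table drags around one nearly empty segment that every GC must walk; shrinking the segment shrinks that fixed cost.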
Suggestions?
-k -- If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Jan-Willem Maessen
The practical upshot is that, for a hash table with (say) 24 entries, the GC must scan an additional 1000 pointers and discover that each one is [].
Would a smaller default size help? In my case, I just wanted HTs for very sparse tables.
[Curious: what (if anything) is being used to test Data.HashTable? I'd be willing to undertake very small amounts of fiddling if I could be sure I wasn't slowing down something which mattered.]
I'd be happy to test it (or provide my test code). My program isn't too beautiful at the moment, but is tunable to distribute the word counts over an arbitrary number of hash tables. BTW, could one cheat by introducing a write barrier manually in some way? Perhaps by (unsafe?) thaw'ing and freeze'ing the arrays when they are modified? -k -- If I haven't seen further, it is by standing in the footprints of giants
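The thaw/freeze idea might look something like the sketch below (using the modern home of unsafeFreeze/unsafeThaw in Data.Array.Unsafe; all other names are hypothetical). Note this is a sketch of the question, not a recommendation: unsafeThaw on an array the GC has already tenured is exactly the case that can force a rescan or worse.

```haskell
import Data.Array (Array)
import Data.Array.IO (IOArray)
import Data.Array.MArray (writeArray)
import Data.Array.Unsafe (unsafeFreeze, unsafeThaw)

-- Keep the table immutable most of the time, thawing only for
-- the duration of a single write.
writeOnce :: Array Int [String] -> Int -> [String] -> IO (Array Int [String])
writeOnce arr i v = do
  marr <- unsafeThaw arr :: IO (IOArray Int [String])
  writeArray marr i v
  unsafeFreeze marr
```

The original array must not be used again after the unsafeThaw; only the array returned by unsafeFreeze is safe to keep.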

Simon Marlow
SIZEOF_mp_limb_t comes from DerivedConstants.h, and SIZEOF_VOID_P comes from ghcautoconf.h (both in ghc/includes). Both of these files should be from the target system for a cross-compile; I strongly suspect that one of them has been overwritten by the host version in your tree.
Those files got overwritten several times for me, too, despite following the instructions... I ended up watching for them to get overwritten and copying them back whenever that happened. I've been trying to cross-compile for amd64-freebsd from Mac OS X, but although I seem to get all the hc files, ghc-pkg-inplace crashes, and so does ghc-inplace, with the following backtrace:

#0 0x00000000014f3ed0 in StgRun ()
#1 0x00000000014f09b5 in schedule ()
#2 0x00000000014f1386 in waitThread_ ()
#3 0x00000000014f12aa in scheduleWaitThread ()
#4 0x00000000014ee421 in rts_evalLazyIO ()
#5 0x00000000014edccf in main ()

Should I try to build again with debug symbols, or is that pointless for ghc output? Thanks, John

John Hornkvist
Simon Marlow
writes: SIZEOF_mp_limb_t comes from DerivedConstants.h, and SIZEOF_VOID_P comes from ghcautoconf.h (both in ghc/includes). Both of these files should be from the target system for a cross-compile; I strongly suspect that one of them has been overwritten by the host version in your tree.
Those files got overwritten several times for me, too, despite following the instructions... I ended up watching for them to get overwritten and copying them back whenever that happened.
This is not really sufficient. I use "chflags uchg" to protect these files. At the very least you will notice when an overwrite is attempted.
I've been trying to cross-compile for amd64-freebsd from Mac OS X, but although I seem to get all the hc files, ghc-pkg-inplace crashes, and
Are you sure? The recommended procedure has a serious bug, which I discovered about 30 minutes ago. When rebuilding ghc/lib/compat you need to run "make boot" with the same flags as "make all", because libghccompat.a is built during "make boot" and you won't get the .hc files otherwise. Just look in the ghc/lib/compat subdirectories for .hc files.
so does ghc-inplace, with the following backtrace:
#0 0x00000000014f3ed0 in StgRun ()
#1 0x00000000014f09b5 in schedule ()
#2 0x00000000014f1386 in waitThread_ ()
#3 0x00000000014f12aa in scheduleWaitThread ()
#4 0x00000000014ee421 in rts_evalLazyIO ()
#5 0x00000000014edccf in main ()
Should I try to build again with debug symbols, or is that pointless for ghc output?
This is pointless, and typical of the sort of errors SM mentioned. I have got dozens of these in the process.

Now let me report real progress: I have got it working on FreeBSD-6.0-amd64 at last. Here are the steps on the host system, which are needed IIRC:

cp ../../fptools-amd64/ghc-6.4.1/ghc/includes/{ghcautoconf.h,DerivedConstants.h,GHCConstants.h} ghc/includes
touch ghc/includes/{ghcautoconf.h,DerivedConstants.h,GHCConstants.h,mkDerivedConstants.c}
touch ghc/includes/{mkDerivedConstantsHdr,mkDerivedConstants.o,mkGHCConstants,mkGHCConstants.o}
touch ghc/includes/{ghcautoconf.h,DerivedConstants.h,GHCConstants.h}
chflags uchg ghc/includes/{ghcautoconf.h,DerivedConstants.h,GHCConstants.h}
(cd glafp-utils && gmake boot && gmake)
(cd ghc && gmake boot && gmake)
(cd libraries && gmake boot && gmake)
(cd ghc/compiler && gmake boot stage=2 && gmake stage=2)
(cd ghc/lib/compat && gmake clean; rm .depend; gmake boot UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files'; gmake -k UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files')
(cd ghc/rts && gmake -k UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files')
(cd ghc/utils && gmake clean; gmake -k UseStage1=YES EXTRA_HC_OPTS='-O -fvia-C -keep-hc-files')
gmake hc-file-bundle Project=Ghc

Don't forget to delete Linker.c (for ghci). The stage on the host system where the process fails just now is
$MAKE -C libraries boot all because Fake happy is not happy!
But ghc-inplace seems to work pretty well now on amd64. -- Dipl.-Math. Wilhelm Bernhard Kloke Institut fuer Arbeitsphysiologie an der Universitaet Dortmund Ardeystrasse 67, D-44139 Dortmund, Tel. 0231-1084-257
participants (6)
- Adrian Hey
- Jan-Willem Maessen
- John Hornkvist
- Ketil Malde
- Simon Marlow
- Wilhelm B. Kloke