hackage-server memory requirements

Hello, what are the typical memory requirements for running hackage-server as a mirror of the current Hackage? The reason I ask is that I tried mirroring Hackage on my 4GB desktop machine. I let it mirror overnight, but when I came back in the morning the machine was completely unresponsive. It turned out hackage-server was consuming all my RAM, and the mirroring was only halfway done. Does this sound like a memory leak, or does hackage-server actually require more than 4GB of memory? Regards, Bas

On 25 October 2011 20:23, Bas van Dijk
Does this sound like a memory leak or does hackage-server actually require more than 4GB memory?
Wow! The SoC student reported memory usage of about 700MB with a full import from the Hackage of the time: http://cogracenotes.wordpress.com/page/2/ Of course, it is possible that Cabal has grown enough (or the server has become inefficient enough) that it now requires 4GB of memory. Do you still see 4GB usage if you restart the server? If it is a leak caused by the mirroring process, the memory usage might be a lot lower once you restart. Another thing to do would be to heap profile the server; it would be very interesting to know what is going on. I'd like to try this out myself, but I can't mirror Hackage without hitting the usage limits on my internet connection :-). Max

On 25 October 2011 22:53, Max Bolingbroke
Do you still see 4GB usage if you restart the server?
No, after restarting it was less but, if I remember correctly, still over 700MB.
Another thing to do would be to heap profile the server.. it would be very interesting to know what is going on.
I could run a sync on my 4GB laptop in the background tomorrow and do a heap profile, though I expect to kill it halfway since I also have to get other things done. What kind[1] of heap profile would be most useful? (In my experience, -hy or -hd are the most informative.)
I'd like to try this out myself, but I can't mirror Hackage without hitting the usage limits on my internet connection :-).
Speaking of which, how large is Hackage? Regards, Bas [1] http://www.haskell.org/ghc/docs/latest/html/users_guide/prof-heap.html#rts-o...

On 25 October 2011 23:18, Bas van Dijk
What kind[1] of heap profile would be most useful. (In my experience -hy or -hd are the most informative).
Either of those sound reasonable.
I'd like to try this out myself, but I can't mirror Hackage without hitting the usage limits on my internet connection :-).
Speaking of which, how large is Hackage?
Good question! A quick check shows that the .tar.gz files add up to 1269MB. However, that excludes the documentation, build logs, etc. The total size of the "archive" directory is 13GB, though it would be much smaller if we gzipped the documentation first. Actually, this is not as much as I thought, so I might be able to try a local mirror after all. Max

On 26 October 2011 00:18, Bas van Dijk
I could run a sync on my 4GB laptop in the background tomorrow and do a heap profile. I do expect to kill it half way since I also have to get other stuff done.
I did a quick heap profile using -hy: http://code.haskell.org/~basvandijk/hackage-server.hp You can view it using hp2any-manager (hp2ps seems to show only part of the profile). I killed hackage-mirror and hackage-server when the resident memory of hackage-server exceeded 1GB. At that point it was uploading packages beginning with 'b' (so there were still many to come). According to the profile, most of the space is used by ARR_WORDS (which is the internal name for a ByteArray#, if I remember correctly). Bas
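For anyone who wants to see what this signature looks like in isolation: strict ByteString payloads live in pinned byte arrays, so in a -hy profile they typically appear as ARR_WORDS. The program below is a hypothetical standalone sketch, not part of hackage-server; it simply keeps many strict ByteStrings alive. The build commands in the comments use standard GHC profiling flags, but treat the exact invocation as an assumption about your setup.

```haskell
-- Hypothetical demo (not hackage-server code): retain many strict
-- ByteStrings so their payloads dominate a -hy heap profile.
--
-- Build and run with profiling (adjust to your setup):
--   ghc -prof -rtsopts ByteStringHeap.hs
--   ./ByteStringHeap +RTS -hy
--   hp2ps -c ByteStringHeap.hp
import Control.Exception (evaluate)
import qualified Data.ByteString.Char8 as BC

-- 1000 strict ByteStrings of 64 KiB each (roughly 64 MB of payload).
-- As a top-level value this list stays reachable for the whole run.
chunks :: [BC.ByteString]
chunks =
  [ BC.replicate 65536 (toEnum (fromEnum 'a' + i `mod` 26))
  | i <- [0 .. 999 :: Int]
  ]

main :: IO ()
main = do
  -- BC.length evaluates each chunk, which allocates its payload.
  total <- evaluate (sum (map BC.length chunks))
  print total
  -- Traversing the list again keeps every payload live until this point.
  print (length (filter (not . BC.null) chunks))
```

Running it without profiling should print 65536000 followed by 1000; with +RTS -hy the bulk of the heap should be attributed to ARR_WORDS, much like the profile above.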

On 26 October 2011 13:46, Bas van Dijk
According to the profile most space is used by ARR_WORDS (which is the internal name for a ByteArray# if I remember correctly).
Interesting. There are a lot of ByteStrings in use in the server, so candidates for a leak might be:
1. The cached cabal file in the package information
2. A StringTable, such as the one within a TarIndex
3. The cached index.tar.gz
4. The mirroring feature not being strict enough in the ByteString it accepts
Regarding the fourth candidate, it looks to me as if the lazy ByteString returned from Unpack.unpackPackageRaw (and stored in the pkgData field of PkgInfo) may never be forced. I'm not sure, but depending on how the tar package is implemented, this might cause the garbage collector to hold on to the whole decompressed contents of the tarball rather than just the decompressed Cabal file that we want. So this is almost pure speculation, but perhaps adding (BS.length pkgStr `seq`) just before the liftIO on Mirror.hs:122 would reduce the memory usage significantly. Worth a try? Max
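A minimal sketch of the forcing idea, in plain IO rather than the server's monad. The names extractCabalFile and storePackage are hypothetical stand-ins, not hackage-server or tar functions, and BL.take 64 merely imitates "pick a small piece out of a large decompressed tarball".

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Hypothetical stand-in for extracting the .cabal file from a package
-- tarball. Until it is forced, the result is a thunk that refers to the
-- whole 'tarball' argument.
extractCabalFile :: BL.ByteString -> BL.ByteString
extractCabalFile tarball = BL.take 64 tarball

-- Hypothetical stand-in for storing the package in the server state.
storePackage :: IORef [BL.ByteString] -> BL.ByteString -> IO ()
storePackage ref pkgStr =
  -- Forcing the length evaluates every chunk of pkgStr, so the stored
  -- value is evaluated data rather than an unevaluated reference to the
  -- whole tarball expression.
  BL.length pkgStr `seq` modifyIORef' ref (pkgStr :)

main :: IO ()
main = do
  ref <- newIORef []
  let tarball = BLC.pack (replicate 100000 'x')
  storePackage ref (extractCabalFile tarball)
  stored <- readIORef ref
  print (map BL.length stored)
```

This prints [64]. One caveat: forcing only removes the thunk. The evaluated chunks produced by take may still be slices that share a buffer with the tarball's larger chunks, so they can keep that buffer alive. Storing BL.copy pkgStr instead would drop that sharing, at the cost of copying the bytes.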

On 26 October 2011 19:36, Max Bolingbroke
So this is almost pure speculation, but perhaps adding (BS.length pkgStr `seq`) just before the liftIO on Mirror.hs:122 would reduce the memory usage significantly. Worth a try?
I made the change, but after syncing for a while (I continued where I left off earlier) the resident memory quickly reached 1GB again. I killed it when it was uploading packages beginning with a 'c'. After the server was restarted and initialized, its memory usage was just 700MB. Bas