Slow documentation generation on Hackage

Hello everyone.
Last night I uploaded my first Hackage library with documentation (StrictBench). I learned that it takes somewhere between 2 and 8 hours for the link to the documentation to become active. This is confusing for first-time package authors (I went to #haskell to ask what I had forgotten) and annoying for everyone (Don Stewart submitted it to reddit shortly after I published it, which kind of wastes the time of everyone who checks it out before the documentation is up). Moreover, there seems to be no reason for it; on your own computer haddock takes just (milli)seconds.
Hence I wanted to ask if this is a bug or if there is a good technical or social reason for it, and whether there is any way around it.
Regards,
Remco Niemeijer

On Jun 8, 2009, at 04:10 , Niemeijer, R.A. wrote:
Hence I wanted to ask if this is a bug or if there is a good technical or social reason for it, and whether there is any way around it.
Auto-running haddock on upload strikes me as a good way to open hackage.haskell.org to a denial of service attack. Additionally, I *think* haddock is run as part of the automated build tests, which (again) happen on a regular schedule instead of being triggered by uploads to avoid potential denial of service attacks.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

On Jun 8, 2009, at 04:36 , Brandon S. Allbery KF8NH wrote:
On Jun 8, 2009, at 04:10 , Niemeijer, R.A. wrote:
Hence I wanted to ask if this is a bug or if there is a good technical or social reason for it, and whether there is any way around it.
Auto-running haddock on upload strikes me as a good way to open hackage.haskell.org to a denial of service attack.
I should clarify: yes, in a valid project haddock takes almost no time. Nevertheless:
(1) if many uploads of even valid packages are made in a very short time, the system load could well be severely impacted;
(2) what of malicious packages, which might trigger bugs in haddock leading to (say) 100% CPU loops? That we don't know of any doesn't mean there aren't any, unless the test suite is absolutely 100% complete (and for a large program, that becomes as hard to verify as the program itself; now consider that haddock is part of ghc these days...).
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

If that is the main concern, would the following not work?
- Hackage accounts already have to be created manually, so there is no chance of a DDoS.
- Uploading to Hackage requires a username and password, which means the user can be identified. Set a timeout on uploads for each user: packages sent within 2 minutes of the previous one are automatically refused. Prevents quantity-based DoS.
- Generate haddock docs immediately on upload, but apply a 2-second timeout; if it takes longer, the process is killed and no documentation is generated. Prevents exploit-based DoS.
- If many valid packages are uploaded in a short time (though I have my doubts as to how often that is going to happen), put them in a queue. Documentation will take a bit longer to generate, but the server can control the load. Prevents inadvertent DoS.
Result: immediate documentation for every contributor with good intentions (which, face it, is going to be all of them; I doubt Haskell is popular enough yet to be the target of DoS attacks) and no possibility for DoS attacks.
I might be overlooking something, but I believe this should work just fine.
-----Original Message-----
From: Brandon S. Allbery KF8NH [mailto:allbery@ece.cmu.edu]
Sent: Monday, 8 June 2009 10:41
To: Brandon S. Allbery KF8NH
Cc: Niemeijer, R.A.; haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Slow documentation generation on Hackage
On Jun 8, 2009, at 04:36 , Brandon S. Allbery KF8NH wrote:
On Jun 8, 2009, at 04:10 , Niemeijer, R.A. wrote:
Hence I wanted to ask if this is a bug or if there is a good technical or social reason for it, and whether there is any way around it.
Auto-running haddock on upload strikes me as a good way to open hackage.haskell.org to a denial of service attack.
I should clarify: yes, in a valid project haddock takes almost no time. Nevertheless: (1) if many uploads of even valid packages are made in a very short time, the system load could well be severely impacted; (2) what of malicious packages, which might trigger bugs in haddock leading to (say) 100% CPU loops? That we don't know of any doesn't mean there aren't any, unless the test suite is absolutely 100% complete (and for a large program, that becomes as hard to verify as the program itself. now consider that haddock is part of ghc these days...). -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
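The per-user upload cooldown and the 2-second build timeout proposed in this message could be sketched roughly as follows. This is a minimal illustration in Python, not how Hackage works: the names `UploadGate`, `try_accept` and `run_doc_build` are hypothetical, and the command passed to `run_doc_build` is a stand-in, since the thread does not say how haddock would be invoked on the server.

```python
import subprocess
import sys

UPLOAD_COOLDOWN_SECONDS = 120  # "within 2 minutes of the previous one"
DOC_TIMEOUT_SECONDS = 2.0      # "apply a 2-second timeout"


class UploadGate:
    """Hypothetical gate: refuses uploads that follow a user's last accepted one too soon."""

    def __init__(self, cooldown=UPLOAD_COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_accepted = {}  # user name -> time of last accepted upload, in seconds

    def try_accept(self, user, now):
        last = self.last_accepted.get(user)
        if last is not None and now - last < self.cooldown:
            return False  # refused; the stored timestamp is left unchanged
        self.last_accepted[user] = now
        return True


def run_doc_build(command, timeout=DOC_TIMEOUT_SECONDS):
    """Run a doc-generation command; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    gate = UploadGate()
    print(gate.try_accept("alice", now=0))   # first upload is accepted
    print(gate.try_accept("alice", now=60))  # 60 s later: refused
    # Stand-in for the real doc generator: a Python process that exits at once.
    print(run_doc_build([sys.executable, "-c", "pass"]))
```

One design choice left open by the proposal is whether a refused upload should restart the cooldown; the sketch does not restart it. The queue from the last bullet is not covered here.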

"Niemeijer, R.A."
If that is the main concern, would the following not work?
[...]
Result: immediate documentation for every contributor with good intentions
Or simply, on upload, generate the doc directory with a temporary page saying that documentation will arrive when it's good and ready?
Result: as before, but with less confusion for new contributors. So not as good as Niemeijer's suggestions, but probably easier to implement.
-k
--
If I haven't seen further, it is by standing in the footprints of giants

On Mon, 2009-06-08 at 11:24 +0200, Ketil Malde wrote:
"Niemeijer, R.A." writes:
If that is the main concern, would the following not work?
[...]
Result: immediate documentation for every contributor with good intentions
Having the server generate docs itself would be regressing towards a worse design. The server should just manage upload/download, storage and management of information. It should not be running builds and generating docs.
Or simply, on upload, generate the doc directory with a temporary page saying that documentation will arrive when it's good and ready?
And use a design where there isn't just a single build client, like the design for the new hackage-server. Any authorised client should be able to upload docs. That should include the package maintainer as well as authorised build bots. Then we can easily adjust the time between package upload and documentation generation without having to tell the server anything.
Duncan

On Mon, Jun 8, 2009 at 11:05, Niemeijer, R.A. wrote:
which, face it, is going to be all of them; I doubt Haskell is popular enough yet to be the target of DoS attacks
Second that. I think this is a good case in which some security should be traded in for usability. And even if a DoS attack occurs, it just causes some downtime... not unlike the hours of downtime that the documentation currently has.
If there's actually an exploitable hole in Haddock that allows the server to be compromised (not just DoSed), then the current Haddock generation is just as vulnerable.
But of course I'm not one of the maintainers of Hackage... it's up to them to decide.
Thomas

Thomas ten Cate wrote:
Niemeijer, R.A. wrote:
which, face it, is going to be all of them; I doubt Haskell is popular enough yet to be the target of DoS attacks
Second that. I think this is a good case in which some security should be traded in for usability.
Those who would trade security for usability deserve neither usability nor security ;)
Seriously, all the Haskell hackers I've encountered have been good people, but Haskell is the language of the hair shirt after all. Security is hard enough to come by in the first place; sacrificing what little we have is not the right path. The Haskell interwebs are already too susceptible to downtime from non-malicious sources, and it floods #haskell whenever it happens.
I agree that the turnaround time is a bit long, but I think server stability is more important than instant feedback. It's easy enough to get an account on community.haskell.org and just upload your own docs there[1]. The thing I'd be more interested in getting quick feedback on is whether compilation succeeds in the Hackage environment, which is very different from my own build environment.
Given the various constraints mentioned, and depending on the load averages for the servers, perhaps the simplest thing to do would be to just reduce the latency somewhat. Flushing the queue every 4~6 hrs seems both long enough to circumvent the major DoS problems, and short enough to be helpful to developers. Especially if the queue could be set up to be fair among users (e.g. giving each user some fixed number of slots or cycles per flush, delaying the rest until the next cycle).
A different approach would be to do exponential slowdown per user. So if a user has submitted N jobs in the last Window (e.g. 24 hours), then the job is run around Epsilon^N after submission (where Epsilon is, say, 3 minutes).
[1] I do: http://community.haskell.org/~wren/ The scripts to automate it are trivial, but I can share them if people like.
--
Live well,
~wren
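To get a feel for the exponential slowdown suggested in this message, here is a small Python sketch. It reads "Epsilon^N" literally with Epsilon expressed in minutes, so the delay is 3^N minutes; that reading of the units is an assumption, and the function name `build_delay` is made up for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # "Window (e.g. 24 hours)"
EPSILON_MINUTES = 3           # "Epsilon is, say, 3 minutes"


def build_delay(previous_submissions, now, window=WINDOW, epsilon=EPSILON_MINUTES):
    """Delay before running a new job, given the user's earlier submission times.

    N is the number of earlier submissions that fall inside the window ending
    at `now`; the delay is epsilon ** N minutes (an assumed reading of the units).
    """
    n = sum(1 for t in previous_submissions if now - window < t <= now)
    return timedelta(minutes=epsilon ** n)


if __name__ == "__main__":
    now = datetime(2009, 6, 8, 12, 0)
    history = []
    for _ in range(6):
        print(len(history), "earlier jobs ->", build_delay(history, now))
        history.append(now)
```

Under this reading a first job waits 1 minute, a second 3 minutes, a third 9 minutes, and a sixth job within the same window waits 243 minutes, which is roughly the 4 hour turnaround discussed in the thread.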

On Mon, Jun 08, 2009 at 04:36:14AM -0400, Brandon S. Allbery KF8NH wrote:
Additionally, I *think* haddock is run as part of the automated build tests, which (again) happen on a regular schedule instead of being triggered by uploads to avoid potential denial of service attacks.
That's correct. One workaround is to upload your package just before 0:00, 6:00, 12:00 or 18:00 UK time.
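For anyone timing an upload around this workaround, the next scheduled run can be computed with a few lines of Python. This is only a sketch: it assumes the input is already a UK local time, ignores daylight-saving transitions, and the name `next_build_run` is hypothetical.

```python
from datetime import datetime, timedelta

BUILD_HOURS = (0, 6, 12, 18)  # UK-time hours named in the message above


def next_build_run(now_uk):
    """Return the first scheduled run strictly after now_uk (naive UK local time)."""
    midnight = now_uk.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in BUILD_HOURS:
        candidate = midnight + timedelta(hours=hour)
        if candidate > now_uk:
            return candidate
    # Past 18:00: the next run is at midnight of the following day.
    return midnight + timedelta(days=1)


if __name__ == "__main__":
    uploaded = datetime(2009, 6, 8, 10, 10)
    run = next_build_run(uploaded)
    print("next run:", run, "wait:", run - uploaded)
```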
participants (7)
- Brandon S. Allbery KF8NH
- Duncan Coutts
- Ketil Malde
- Niemeijer, R.A.
- Ross Paterson
- Thomas ten Cate
- wren ng thornton