[Hackage] #779: Create tarball of all Hoogle input files

#779: Create tarball of all Hoogle input files
--------------------------------+-------------------------------------------
  Reporter:  guest              |        Owner:
      Type:  enhancement        |       Status:  new
  Priority:  normal             |    Milestone:  HackageDB
 Component:  hackageDB website  |      Version:  HEAD
  Severity:  normal             |     Keywords:
Difficulty:  unknown            |   Ghcversion:
  Platform:                     |
--------------------------------+-------------------------------------------
For many packages on Hackage there is a corresponding Hoogle input file, for example:

http://hackage.haskell.org/packages/archive/cmdargs/0.6.5/doc/html/cmdargs.t...

To generate Hoogle databases for every package on Hackage I have a script that downloads the documentation for every package, then puts it in a tarball and uploads it back to http://haskell.org/hoogle/hackage-haddock.tar.gz

There are three disadvantages to the current process. It's quite a lot of connections to Hackage to download every single file, it takes quite a lot of time to download them individually (about 50 minutes using wget serially), and the tarball on Hoogle is often (always?) out of date.

Would it be possible for Hackage to supply a tarball of all the Hoogle input files?

-- Neil Mitchell (Hoogle author)

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779
Hackage http://haskell.org/cabal/
Hackage: Cabal and related projects

#779: Create tarball of all Hoogle input files

Comment (by duncan):

I think it probably makes more sense to generate hoogle databases of each package and upload them individually. Then we can periodically generate a unified hoogle database covering some subset of packages, either using a specialised client or on the server.

Most packages don't change most of the time, so a client that's doing this regularly should not need to download most packages' hoogle input files / database.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:1

#779: Create tarball of all Hoogle input files

Comment (by guest):

To be clear, the process that generates Haddock documentation already generates and uploads documentation for each package. I'm only suggesting the files already on the server should be put in a unified tarball.

How would a client avoid the need to download all hoogle input files? How would it figure out which ones have been modified? If there was a query to build a file from a particular date or later that would work, but it seems more effort than just providing everything.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:2

#779: Create tarball of all Hoogle input files

Comment (by duncan):

Replying to [comment:2 guest]:
> How would a client avoid the need to download all hoogle input files? How would it figure out which ones have been modified? If there was a query to build a file from a particular date or later that would work, but it seems more effort than just providing everything.

Once you've downloaded each file once and it is cached locally, checking for updates is relatively cheap using standard HTTP techniques (e.g. Last-Modified or ETag).

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:3
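
As a rough illustration of the cache revalidation duncan describes, here is a minimal Python sketch. It assumes the server sends `ETag` and/or `Last-Modified` headers and honours conditional requests; the thread does not confirm that Hackage did so. The names `conditional_headers` and `fetch_if_changed`, and the shape of the `cached` dictionary, are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Build revalidation headers from metadata saved alongside a cached copy.

    `cached` is a hypothetical dict with optional "etag" and "last_modified"
    keys, holding the values the server sent with the previous download.
    """
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def fetch_if_changed(url, cached):
    """Return (body, metadata); body is None if the cached copy is still current."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            fresh = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), fresh
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing new to download
            return None, cached
        raise
```

With this approach an unchanged file costs one small request and an empty 304 reply, but it is still one request per file, which is the concern raised in the next comment.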

#779: Create tarball of all Hoogle input files

Comment (by guest):

Really? Don't I have to make hundreds of separate HTTP requests, one for each file I'm interested in? I found that the time to download things off Hackage is proportional to the number of files being queried, not the size of them.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:4

#779: Create tarball of all Hoogle input files

Comment (by duncan):

HTTP allows sending multiple queries in one connection without having to wait for all the replies in between, so it should be pretty efficient. Using a separate `wget` call for each file would not achieve that, of course.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:5
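
A minimal Python sketch of issuing many requests over one connection follows. Note the limitation: Python's `http.client` reuses a persistent (keep-alive) connection but does not pipeline, so each reply is read before the next request is sent. It therefore saves the per-file connection setup that a separate `wget` call incurs, not the round trips that pipelining would also save. The function names `fetch_all` and `fetch_from_host` are hypothetical.

```python
import http.client


def fetch_all(conn, paths):
    """Issue one GET per path over a single connection, reading each reply in turn."""
    bodies = {}
    for path in paths:
        conn.request("GET", path)
        response = conn.getresponse()
        # The body must be read in full before the connection can be reused.
        bodies[path] = (response.status, response.read())
    return bodies


def fetch_from_host(host, paths):
    """Open one connection to `host`, fetch every path over it, then close it."""
    conn = http.client.HTTPConnection(host)
    try:
        return fetch_all(conn, paths)
    finally:
        conn.close()
```

Passing the connection into `fetch_all` keeps the loop independent of the network, so it can be exercised with a stand-in object.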

#779: Create tarball of all Hoogle input files

Comment (by ross):

Does http://hackage.haskell.org/packages/archive/00-hoogle.tar.gz meet your needs? It will be updated whenever the documentation is.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:6
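
For illustration, a minimal Python sketch of unpacking such a tarball in memory. The thread does not describe the archive's internal layout, so this simply returns every regular file it contains. The function name `read_tarball` is hypothetical.

```python
import io
import tarfile


def read_tarball(data):
    """Return {member name: contents} for every regular file in a .tar.gz.

    `data` is the raw bytes of the gzip-compressed tarball.
    """
    files = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive:
            if member.isfile():  # skip directories, links, and so on
                files[member.name] = archive.extractfile(member).read()
    return files
```

The bytes could come from a single download of the URL above, for example with `urllib.request.urlopen(url).read()`, which replaces the many per-file requests with one.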

#779: Create tarball of all Hoogle input files

Changes (by guest):

  * status: new => closed
  * resolution: => fixed

Comment:

Duncan: I should learn some more about HTTP...

Ross: Perfect! That's fantastic.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/779#comment:7
