Re: [Haskell-cafe] Hoogle: converting binary .hoo into text?

On Mon, May 11, 2009 at 10:25 AM, Neil Mitchell wrote:
Hi Peter,
I would like to use the Hoogle text format in C#.
Out of curiosity, why? I'm just interested to know what work you're doing.
Sure. We're building a graphical representation of a Haskellish language (a tiny subset of Haskell actually). The target audience is graphical artists and designers. For testing, I would like to populate the library with primitives taken from the Haskell base libraries. I tried using the GHC API for it, but got stuck. I got the advice in #haskell to parse the Hoogle format, which indeed looks simple enough for the task.
Hoogle on Hackage comes with a bunch of binary *.hoo files. Can these be
converted to text/xml? If not, is the binary format documented?
The binary format is documented in the code, and there is a show command. Try:
hoogle +base --dump
However the binary format is not an encoding of the text format, it throws away lots of data, and precomputes interesting tables etc. If you want the original, the binary is probably not that useful.
I do have a complete set of text files though. I can upload them to the Hoogle website, or I can distribute them with the hackage package. I could just email them to you privately. What seems the best option for everyone?
I'm not everyone but I guess it would be useful in general. From within Haskell, ideally one would just use the GHC API (or Cabal API) to extract all information I guess, but for usage in other languages, an easy to parse format is better no? (maybe even XML, but that is bulky :-)
I know I can build hoo files using "cabal haddock --hoogle". But doing this
on the BASE package (which I need) from Hackage fails (I'm on Windows, using MSYS):
configure: creating ./config.status
config.status: error: cannot find input file: include/HsBaseConfig.h.in
Does anyone have an easy solution? Maybe I just need to switch to Linux to get this working? :-)
I have a small pile of hacks to get the base library building with Hoogle. You are welcome to look at them (data/generate in the Hoogle repo). Caution: these hacks may make your eyes bleed, and certainly won't work for anything but the GHC/base version pair that I last did it on.
Thanks
Neil

Hi
Sure. We're building a graphical representation of a Haskellish language (a tiny subset of Haskell actually). The target audience is graphical artists and designers. For testing, I would like to populate the library with primitives taken from the Haskell base libraries. I tried using the GHC API for it, but got stuck. I got the advice in #haskell to parse the Hoogle format, which indeed looks simple enough for the task.
You might be able to use haskell-src-exts (plus a little bit of preprocessing) to parse the declarations. I deliberately tried to follow Haskell syntax where possible.
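Because the text format follows Haskell syntax where possible, even a naive line-based reader in another language can get fairly far. The following is a minimal Python sketch, not Hoogle's own parser. The sample text is hand-written and only assumed to resemble the text format (package lines starting with "@", "module" lines, and "name :: type" signatures); the function name parse_signatures is hypothetical. Real files also contain classes, instances, data declarations and documentation comments, which this sketch simply skips.

```python
# Hand-written sample, assumed to resemble a Hoogle text database.
SAMPLE = """\
-- Hoogle documentation (sample, hand-written)
@package base
module Data.List
map :: (a -> b) -> [a] -> [b]
(++) :: [a] -> [a] -> [a]
module Data.Maybe
fromMaybe :: a -> Maybe a -> a
"""


def parse_signatures(text):
    """Return (module, name, type) triples for lines shaped like `name :: type`."""
    entries = []
    module = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and @-prefixed metadata lines.
        if not line or line.startswith("--") or line.startswith("@"):
            continue
        if line.startswith("module "):
            module = line[len("module "):].strip()
            continue
        name, sep, sig = line.partition(" :: ")
        # Keep only simple signatures; `class`, `data`, `instance` lines are ignored.
        if sep and " " not in name:
            entries.append((module, name.strip("()"), sig.strip()))
    return entries


if __name__ == "__main__":
    for module, name, sig in parse_signatures(SAMPLE):
        print(f"{module}.{name} :: {sig}")
```

The same line-by-line approach should carry over to C# with little change, since it relies only on prefix checks and a single split on " :: ".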
Hoogle on Hackage comes with a bunch of binary *.hoo files. Can these be converted to text/xml? If not, is the binary format documented?
The binary format is documented in the code, and there is a show command. Try:
hoogle +base --dump
However the binary format is not an encoding of the text format, it throws away lots of data, and precomputes interesting tables etc. If you want the original, the binary is probably not that useful.
I do have a complete set of text files though. I can upload them to the Hoogle website, or I can distribute them with the hackage package. I could just email them to you privately. What seems the best option for everyone?
I'm not everyone but I guess it would be useful in general. From within Haskell, ideally one would just use the GHC API (or Cabal API) to extract all information I guess, but for usage in other languages, an easy to parse format is better no? (maybe even XML, but that is bulky :-)
Writing a converter from text files to XML is fine by me - Hoogle already has the textual format parser, so if you add a patch adding a dump XML option I'll happily apply it. I'll send you the .txt files by private email. It seems like you want them now, but don't care about keeping them up to date, since it's only a demo. Hence fast and quick, but not provided long term, seems a good short-term compromise.
Thanks
Neil
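As a rough illustration of the text-to-XML idea, here is a small Python sketch that groups already-parsed entries by module and serialises them. It is not a Hoogle feature: the element and attribute names ("hoogle", "module", "function", "name") are invented for this example, and the input triples are assumed to come from some earlier parsing step.

```python
import xml.etree.ElementTree as ET


def entries_to_xml(entries):
    """Serialise (module, name, type) triples into a hypothetical XML layout."""
    root = ET.Element("hoogle")
    modules = {}
    for module, name, sig in entries:
        node = modules.get(module)
        if node is None:
            # One <module> element per distinct module name.
            node = ET.SubElement(root, "module", name=module)
            modules[module] = node
        fn = ET.SubElement(node, "function", name=name)
        fn.text = sig  # ElementTree escapes characters such as '>' on output
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = [
        ("Data.List", "map", "(a -> b) -> [a] -> [b]"),
        ("Data.Maybe", "fromMaybe", "a -> Maybe a -> a"),
    ]
    print(entries_to_xml(demo))
```

Type signatures are full of "->", so leaving the escaping to the XML library rather than building strings by hand avoids malformed output.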
participants (2): Neil Mitchell, Peter Verswyvelen