
Hello!

I've finally come up with some motivation for a project to get my feet wet with Haskell, and for this little pet project I need an interface to Xapian. After reading various documents on the FFI in general, I have a small working implementation, and I'm now looking for advice on how to better structure the public API.

First, a quick bit of background if you're not familiar with Xapian. Xapian is a search engine library that provides a C++ API. You store documents in a database (managed by Xapian) and index them by adding terms to them; Xapian provides stemming algorithms to help generate these terms from other data. Xapian also has a query interface (through a Xapian::Enquire object) and a query parser that lets natural language queries be parsed and run. For more information, you can check out the API at [1]; it's fairly small.

As Xapian is C++, my best option seems to be writing my own simple C wrapper, which also lets me tailor the FFI so that it is easy to use from Haskell. You can see my C API on GitHub [2]. For now it's very stripped down; I've been wrapping things on a need-to-use basis.

* * *

Currently what I have is functional (in the sense that it works), but it's extremely tied to I/O and very little of the code is pure. For example, to create and index a document, you need to do something along the lines of:

    do document <- newDocument
       setDocumentData document "Document data"
       addPosting document "search_term" 1
       addDocument database document

(This assumes you already have an open database handle.) How horribly imperative this all looks! :-)

A document *feels* like it should be quite pure, but retrieving the properties of a document performs I/O. For example, I'd like to have something like:

    -- `data` is a reserved word in Haskell, so the field needs another name
    data Document = Document { documentData :: String
                             , postings     :: [String]
                             }

    do document <- getDocument database 123  -- Get doc #123

and have `document` refer to a pure `Document` value.
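For the write path, one option is to build the document as a plain value with pure functions and confine all I/O to a single `addDocument` call that walks the record and issues the imperative C calls. The sketch below assumes this design; `withData`, `withPosting`, `emptyDocument`, `openMemoryDatabase` and the `IORef`-backed `Database` are hypothetical names that only let it run without Xapian, and storing each term with its position is an assumption based on `addPosting document "search_term" 1`.

```haskell
import Data.IORef

-- A pure description of a document. The field is called documentData
-- because `data` is a reserved word in Haskell. Each posting is stored
-- as (term, position), an assumption mirroring addPosting's arguments.
data Document = Document
  { documentData :: String
  , postings     :: [(String, Int)]
  } deriving (Show, Eq)

emptyDocument :: Document
emptyDocument = Document "" []

-- Hypothetical pure builders; nothing here touches Xapian.
withData :: String -> Document -> Document
withData s d = d { documentData = s }

withPosting :: String -> Int -> Document -> Document
withPosting term pos d = d { postings = postings d ++ [(term, pos)] }

-- Hypothetical stand-in for the real database handle, backed by an
-- IORef so the sketch runs without Xapian installed.
newtype Database = Database (IORef [Document])

openMemoryDatabase :: IO Database
openMemoryDatabase = Database <$> newIORef []

-- The single IO step. In a real binding this is where the C wrappers
-- (create document, set data, add each posting, add to database) would
-- be called; here it just appends the record and returns a 1-based id.
addDocument :: Database -> Document -> IO Int
addDocument (Database ref) doc = do
  docs <- readIORef ref
  writeIORef ref (docs ++ [doc])
  return (length docs + 1)
```

With this shape, the imperative example becomes `addDocument database (withPosting "search_term" 1 (withData "Document data" emptyDocument))`, and everything except the final call is pure and testable on its own.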
I'm still stuck in the IO monad a bit, but at least I can now write pure functions that operate on `Document` values. The problem I see with this is that I believe I'd have to retrieve every part of the document in my `getDocument` function (including the data and all postings), so I can't benefit from laziness here.
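One common way to keep a pure-looking `Document` while still fetching each field on demand is lazy I/O via `unsafeInterleaveIO` from `System.IO.Unsafe`. Below is a minimal sketch, assuming hypothetical `fetchData` and `fetchPostings` actions standing in for the foreign imports; they only bump a counter so you can observe when a fetch actually happens. The usual caveats apply: the underlying Xapian document handle must stay alive (and the database unchanged) until every deferred read has run, and any error in a deferred read surfaces wherever the field is first evaluated, because you no longer control when the I/O happens.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- The record from the post, with `data` renamed (it is a reserved word).
data Document = Document
  { documentData :: String
  , postings     :: [String]
  }

-- Hypothetical stand-ins for the foreign imports. Each one increments
-- the counter so the caller can see whether a fetch has happened yet.
fetchData :: IORef Int -> Int -> IO String
fetchData calls docId = do
  modifyIORef' calls (+ 1)
  return ("data for " ++ show docId)

fetchPostings :: IORef Int -> Int -> IO [String]
fetchPostings calls docId = do
  modifyIORef' calls (+ 1)
  return ["term" ++ show docId]

-- Builds the record immediately, but each field's fetch is deferred
-- until that field is first demanded.
getDocumentLazy :: IORef Int -> Int -> IO Document
getDocumentLazy calls docId = do
  d  <- unsafeInterleaveIO (fetchData calls docId)
  ps <- unsafeInterleaveIO (fetchPostings calls docId)
  return (Document d ps)
```

If only `documentData` is ever inspected, the postings are never fetched. The alternative is to keep the fields strict and expose separate `IO` accessors for the expensive parts, which is more predictable at the cost of more `IO` in signatures.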
From what I gather, all the methods on Xapian documents are lazy (such as getting the document data and getting the terms associated with a document), which would mean that my foreign imports would have to return `IO String`, for example. This tends to make the IO monad propagate everywhere fairly quickly.
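Another way to limit how far IO spreads is to keep the IO layer thin: fetch `Document` values in one place and write everything else as pure functions, applying them with `map`, `filter` or `fmap`. In this sketch, the `fetch` parameter stands in for something like `getDocument database` (an assumption; it is passed in so the code runs without Xapian), and `mentions`, `summary` and `summariesMatching` are hypothetical helpers.

```haskell
-- The record from the post, with `data` renamed (it is a reserved word).
data Document = Document
  { documentData :: String
  , postings     :: [String]
  }

-- Pure: no IO in these types, so they can be tested in isolation.
mentions :: String -> Document -> Bool
mentions term doc = term `elem` postings doc

summary :: Document -> String
summary = take 20 . documentData

-- The only IO: fetch each document, then hand off to pure code.
-- `fetch` is a hypothetical parameter standing in for the real lookup.
summariesMatching :: (Int -> IO Document) -> String -> [Int] -> IO [String]
summariesMatching fetch term ids = do
  docs <- mapM fetch ids
  return (map summary (filter (mentions term) docs))
```

For a single document the same idea is just `summary <$> fetch 123`: the `IO` appears once at the call site, not in every function that inspects a document.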
* * *

I think that's enough information to explain my current progress and my concerns. It could well be that I'm worrying too much about everything being in the IO monad, but as I said, Haskell is new to me. All of my work is at [3], and I'd love any advice you have. Haddock documentation has been exported to ocharles.org.uk [4].

Thanks for your time,
Oliver Charles / ocharles

--
[1]: http://xapian.org/docs/apidoc/html/annotated.html
[2]: https://github.com/ocharles/Xapian-Haskell/blob/master/c/cxapian.h
[3]: https://github.com/ocharles/Xapian-Haskell
[4]: http://ocharles.org.uk/tmp/search-xapian/Search-Xapian.html