Please review my Xapian foreign function interface

Hello! I've finally come up with some motivation for a project to get my feet wet with Haskell, and for this little pet project I need an interface to Xapian. After reading various documents on FFI in general, I've got a small working implementation, and I'm now looking for ways to better structure the public API.

First, a quick bit of background if you're not familiar with Xapian. Xapian is a search engine library with a C++ API. You store documents in a database (managed by Xapian), and index documents by adding terms to them. Xapian provides stemming algorithms to help generate these terms from other data. Xapian also has an interface for queries (through a Xapian::Enquire object), and a query parser that allows natural language queries to be parsed and run. For more information, you can check out the API at [1]; it's fairly small.

As Xapian is C++, my best option seems to be writing my own simple C wrapper, which also lets me tailor my FFI to be easy to use from Haskell. You can see my C API on GitHub [2]. For now it's very stripped down, as I've been wrapping things on a need-to-use basis.

* * *

Currently what I have is functional (in the sense that it works), but it's extremely tied to I/O and very little of the code is pure. For example, to create and index a document, you need to do something along the lines of:

    do document <- newDocument
       setDocumentData document "Document data"
       addPosting document "search_term" 1
       addDocument database document

(assuming you already have an open database handle). How horribly imperative this all looks! :-)

A document *feels* like it should be quite pure; however, retrieving properties of a document performs I/O. For example, I'd like to have something like this (`data` is a reserved word, so the field needs another name):

    data Document = Document { documentData :: String, postings :: [String] }

    do document <- getDocument database 123  -- get document #123

and have `document` refer to a pure Document value.
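One way to get the pure `Document` the message asks for is to make the document itself an ordinary Haskell value and confine I/O to the step that hands it to the database. The sketch below is a minimal illustration of that design, not the author's API: `Database` is a hypothetical in-memory stand-in for a Xapian WritableDatabase, and the pure `setDocumentData`/`addPosting` here only borrow names from the message; they are not the existing bindings.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Pure description of a document. `data` is a reserved word in Haskell,
-- so the field is named documentData instead.
data Document = Document
  { documentData :: String
  , postings     :: [(String, Int)] -- (term, position) pairs
  } deriving (Show, Eq)

emptyDocument :: Document
emptyDocument = Document { documentData = "", postings = [] }

-- Hypothetical pure modifiers: they return a new Document, no IO involved.
setDocumentData :: String -> Document -> Document
setDocumentData d doc = doc { documentData = d }

addPosting :: String -> Int -> Document -> Document
addPosting term pos doc = doc { postings = postings doc ++ [(term, pos)] }

-- Hypothetical stand-in for a Xapian WritableDatabase handle.
newtype Database = Database (IORef [Document])

newDatabase :: IO Database
newDatabase = Database <$> newIORef []

-- The only IO step: store a finished Document and return its new id.
addDocument :: Database -> Document -> IO Int
addDocument (Database ref) doc =
  atomicModifyIORef' ref (\docs -> (docs ++ [doc], length docs + 1))

main :: IO ()
main = do
  db <- newDatabase
  let doc = addPosting "search_term" 1
          . setDocumentData "Document data"
          $ emptyDocument
  docId <- addDocument db doc
  print docId -- 1
  print doc
```

With this shape, a real binding would translate the whole `Document` into Xapian calls inside `addDocument`, so callers never touch a mutable C++ document object.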
I'm still stuck in the IO monad a bit, but at least I can now write pure functions that operate on `Document` values. The problem I see with this is that I believe I'd have to retrieve every part of the document in my `getDocument` function (including the data and all postings), so I can't benefit from laziness here.
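Laziness is still available when building such a record: `System.IO.Unsafe.unsafeInterleaveIO` defers an IO action until its result is first demanded. The sketch below assumes hypothetical fetchers `fetchData` and `fetchPostings` standing in for the real foreign calls; each bumps a counter so you can see when the work actually happens.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

data Document = Document
  { documentData :: String
  , postings     :: [String]
  }

-- Hypothetical low-level calls (placeholders for foreign imports).
-- Each increments the counter so we can observe when it runs.
fetchData :: IORef Int -> Int -> IO String
fetchData calls docId = do
  modifyIORef' calls (+ 1)
  return ("data for #" ++ show docId)

fetchPostings :: IORef Int -> Int -> IO [String]
fetchPostings calls docId = do
  modifyIORef' calls (+ 1)
  return ["term" ++ show docId]

-- Returns immediately; each field is fetched the first time it is forced.
getDocument :: IORef Int -> Int -> IO Document
getDocument calls docId = do
  d  <- unsafeInterleaveIO (fetchData calls docId)
  ps <- unsafeInterleaveIO (fetchPostings calls docId)
  return (Document d ps)

main :: IO ()
main = do
  calls <- newIORef 0
  doc <- getDocument calls 123
  readIORef calls >>= print  -- 0: nothing fetched yet
  putStrLn (documentData doc) -- forces only the data field
  readIORef calls >>= print  -- 1
  print (postings doc)       -- forces the postings
  readIORef calls >>= print  -- 2
```

The catch is that the deferred calls run at unpredictable times, possibly after the database handle has been closed or from another thread, so this only stays safe if the underlying handle is kept alive and access to it is serialised.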
From what I gather, all the methods on Xapian documents are lazy (such as getting the document data and getting the terms associated with a document), which would mean that my foreign imports would have to return `IO String`, for example. This tends to make the IO monad propagate everywhere fairly quickly.
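IO does not necessarily have to spread through the rest of the code. A foreign import only needs an IO type when the C function depends on or changes mutable state, and pure code can be applied to the results of the IO imports with `fmap`. The sketch below uses libc's `sin` and `strlen` purely as stand-ins, since the Xapian wrapper functions themselves are not shown here.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CDouble (..), CSize (..))

-- sin is referentially transparent, so a pure import type is acceptable.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- strlen reads memory through a pointer, so it is imported in IO.
-- Anything reading a Xapian object belongs in this category.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- A pure function that knows nothing about IO.
double :: CSize -> CSize
double = (* 2)

main :: IO ()
main = do
  print (c_sin 0)
  n <- withCString "search_term" c_strlen
  print n -- 11
  -- fmap applies pure code to an IO result; double itself stays pure.
  m <- fmap double (withCString "abc" c_strlen)
  print m -- 6
```

In practice this means the IO-typed layer can stay thin (fetch, then hand over a plain value), while ranking, filtering and formatting live in ordinary pure functions.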
* * *

I think that's enough information to explain my current progress and my concerns. It could well be that I'm worrying too much about everything being in the IO monad, but as I said, Haskell is new to me. All of my work is at [3], and I'd love any advice you have. Haddock documentation has been exported to ocharles.org.uk [4].

Thanks for your time,
Oliver Charles / ocharles

--
[1]: http://xapian.org/docs/apidoc/html/annotated.html
[2]: https://github.com/ocharles/Xapian-Haskell/blob/master/c/cxapian.h
[3]: https://github.com/ocharles/Xapian-Haskell
[4]: http://ocharles.org.uk/tmp/search-xapian/Search-Xapian.html

Thanks Oliver! I haven't had time to look at your bindings very closely, but I do have a few initial things to think about:

* You're writing your imports by hand. Several other projects used to do this, and it's a pain in the neck when you have hundreds of functions to bind, you don't quite get them all right, and then you segfault because of an API mismatch. Consider using a tool like c2hs, which rules out this possibility (and reduces the code you need to write!).

* I see a lot of unsafePerformIO and no consideration for:
  - Interruptibility
  - Thread safety

  People who use Haskell tend to expect their code to be thread-safe and interruptible, so we have high standards ;-) But even C++ code that looks thread-safe may be mutating shared memory under the hood, so check carefully.

I use Sup, so I deal with Xapian on a day-to-day basis. Bindings are good to see.

Cheers,
Edward
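On the thread-safety point, one common pattern is to keep the raw handle inside an `MVar` and route every call through `withMVar`, which serialises access and puts the handle back even if the action is interrupted by an exception. In the sketch below, `RawDb` and `addDocumentRaw` are hypothetical stand-ins for a non-thread-safe Xapian handle and a call that mutates it. Note that this protects the Haskell side only; a single long-running foreign call is still not interruptible unless it is imported with GHC's `interruptible` calling convention.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for a raw, non-thread-safe Xapian handle.
newtype RawDb = RawDb (IORef Int)

-- Every access to the raw handle goes through this MVar.
newtype Database = Database (MVar RawDb)

openDatabase :: IO Database
openDatabase = do
  ref <- newIORef 0
  Database <$> newMVar (RawDb ref)

-- withMVar takes the handle, runs the action, and restores the handle
-- even if the action throws (including asynchronous exceptions).
withDb :: Database -> (RawDb -> IO a) -> IO a
withDb (Database mv) = withMVar mv

-- Hypothetical read-modify-write that would race without the lock.
addDocumentRaw :: RawDb -> IO ()
addDocumentRaw (RawDb ref) = do
  n <- readIORef ref
  writeIORef ref (n + 1)

countDocuments :: Database -> IO Int
countDocuments db = withDb db (\(RawDb ref) -> readIORef ref)

main :: IO ()
main = do
  db <- openDatabase
  done <- newEmptyMVar
  -- Ten threads each add 100 documents through the serialised wrapper.
  forM_ [1 .. 10 :: Int] $ \_ -> forkIO $ do
    replicateM_ 100 (withDb db addDocumentRaw)
    putMVar done ()
  replicateM_ 10 (takeMVar done)
  countDocuments db >>= print -- 1000
```

A coarse lock like this is the simplest correct starting point; if profiling shows contention, finer-grained locking can be introduced once it is clear which Xapian objects really share state.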
participants (2)
-
Edward Z. Yang
-
Oliver Charles