
Hello!

I'm pleased to announce the second major release of the biostockholm library! This library allows you to parse and render files in the Stockholm 1.0 format, which is used by Pfam, Rfam, Infernal and others for holding information about families of proteins or non-coding RNAs.

http://hackage.haskell.org/package/biostockholm

Despite the small version bump from 0.1 to 0.2, this is actually a big rewrite of the library. Now we have:

- A streaming interface similar to what SAX parsers provide. This allows you to consume Stockholm files using constant memory (80k in a simple case).

- More test cases. It's able to consume its own pretty-printed version of Rfam through the document interface, and is also capable of reading the full Rfam Stockholm file (which has some huge families) through the streaming interface.

- QuickCheck properties. Now we have three different QuickCheck properties covering almost everything. These have helped uncover some tricky bugs that were never found before. Two of these three properties still don't pass, but I consider the failing examples that I've investigated to be just corner cases. Unfortunately, Stockholm lacks a formal specification.

- Conduit interface. Besides a lazy I/O version, there's now a conduit interface.

- Code that is much easier to read and reason about.

- Fast enough: the streaming interface achieves 12 MiB/s for parsing, which is pretty nice considering that there are some known overheads in its implementation.

For the tasks that biostockholm 0.1 already handled, biostockholm 0.2 tends to be slightly slower. However, biostockholm 0.2 is able to handle some previously impossible cases where a streaming solution is required.

Cheers!

--
Felipe.

On Thu, Jan 26, 2012 at 11:42 PM, Felipe Almeida Lessa wrote:
> - Fast enough: the streaming interface achieves 12 MiB/s for parsing, which is pretty nice considering that there are some known overheads in its implementation.

I've just released biostockholm 0.2.1, which uses conduit 0.2. The streaming interface now achieves 31 MiB/s when parsing Rfam 9.1's full data on my computer, which is a 2.6x increase in performance!

Kudos to Michael Snoyman, who squashed the biggest "known overhead" that I mentioned above in this new conduit 0.2 release.

Cheers! =)

--
Felipe.