I would like to present a benchmark of Protein Data Bank parsers, which indicates that the one written in Haskell seems to outpace all the others when running on four or more cores in parallel:
hPDB - Haskell library for processing atomic biomolecular structures in
Protein Data Bank format -- Michal Jan Gajda
BMC Research Notes. 2013, 6:483.
DOI: 10.1186/1756-0500-6-483
URL: http://www.biomedcentral.com/1756-0500/6/483
Please let me know if you know of any other parsers that could be added to this benchmark.
Along with the hTalos and parseSTAR parser libraries for nuclear magnetic resonance data, hPDB adds to the growing collection of bioinformatics libraries written in Haskell.
Together with CloudHaskell and modern 48-core machines, these libraries make it possible to process multi-gigabyte bioinformatics databases in a matter of minutes (slightly over 8 minutes for more than 10 GB of PDB data).
--
Best regards
Michał J. Gajda