
My suggestion would be to write a parser with Parsec to handle this. Parsec is fairly easy to learn, and since your data is in a pretty simple format, the parser won't be hard to write. You then run the parser on the file, and it reports any parse errors for you; it's all-around very pleasant to use. There is a chapter of Real World Haskell on the subject, and I'm sure we'll be happy to help with whatever isn't covered.

/Joe

On Dec 7, 2009, at 10:43 PM, Brent Pedersen wrote:
hi, i have files with lots (millions of rows) of data like this:

chr6 chr10 96.96 3392 101 2 79030508 79033899 4160024 4163413 0.0 5894
chr6 chr10 93.19 4098 228 13 117152751 117156826 11355389 11359457 0.0 5886
chr6 chr10 95.82 3445 130 5 112422073 112425513 7785396 7788830 0.0 5666

and i'd like to read it into a type like this:

data Blast = Blast { query    :: S.ByteString
                   , subject  :: S.ByteString
                   , hitlen   :: Int
                   , mismatch :: Int
                   , gaps     :: Int
                   , qstart   :: Int
                   , qstop    :: Int
                   , sstart   :: Int
                   , sstop    :: Int
                   , pctid    :: Double
                   , evalue   :: Double
                   , bitscore :: Double
                   } deriving (Show)

where each of those fields corresponds to a column in the file. in python, i do something like:

line = [fn(col) for fn, col in zip([str, str, float, int, int, int, int, int, int, int, float, float], sline.split("\t"))]

what's a fast, simple way to do this in haskell?
is it something like:

instance Read Blast where
    readsPrec s = ?????

any pointers on where to look for simple examples of this type of parsing would be much appreciated.

thanks,
-brentp
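
For illustration, here is a rough sketch of what the Parsec approach suggested in the reply might look like. It is not code from either poster. It assumes the parsec 3 and bytestring packages, that `S` stands for `Data.ByteString.Char8`, that columns are separated by tabs or spaces, and that the twelve columns appear in the order query, subject, pctid, hitlen, mismatch, gaps, qstart, qstop, sstart, sstop, evalue, bitscore (which is what the sample rows suggest). The names `field`, `sep`, `int`, `double`, `blastLine`, `blastFile` and `parseBlast` are made up for this example.

```haskell
import qualified Data.ByteString.Char8 as S
import Text.Parsec
import Text.Parsec.String (Parser)

data Blast = Blast
  { query    :: S.ByteString
  , subject  :: S.ByteString
  , hitlen   :: Int
  , mismatch :: Int
  , gaps     :: Int
  , qstart   :: Int
  , qstop    :: Int
  , sstart   :: Int
  , sstop    :: Int
  , pctid    :: Double
  , evalue   :: Double
  , bitscore :: Double
  } deriving (Show)

-- One column: a run of characters up to the next separator or line end.
field :: Parser String
field = many1 (noneOf " \t\r\n")

-- Assumption: columns are separated by one or more tabs or spaces.
sep :: Parser ()
sep = skipMany1 (oneOf " \t")

-- A non-negative integer column.
int :: Parser Int
int = read <$> many1 digit

-- A numeric column such as 96.96, 0.0 or 5894, read as a Double.
double :: Parser Double
double = do
  s <- field
  case reads s of
    [(d, "")] -> return d
    _         -> parserFail ("not a number: " ++ s)

-- One row; the column order here is an assumption based on the sample data.
blastLine :: Parser Blast
blastLine = do
  q   <- field  <* sep
  s   <- field  <* sep
  pid <- double <* sep
  len <- int    <* sep
  mm  <- int    <* sep
  gp  <- int    <* sep
  qs  <- int    <* sep
  qe  <- int    <* sep
  ss  <- int    <* sep
  se  <- int    <* sep
  ev  <- double <* sep
  bs  <- double
  return Blast
    { query = S.pack q, subject = S.pack s
    , hitlen = len, mismatch = mm, gaps = gp
    , qstart = qs, qstop = qe, sstart = ss, sstop = se
    , pctid = pid, evalue = ev, bitscore = bs
    }

-- Rows separated by line ends, with an optional trailing line end.
blastFile :: Parser [Blast]
blastFile = blastLine `sepEndBy` endOfLine <* eof

-- Parse a whole input; a Left value carries the position of the first error.
parseBlast :: String -> Either ParseError [Blast]
parseBlast = parse blastFile "(input)"
```

To run it on a file, something like `parseFromFile blastFile "hits.txt"` from `Text.Parsec.String` should work, where `hits.txt` is a placeholder name. This version reads the input as a `String`, which may be slow for millions of rows; parsec also ships a `Text.Parsec.ByteString` module that could be tried instead if speed becomes a problem.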