
Hi everyone,

as an exercise for learning Haskell I'm writing a program that converts ASCII STL files (a simple format for 3D model data) into binary STL format. In my first attempt I used normal Strings and the result was therefore very slow. Now I rewrote the program to use lazy ByteStrings instead. But well... it got even slower, so I'm probably doing something terribly wrong ;)

Here's what I do (the relevant parts):

    ...
    asciiFile <- L.readFile (args!!0)
    binHandle <- openBinaryFile (args!!1) WriteMode
    let asciiLines = L.split (c2w '\n') asciiFile
    ...
    parseFile binHandle (Normal, tail asciiLines) -- First line contains a comment
    ...

where L is Data.ByteString.Lazy. readFile ought to be lazy, so it should not read the whole file into RAM at this point. But when I split the lines and pass them to a function, is this still carried out lazily?

parseFile processes a line, depending on the StlLineType, and then calls itself recursively like this:

    parseFile :: Handle -> (StlLineType, [L.ByteString]) -> IO ()
    ...
    parseFile h (Vertex1, s) = do
        let vals = extractVertex (head s)
        L.hPutStr h $ runPut (writeFloatArray vals)
        parseFile h (Vertex2, tail s)
    ...

extractVertex looks like this:

    extractVertex :: L.ByteString -> [Float]
    extractVertex s =
        let fracs = filter (\n -> L.length n > 0) $ L.split (c2w ' ') s
        in [read (C.unpack (fracs!!1)) :: Float,
            read (C.unpack (fracs!!2)) :: Float,
            read (C.unpack (fracs!!3)) :: Float]

where C is Data.ByteString.Lazy.Char8. It splits a byte string, filters out the whitespace and converts certain entries to floats. Maybe unpack is an expensive operation. Is there a better way to convert a ByteString to a Float?

I know, this is bad Haskell code ;) But where is my grand, obvious misuse of ByteString? I'm grateful for any suggestion to improve that code. I'm using GHC, version 6.12.1.

Thank you,
Peter

On Wednesday 26 January 2011 01:52:33, Peter Braun wrote:
> Hi everyone,
> as an exercise for learning Haskell I'm writing a program that converts ASCII STL files (a simple format for 3D model data) into binary STL format. In my first attempt I used normal Strings and the result was therefore very slow. Now I rewrote the program to use lazy ByteStrings instead.
> But well... it got even slower, so I'm probably doing something terribly wrong ;)
> Here's what I do (the relevant parts):
>
>     ...
>     asciiFile <- L.readFile (args!!0)
>     binHandle <- openBinaryFile (args!!1) WriteMode
>     let asciiLines = L.split (c2w '\n') asciiFile
>     ...
>     parseFile binHandle (Normal, tail asciiLines) -- First line contains a comment
>     ...
>
> where L is Data.ByteString.Lazy. readFile ought to be lazy, so it should not read the whole file into RAM at this point. But when I split the lines and pass them to a function, is this still carried out lazily?
Yes, readFile reads a chunk and only proceeds to read the next one when it is required. I'm not sure exactly how lazy split is; it could stop at the first newline or it could split the entire chunk in one go, but that wouldn't make much difference either way.
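A small sketch of the laziness point (not from the thread): it splits an infinite lazy ByteString, built with L.cycle from the bytestring package, and demands only the first three lines. If splitting were strict this could never finish. C.split '\n' is used here as the Char8 counterpart of L.split (c2w '\n'); infiniteInput and firstLines are hypothetical names.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C

-- An infinite lazy ByteString: the text "facet\n" repeated forever.
infiniteInput :: L.ByteString
infiniteInput = L.cycle (C.pack "facet\n")

-- Only the first three pieces are demanded, so this terminates even though
-- the input never ends: C.split hands out the pieces on demand.
firstLines :: [L.ByteString]
firstLines = take 3 (C.split '\n' infiniteInput)

main :: IO ()
main = mapM_ C.putStrLn firstLines
```

With L.readFile the chunks come from the file instead of from cycle, but the lines reach parseFile in the same on-demand fashion.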
> parseFile processes a line, depending on the StlLineType, and then calls itself recursively like this:
>
>     parseFile :: Handle -> (StlLineType, [L.ByteString]) -> IO ()
Shouldn't the type rather be

    parseFile :: Handle -> StlLineType -> [L.ByteString] -> IO ()

?
> ...
> parseFile h (Vertex1, s) = do
>     let vals = extractVertex (head s)
>     L.hPutStr h $ runPut (writeFloatArray vals)
>     parseFile h (Vertex2, tail s)
Pattern match, please:

    parseFile h (Vertex1, []) = return ()  -- or whatever you have to do at the end
    parseFile h (Vertex1, (l:ls)) = do
        let vals = extractVertex l
        L.hPutStr h $ runPut (writeFloatArray vals)
        parseFile h (Vertex2, ls)
> ...
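Combining the curried signature with pattern matching instead of head/tail, the recursion could look roughly like the sketch below. Only parseFile, StlLineType, Normal, Vertex1, Vertex2 and writeFloatArray come from the thread; the cut-down StlLineType (with an added Vertex3), nextType, keywordCount and lineFloats are hypothetical. The input is assumed to contain only "facet normal ..." and "vertex ..." lines in repeating order, and writeFloatArray is a guess that writes 32-bit little-endian floats with putFloatle from the binary package (added in binary 0.8.5, so much newer than GHC 6.12).

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Binary.Put (Put, putFloatle, runPut)
import System.IO (Handle, IOMode (WriteMode), withBinaryFile)

-- Hypothetical, cut-down StlLineType: only facet-normal and vertex lines.
data StlLineType = Normal | Vertex1 | Vertex2 | Vertex3
  deriving (Eq, Show)

-- The line type expected after the current one.
nextType :: StlLineType -> StlLineType
nextType Normal  = Vertex1
nextType Vertex1 = Vertex2
nextType Vertex2 = Vertex3
nextType Vertex3 = Normal

-- Number of leading keywords before the three numbers on a line.
keywordCount :: StlLineType -> Int
keywordCount Normal = 2 -- "facet normal"
keywordCount _      = 1 -- "vertex"

-- Still uses read + unpack for brevity; only the recursion shape matters here.
lineFloats :: StlLineType -> L.ByteString -> [Float]
lineFloats t = map (read . C.unpack) . take 3 . drop (keywordCount t) . C.words

-- Hypothetical writeFloatArray: 32-bit little-endian IEEE floats.
writeFloatArray :: [Float] -> Put
writeFloatArray = mapM_ putFloatle

-- Curried signature; [] ends the recursion and (l : ls) replaces head/tail.
parseFile :: Handle -> StlLineType -> [L.ByteString] -> IO ()
parseFile _ _ [] = return ()
parseFile h t (l : ls) = do
  L.hPutStr h (runPut (writeFloatArray (lineFloats t l)))
  parseFile h (nextType t) ls

main :: IO ()
main = do
  let input = map C.pack
        [ "facet normal 0 0 1"
        , "  vertex 0 0 0"
        , "  vertex 1 0 0"
        , "  vertex 0 1 0"
        ]
  withBinaryFile "example.bin" WriteMode (\h -> parseFile h Normal input)
  bytes <- L.readFile "example.bin"
  print (L.length bytes) -- prints 48: 4 lines * 3 floats * 4 bytes
```

A real converter would also need states for the "outer loop", "endloop" and "endfacet" lines and the binary STL header, which are left out of this sketch.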
> extractVertex looks like this:
>
>     extractVertex :: L.ByteString -> [Float]
>     extractVertex s =
>         let fracs = filter (\n -> L.length n > 0) $ L.split (c2w ' ') s
>         in [read (C.unpack (fracs!!1)) :: Float,
Ouch, if you're unpacking everything, what's the point of using ByteStrings? And splitting ByteStrings is sort of expensive too. Okay, the trouble is that there's no obvious way to parse a Float from a ByteString, but bytestring-lexing provides parsing of Doubles. You could use that and convert the Doubles to Floats with GHC.Float.double2Float (or, if you have optimisations turned on, with realToFrac, which should then be rewritten to double2Float). That should be much faster than unpacking and using read (particularly since the Read instances of Float and Double are slow).
>             read (C.unpack (fracs!!2)) :: Float,
>             read (C.unpack (fracs!!3)) :: Float]
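The bytestring-lexing route from the reply above could look like this sketch. It assumes the API of the newer bytestring-lexing releases (module Data.ByteString.Lex.Fractional with readSigned and readExponential, both working on strict ByteStrings); the releases from 2011 organised their modules differently, so the import may need adjusting. readFloatBS and vertexFloats are hypothetical helper names, and L.toStrict needs bytestring 0.10 or later.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C
import Data.ByteString.Lex.Fractional (readExponential, readSigned)
import GHC.Float (double2Float)

-- Parse a whole strict ByteString such as "-1.5e2" as a Float:
-- bytestring-lexing reads a Double, which double2Float then narrows.
readFloatBS :: B.ByteString -> Maybe Float
readFloatBS s = case readSigned readExponential s of
  Just (d, rest) | B.null rest -> Just (double2Float d)
  _                            -> Nothing

-- The three numbers after the "vertex" keyword, or Nothing if any fails to parse.
-- Each lazy word is converted to a strict ByteString first.
vertexFloats :: L.ByteString -> Maybe [Float]
vertexFloats = traverse (readFloatBS . L.toStrict) . take 3 . drop 1 . C.words

main :: IO ()
main = print (vertexFloats (C.pack "  vertex 1.5 -2.5 1.5e2"))
```

Neither unpack nor read appears anywhere, which is the point of the suggestion; the remaining cost is C.words and the toStrict copies of short words.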
Instead of list-indexing with (!!), pattern matching gives nicer code.
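For the (!!) point, a sketch of extractVertex that pattern matches on the word list. Returning Maybe (Float, Float, Float) instead of [Float] is a design choice made here, not something from the thread. C.words also replaces the split-on-space plus filter step and, unlike splitting on ' ', copes with tabs as well. read is kept only so that the example focuses on the pattern match.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as C

-- Pattern matching on the word list replaces fracs!!1, fracs!!2, fracs!!3,
-- and a line with too few words yields Nothing instead of a runtime error.
extractVertex :: L.ByteString -> Maybe (Float, Float, Float)
extractVertex s =
  case C.words s of
    (_keyword : x : y : z : _) -> Just (toFloat x, toFloat y, toFloat z)
    _                          -> Nothing
  where
    -- Still read + unpack, purely to keep the focus on the pattern match.
    toFloat :: L.ByteString -> Float
    toFloat = read . C.unpack

main :: IO ()
main = do
  print (extractVertex (C.pack "      vertex 1.5 -2 3"))
  print (extractVertex (C.pack "    endloop"))
```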
> where C is Data.ByteString.Lazy.Char8. It splits a byte string, filters out the whitespace and converts certain entries to floats. Maybe unpack is an expensive operation. Is there a better way to convert a ByteString to a Float?
You could also try using attoparsec and write a real parser for your file; that should be pretty snappy. attoparsec also provides double :: Parser Double (there's no direct parsing of Float), and you could then again call double2Float on the result.
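A minimal attoparsec sketch for a single vertex line, assuming the current module name Data.Attoparsec.ByteString.Char8 (older releases called it Data.Attoparsec.Char8). coord, vertexLine and parseVertex are hypothetical names; a parser for the whole file would chain similar parsers for the facet, loop and end lines, which is not shown here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8
  (Parser, double, endOfInput, parseOnly, skipSpace, string)
import GHC.Float (double2Float)

-- One coordinate: skip leading whitespace, parse a Double, narrow it to Float.
coord :: Parser Float
coord = skipSpace *> (double2Float <$> double)

-- A line of the form "vertex x y z", allowing leading indentation.
vertexLine :: Parser (Float, Float, Float)
vertexLine = do
  skipSpace
  _ <- string "vertex"
  (,,) <$> coord <*> coord <*> coord

-- Run the parser on one line, insisting that only whitespace follows.
parseVertex :: B.ByteString -> Either String (Float, Float, Float)
parseVertex = parseOnly (vertexLine <* skipSpace <* endOfInput)

main :: IO ()
main = do
  print (parseVertex "      vertex 1.0 -2.5 1.5e2")
  print (parseVertex "    endloop")
```

double accepts an optional sign and an exponent, which covers the usual "-1.234567e+01" style of ASCII STL coordinates.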
> I know, this is bad Haskell code ;) But where is my grand, obvious misuse of ByteString?
Lots of splitting into small pieces and lots of unpacking; both add up to considerable cost. I also suspect read takes a substantial amount of the time, but you have that with String IO too.
> I'm grateful for any suggestion to improve that code. I'm using GHC, version 6.12.1.
> Thank you, Peter

Thanks for your reply. Attoparsec seems to be the perfect tool for the job, and I will ultimately try to implement my converter with it. But I also installed bytestring-lexing and will try it out, just to see if I can gain some performance by eliminating the suboptimal reads/unpacks. Thanks for the tip!

> Shouldn't the type rather be
>
>     parseFile :: Handle -> StlLineType -> [L.ByteString] -> IO ()

Yes, there really is no reason for using a tuple. I'll also eliminate the use of (!!) and of function calls where pattern matching is possible.
participants (2)
- Daniel Fischer
- Peter Braun