Using Get monad to efficiently parse byte-stuffed data

Hello all, I'm struggling to use the Get monad to efficiently parse the some binary data of the format below. I simply can't find a way to use the monad to parse it efficiently. Binary data is terminated by a 0xFF marker. If the data itself contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is part of the data. An 0xFF followed by a non-zero byte indicates that it is a real marker and terminates the data and the 0xFF is not part of the data. Going through the data using getWord8 is a no-go. It is just too slow. My solution so far has been to get the underlying bytestring and work with that, but obviously defeats the purpose of using the Get monad. What might be a better solution? Pom.

Hi, I wrote a type program to compute fibonacci series, if the max value is big, then it becomes very slow. like take 100 fib How can I make it faster :-)
>>>>>>>>>> fibo 0 = 0 fibo 1 = 1 fibo (n+2) = (fibo n) + (fibo (n+1)) fib :: [Int] fib = [fibo i | i <- [0..]]

On 24 March 2010 15:46, 国平张
I wrote a type program to compute fibonacci series, if the max value is big, then it becomes very slow.
What does this have to do with Pom's problem? http://tinyurl.com/ya3vvye -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

sorry. My mistake :-). I wanted to send to haskell-cafe, so I just
pick up a mail thread and send reply. But I forgot to change the
title.
Sorry again for it.
2010/3/24 Ivan Miljenovic
On 24 March 2010 15:46, 国平张
wrote: I wrote a type program to compute fibonacci series, if the max value is big, then it becomes very slow.
What does this have to do with Pom's problem?
-- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

sorry. My mistake :-). I wanted to send to haskell-cafe, so I just pick up a mail thread and send reply. But I forgot to change the title.
Don't do that! Your email contains headers like the following:
In-Reply-To:

On 24/03/10 04:36, Pom Monico wrote:
Hello all,
I'm struggling to use the Get monad to efficiently parse the some binary data of the format below. I simply can't find a way to use the monad to parse it efficiently.
Binary data is terminated by a 0xFF marker. If the data itself contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is part of the data. An 0xFF followed by a non-zero byte indicates that it is a real marker and terminates the data and the 0xFF is not part of the data.
So the only way to find out if you have reached the end of the data is to read beyond it? Yuck! I haven't used it myself, but I suggest taking a look at Attoparsec, which is like Parsec but for bytestrings. That might be able to handle this kind of non-determinism. Paul.

On 10-03-24 12:44 PM, Paul Johnson wrote:
On 24/03/10 04:36, Pom Monico wrote:
Hello all,
I'm struggling to use the Get monad to efficiently parse the some binary data of the format below. I simply can't find a way to use the monad to parse it efficiently.
Binary data is terminated by a 0xFF marker. If the data itself contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is part of the data. An 0xFF followed by a non-zero byte indicates that it is a real marker and terminates the data and the 0xFF is not part of the data. So the only way to find out if you have reached the end of the data is to read beyond it? Yuck!
I haven't used it myself, but I suggest taking a look at Attoparsec, which is like Parsec but for bytestrings. That might be able to handle this kind of non-determinism. Don't know who thought it up but I have to deal with it.
The best I've gotten so far is the code below. In the Get monad, I do a "lookAhead getRemainingLazyByteString", call "go s 0 []" to get the byte-unstuffed data and #bytes consumed, and then do a "skip consumed" to move forward in the Get monad. It is ugly and slow. I would like to implement this using the Get monad without dropping into the ByteString, and also so speed it up. In order of most time consuming, the sections are: go, takeToFF, ECS5-4, ECS5-2. But I'm also out of ideas without resorting to mutable constructs, esoteric tricks, or things that look like it belongs in a .c file. Pom. ff00 = {-# SCC "ECS5-8" #-} L.pack [0xff, 0x00] ff = {-# SCC "ECS5-9" #-} L.singleton 0xff takeToFF s = case L.elemIndex 0xff s of Nothing -> (s, L.empty, L.length s) -- Return entire string Just x -> let (h, t) = L.splitAt x s in (h, t, x) -- Split before 0xff go s n acc | L.null s = {-# SCC "ECS5-1" #-} (L.concat $ reverse acc, n) | L.isPrefixOf ff00 s = {-# SCC "ECS5-2" #-} go (L.drop 2 s) (n+2) (ff:acc) | L.isPrefixOf ff s = {-# SCC "ECS5-3" #-} (L.concat $ reverse acc, n) | otherwise = {-# SCC "ECS5-4" #-} let (h, t, x) = takeToFF s in go t (n+x) (h:acc)

Going through the data using getWord8 is a no-go. It is just too slow. My solution so far has been to get the underlying bytestring and work with that, but obviously defeats the purpose of using the Get monad. What might be a better solution?
hGetArray with IOUArrays goes perty fast.
participants (7)
-
Ivan Miljenovic
-
Ketil Malde
-
Paul Johnson
-
Pom
-
Pom Monico
-
Tim Attwood
-
国平张