
On 10-03-24 12:44 PM, Paul Johnson wrote:
On 24/03/10 04:36, Pom Monico wrote:
Hello all,
I'm struggling to use the Get monad to efficiently parse the some binary data of the format below. I simply can't find a way to use the monad to parse it efficiently.
Binary data is terminated by a 0xFF marker. If the data itself contains an 0xFF, a 0x00 is byte-stuffed after it to indicate it is part of the data. An 0xFF followed by a non-zero byte indicates that it is a real marker and terminates the data and the 0xFF is not part of the data. So the only way to find out if you have reached the end of the data is to read beyond it? Yuck!
I haven't used it myself, but I suggest taking a look at Attoparsec, which is like Parsec but for bytestrings. That might be able to handle this kind of non-determinism. Don't know who thought it up but I have to deal with it.
The best I've gotten so far is the code below. In the Get monad, I do a "lookAhead getRemainingLazyByteString", call "go s 0 []" to get the byte-unstuffed data and #bytes consumed, and then do a "skip consumed" to move forward in the Get monad. It is ugly and slow. I would like to implement this using the Get monad without dropping into the ByteString, and also so speed it up. In order of most time consuming, the sections are: go, takeToFF, ECS5-4, ECS5-2. But I'm also out of ideas without resorting to mutable constructs, esoteric tricks, or things that look like it belongs in a .c file. Pom. ff00 = {-# SCC "ECS5-8" #-} L.pack [0xff, 0x00] ff = {-# SCC "ECS5-9" #-} L.singleton 0xff takeToFF s = case L.elemIndex 0xff s of Nothing -> (s, L.empty, L.length s) -- Return entire string Just x -> let (h, t) = L.splitAt x s in (h, t, x) -- Split before 0xff go s n acc | L.null s = {-# SCC "ECS5-1" #-} (L.concat $ reverse acc, n) | L.isPrefixOf ff00 s = {-# SCC "ECS5-2" #-} go (L.drop 2 s) (n+2) (ff:acc) | L.isPrefixOf ff s = {-# SCC "ECS5-3" #-} (L.concat $ reverse acc, n) | otherwise = {-# SCC "ECS5-4" #-} let (h, t, x) = takeToFF s in go t (n+x) (h:acc)