RE: [Haskell-cafe] Haskell poker server

From: Cale Gibbard [mailto:cgibbard@gmail.com]
Well, here's an attempt at a start on a similar mechanism for Haskell:
---------- (start Packet.hs) module Packet where
import Data.Bits import Data.Word
class Packet a where readPacket :: [Word8] -> (a, [Word8])
... There's a request on LtU for a similar ability (somewhat wider in scope, perhaps): http://lambda-the-ultimate.org/node/view/938 There must be a "gap in the market" :-) Alistair. ----------------------------------------- ***************************************************************** Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. *****************************************************************

Erlang does this nicely, I replied to the LtU thread. I positively got the impression that nobody was parsing binary data in Haskell ;). On Aug 30, 2005, at 12:29 PM, Bayley, Alistair wrote:
There's a request on LtU for a similar ability (somewhat wider in scope, perhaps):
http://lambda-the-ultimate.org/node/view/938
There must be a "gap in the market" :-)

On Tue, Aug 30, 2005 at 12:41:20PM +0200, Joel Reymont wrote:
Erlang does this nicely, I replied to the LtU thread. I positively got the impression that nobody was parsing binary data in Haskell ;).
I am doing this quite often, I apologize for not sharing my experience and promise to improve ;-) BTW, if efficiency is not a primary concern, Parsec can be quite nice for decoding binary messages of many protocols. Best regards Tomasz

Joel Reymont wrote,
Can I beg for examples?
I've been using parsec for binary parsing (Java class files in my case) as a first exercise with both Haskell and combinator parsing, with a view to applying same to network protocols. The experience has been surprisingly pleasant. In particular, it handles the common "count, count*tokens" idiom that's found in many binary formats and network protocols very smoothly ... something I've always had to code by hand. Being so new to Haskell I'm hesitant to give possibly bad examples, but how about something like this, constantPoolEntry :: Parser ConstantPoolEntry constantPoolEntry = do { u1Literal 1 ; bytes <- u2 >>= (flip count $ u1) ; return (ConstantUTF8 (decodeUTF8 bytes)) } <|> -- etc. ... where u1Literal matches the byte '1' from the input, and u2 parses a following 2 byte unsigned integer byte-count which is then used to construct a parser of exactly byte-count bytes, ie. it matches the byte sequence, '1', n, b0 ... bn-1 I'm not aware of any other general purpose parsing framework which can do this anything like as deftly. Cheers, Miles

On 31/08/2005, at 7:37 AM, Miles Sabin wrote:
I've been using parsec for binary parsing (Java class files in my case) as a first exercise with both Haskell and combinator parsing, with a view to applying same to network protocols.
I've also been experimenting with using Parsec to parse binary files. The biggest problem with is that it's a slow, because you're working with types of [Char] rather than e.g. UArray Word8. This is usually fine if you're doing on-the-fly processing or are working with small files, but in my case, I was working with 1GB+ video files. Having over 1,000,000 list cells of one character each gets a tad slow :). I did a little bit of work (with emphasis on 'little') to start refactoring Parsec so it can work with generic sequences instead of just lists (so you can make it work with arrays), but haven't gotten too far. Having Parsec work speedily with binary files would absolutely rock -- I suspect there are a lot of people who've never thought about using parser combinators to process binary data, and if it's a feasible option ... -- % Andre Pang : trust.in.love.to.save http://www.algorithm.com.au/

On Tue, Aug 30, 2005 at 01:31:22PM +0200, Joel Reymont wrote:
Can I beg for examples?
This is from some old code, slightly polished for presentation - the code for parsing DNS domain name label in DNS packets: parseLabel :: CharParser st Label parseLabel = (> "label") $ do len <- byte guard (len <= 63) s <- count (fromIntegral len) anyChar return $! stringToLabel s Today I would rather process Word8 lists: type ByteParser a = GenParser Word8 st a parseLabel :: ByteParser st Label Here is a parser for the whole DNS message: parseMessage :: CharParser st Domain -> CharParser st Message parseMessage pDomain = do msgid <- parseMsgID header <- parseMsgHeader qdcount <- fmap fromIntegral beWord16 ancount <- fmap fromIntegral beWord16 nscount <- fmap fromIntegral beWord16 arcount <- fmap fromIntegral beWord16 questions <- count qdcount (parseQuestion pDomain) answers <- count ancount (parseRR pDomain) auth <- count nscount (parseRR pDomain) additional <- count arcount (parseRR pDomain) return (Message { msgID = msgid, msgHeader = header, msgQuestions = questions, msgAnswers = answers, msgAuth = auth, msgAdditional = additional }) The pDomain parameter is for dealing with DNS domain suffix compression - parsing a domain name may require jumping to an earlier part of the message. Today I would either use a MonadReader to hide this parameter, or a different parser monad with random access. In another application for reading some binary files I defined a BinaryParser monad, with one implementation using Parsec and another using unboxed arrays. IIRC, the implementation using UArrays was about 30-60 times faster than the one using parsec, probably because Parsec uses lists. Surprisingly, the biggest speed boost was caused (again IIRC) by writing a specialised "times" implementation for the UArray version. class (Functor m, Monad m) => BinaryParser m where byte :: m Word8 bytes :: Int -> m (UArray Int Word8) bytes n = do l <- count n byte return $! (listArray (0, n-1) l) int8 :: m Int8 int16 :: m Int16 int32 :: m Int32 int64 :: m Int64 word16 :: m Word16 word32 :: m Word32 word64 :: m Word64 word16 = fmap fromIntegral int16 word32 = fmap fromIntegral int32 word64 = fmap fromIntegral int64 asciiz :: m (UArray Int Word8) asciiz = do s <- decodeStr [] return $! (listArray (0, length s - 1) s) where decodeStr acc = do b <- byte if b == 0 then return (reverse acc) else decodeStr (b : acc) eof :: m () atEof :: m Bool times :: Int -> m a -> m [a] times = count Again, I would do it a bit differently today. For example, this interface says nothing about endianness. Recently I've used a different interface for a different protocol. This is a state monad where the state is a slice of the buffer: newtype BufferReader a instance Monad BufferReader instance MonadZero BufferReader byteAt :: Int -> BufferReader Word8 -- changes the state for subsequent computation skip :: Int -> BufferReader () -- runs the given computation in a slice slicing :: Int -> Int -> BufferReader a -> BufferReader a slicing start len br = ... many :: BufferReader a -> BufferReader [a] runBufferReader :: WithBuffer b => b -> BufferReader a -> IO (Either String a) darcs' FastPackedString module, which was recently put into a separate library by Don Stewart (http://www.cse.unsw.edu.au/~dons/code/fps), could be nice for parsing binary messages, because: - it is (supposed to be) fast and memory efficient - supports fast (O(1)) random access and slices (tailPS, initPS, dropPS, takePS) with purely functional interface - is based on bytes But I am slightly worried about the possibility of space leaks, when a small slice holds the entire message in memory. Random thoughts: - I am often using Template Haskell to automate the generation of parsers and unparsers (it helps tremendously when you have many data types with many fields to parse/unparse, and even more if the protocol changes often) - there are some libraries for dealing with serialisation in Haskell, for example : http://www.cs.helsinki.fi/u/ekarttun/SerTH/ - there is an attempt to write an operating system in Haskell: http://www.cse.ogi.edu/~hallgren/House/ you can check how it handles IP4/UDP/TCP headers Best regards Tomasz

On Tue, 30 Aug 2005, Tomasz Zielonka wrote:
On Tue, Aug 30, 2005 at 12:41:20PM +0200, Joel Reymont wrote:
Erlang does this nicely, I replied to the LtU thread. I positively got the impression that nobody was parsing binary data in Haskell ;).
I am doing this quite often, I apologize for not sharing my experience and promise to improve ;-)
BTW, if efficiency is not a primary concern, Parsec can be quite nice for decoding binary messages of many protocols.
I'd quite like to see some benchmarks for Parsec parsers compiled with jhc, I can't help thinking that the optimisations involved would make them go much faster. Not really practical right here and right now AFAIK, but hey. -- flippa@flippac.org Ivanova is always right. I will listen to Ivanova. I will not ignore Ivanova's recomendations. Ivanova is God. And, if this ever happens again, Ivanova will personally rip your lungs out!

On Tue, Aug 30, 2005 at 12:35:30PM +0100, Philippa Cowderoy wrote:
On Tue, 30 Aug 2005, Tomasz Zielonka wrote:
On Tue, Aug 30, 2005 at 12:41:20PM +0200, Joel Reymont wrote:
Erlang does this nicely, I replied to the LtU thread. I positively got the impression that nobody was parsing binary data in Haskell ;).
I am doing this quite often, I apologize for not sharing my experience and promise to improve ;-)
BTW, if efficiency is not a primary concern, Parsec can be quite nice for decoding binary messages of many protocols.
I'd quite like to see some benchmarks for Parsec parsers compiled with jhc, I can't help thinking that the optimisations involved would make them go much faster. Not really practical right here and right now AFAIK, but hey.
I'll work on it :). it does seem to behave particularly well on failable state monads like the Maybe monad so should do well with parsec too. I have not figured out why, I think it is due to my CPR being generalized to work on non-CPR types as long as they are used in a CPR fashion (the vast majority of functions in such monads are of the 'successful' variety). This optimazation can most likely be ported to ghc easily and it appears to be a big win in certain situations. John -- John Meacham - ⑆repetae.net⑆john⑈
participants (7)
-
Andre Pang
-
Bayley, Alistair
-
Joel Reymont
-
John Meacham
-
Miles Sabin
-
Philippa Cowderoy
-
Tomasz Zielonka