Re: [Haskell-cafe] parsing currency amounts with parsec

10 May 2011

      On 5/9/11 10:04 PM, Antoine Latter wrote:
...
On Mon, May 9, 2011 at 5:07 PM, Eric Rasmussen<ericrasmussen@gmail.com>  wrote:
...
Hi everyone,
I am relatively new to Haskell and Parsec, and I couldn't find any articles
on parsing numbers in the following format:
Positive: $115.33
Negative: ($1,323.42)
I'm working on the parser for practical purposes (to convert a 3rd-party
generated, most unhelpful format into one I can use), and I'd really
appreciate any insight into a better way to do this, or if there are any
built-in functions/established libraries that would be better suited to the
task. My code below works, but doesn't seem terribly efficient.
Why do you think it inefficient? Is it slow?
I don't have any substantial suggestions, but from a style perspective:
* I would question the use of IEEE binary-floating-point number types
for currency. Haskell ships with a fixed-point decimal library, but I
don't know how fast it is:
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Fixed....
There are also a few other options (that I can't seem to find links to 
at the moment), and some hints on how to do it yourself:

http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-3.html

I'm not sure if your choice of parsing library is fixed or not, but you 
could probably speed things up significantly by using Attoparsec. In 
particular, Attoparsec's combinators for takeWhile, takeWhile1,scan,... 
return bytestrings, and you can then fold over the bytestring quite 
nicely. If your source is ASCII or anything ASCII compatible (Latin-1, 
Latin-9, UTF-8,...) then take a look at Data.Attoparsec.Char8.decimal[1][2].

Unless you need to verify that commas occur exactly every third digit, 
I'd suggest (a) dropping commas while scanning the string, or (b) 
implicitly dropping commas while folding over the string. If you do need 
to verify this, then your best option is (a), since you can maintain a 
state machine about how many digits seen since the last comma. Once 
you're using the Attoparsec strategy of folding over the raw byte 
buffers, then the only room for improvement is going to be making the 
code as straight-line as possible.

Some untested example code:

     import qualified Data.Attoparsec       as A
     import qualified Data.Attoparsec.Char8 as A8
     import qualified Data.ByteString       as B

     rawCurrency :: A.Parser (ByteString,ByteString)
     rawCurrency = do
         dollars <- A.scan 0 step
         _       <- A.char '.'               -- Assuming it's required...
         cents   <- A.takeWhile1 isDigit_w8  -- Assuming no commas...
         return (dollars,cents)
         where
         step :: Int -> Word8 -> Maybe Int
         step 3 0x2C             = Just 0
         step s c | isDigit_w8 c = Just $! s+1
         step _ _                = Nothing

     -- Note: the order of comparisons is part of why it's fast.
     -- | A fast digit predicate.
     isDigit_w8 :: Word8 -> Bool
     isDigit_w8 w = (w <= 0x39 && w >= 0x30)
     {-# INLINE isDigit_w8 #-}

     -- With the dots filled in by whatever representation you use.
     currency :: A.Parser ...
     currency = do
         (dollars,cents) <- rawCurrency
         let step a w = a * 10 + fromIntegral (w - 0x30)
             d = B.foldl' step 0 (B.filter (/= 0x2C) dollars)
             c = fromIntegral (B.foldl' step 0 cents)
                 / (10 ^ length cents)
         return (... d ... c ...)

     amount :: A.Parser ...
     amount = pos <|> neg
         where
         pos = A8.char '$' *> currency
         neg = do
             _ <- A8.string "($"
             a <- currency
             _ <- A8.char ')'
             return (negate a)

[1] And if you're using Attoparsec itself, you may want to take a look 
at Data.Attoparsec.Zepto as well.

[2] If you're basing code on Attoparsec, you may want to look at some of 
my pending patches which improve performance on this (already extremely 
fast) code:

     https://bitbucket.org/winterkoninkje/attoparsec/changesets

-- 
Live well,
~wren

Re: [Haskell-cafe] parsing currency amounts with parsec

wren ng thornton