
Hi Haskellers,
I just rewrote the code to a state-machine in the hope that I can
eventually collapse several stages in a pipeline into one, but this simple
state-machine version turns out to be about 3 times slower even though it
does the same thing:
newtype Blank = Blank
{ blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
}
escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then
escapeChar else escapedChar)))
Nothing -> Nothing
escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
Nothing -> Nothing
fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar
fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8,
(BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
mbs <- await
case mbs of
Just bs -> do
let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs)
(\(bs, Blank f) -> f bs) (bs, Blank blank)
yield cs
fastBlank' newBlank
Nothing -> return ()
I worry that if I go this approach, just the cost of the state-machine
might mean I only break-even.
Is there any reason why this version should be slower?
Cheers,
-John
On Sun, 3 Apr 2016 at 23:11 John Ky
Hello Haskellers,
I’ve been trying to squeeze as much performance out of my code as possible and I’ve come to a point where can’t figure out what more I can do.
Here is some example code:
blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankEscapedChars = blankEscapedChars' ""
blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString blankEscapedChars' rs = do mbs <- await case mbs of Just bs -> do let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) yield ds blankEscapedChars' (BS.drop (BS.length ds) cs) Nothing -> when (BS.length rs > 0) (yield rs) where unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString)) unescapeByteString (wasEscaped, bs) = case BS.uncons bs of Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) Just (c, cs) -> Just (c, (True, cs)) Nothing -> Nothing
The above function blankEscapedChars will go find all \ characters and convert the following character to a _. For a 1 MB in memory JSON ByteString, it benches at about 6.6 ms
In all my code the basic strategy is the same. await for the next byte string, then use and unfoldrN to produce a new ByteString for yielding.
Anyone know of a way to go faster?
Cheers,
-John