How to write faster ByteString/Conduit code

Hello Haskellers, I’ve been trying to squeeze as much performance out of my code as possible and I’ve come to a point where can’t figure out what more I can do. Here is some example code: blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankEscapedChars = blankEscapedChars' "" blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString blankEscapedChars' rs = do mbs <- await case mbs of Just bs -> do let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) yield ds blankEscapedChars' (BS.drop (BS.length ds) cs) Nothing -> when (BS.length rs > 0) (yield rs) where unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString)) unescapeByteString (wasEscaped, bs) = case BS.uncons bs of Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) Just (c, cs) -> Just (c, (True, cs)) Nothing -> Nothing The above function blankEscapedChars will go find all \ characters and convert the following character to a _. For a 1 MB in memory JSON ByteString, it benches at about 6.6 ms In all my code the basic strategy is the same. await for the next byte string, then use and unfoldrN to produce a new ByteString for yielding. Anyone know of a way to go faster? Cheers, -John

Hi Haskellers,
I just rewrote the code to a state-machine in the hope that I can
eventually collapse several stages in a pipeline into one, but this simple
state-machine version turns out to be about 3 times slower even though it
does the same thing:
newtype Blank = Blank
{ blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
}
escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then
escapeChar else escapedChar)))
Nothing -> Nothing
escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
Nothing -> Nothing
fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar
fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8,
(BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
mbs <- await
case mbs of
Just bs -> do
let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs)
(\(bs, Blank f) -> f bs) (bs, Blank blank)
yield cs
fastBlank' newBlank
Nothing -> return ()
I worry that if I go this approach, just the cost of the state-machine
might mean I only break-even.
Is there any reason why this version should be slower?
Cheers,
-John
On Sun, 3 Apr 2016 at 23:11 John Ky
Hello Haskellers,
I’ve been trying to squeeze as much performance out of my code as possible and I’ve come to a point where can’t figure out what more I can do.
Here is some example code:
blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankEscapedChars = blankEscapedChars' ""
blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString blankEscapedChars' rs = do mbs <- await case mbs of Just bs -> do let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) yield ds blankEscapedChars' (BS.drop (BS.length ds) cs) Nothing -> when (BS.length rs > 0) (yield rs) where unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString)) unescapeByteString (wasEscaped, bs) = case BS.uncons bs of Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) Just (c, cs) -> Just (c, (True, cs)) Nothing -> Nothing
The above function blankEscapedChars will go find all \ characters and convert the following character to a _. For a 1 MB in memory JSON ByteString, it benches at about 6.6 ms
In all my code the basic strategy is the same. await for the next byte string, then use and unfoldrN to produce a new ByteString for yielding.
Anyone know of a way to go faster?
Cheers,
-John

It turns out that using a simple enum type to implement a state machine instead of a function avoids the performance penalty and allows me to collapse a four stage conduit pipeline into one with 4 x performance improvement. blankStrings :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankStrings = blankStrings' InJson blankStrings' :: MonadThrow m => FastState -> Conduit BS.ByteString m BS.ByteString blankStrings' lastState = do mbs <- await case mbs of Just bs -> do let (!cs, Just (!nextState, _)) = unfoldrN (BS.length bs) blankByteString (lastState, bs) yield cs blankStrings' nextState Nothing -> return () where blankByteString :: (FastState, ByteString) -> Maybe (Word8, (FastState, ByteString)) blankByteString (InJson, bs) = case BS.uncons bs of Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing blankByteString (InString, bs) = case BS.uncons bs of Just (!c, !cs) | c == wBackslash -> Just (wSpace , (Escaped , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wCloseParen, (InJson , cs)) Just (_ , !cs) -> Just (wSpace , (InString , cs)) Nothing -> Nothing blankByteString (Escaped, bs) = case BS.uncons bs of Just (_, !cs) -> Just (wSpace, (InString, cs)) Nothing -> Nothing blankByteString (InNumber, bs) = case BS.uncons bs of Just (!c, !cs) | isTrailingDigit c -> Just (w0 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing blankByteString (InIdent, bs) = case BS.uncons bs of Just (!c, !cs) | isAlphabetic c -> Just (wUnderscore, (InIdent , cs)) Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing I’m quite please with this, but any further suggestions are still welcome. Cheers, -John On Sun, 3 Apr 2016 at 23:55 John Ky newhoggy@gmail.com http://mailto:newhoggy@gmail.com wrote: Hi Haskellers,
I just rewrote the code to a state-machine in the hope that I can eventually collapse several stages in a pipeline into one, but this simple state-machine version turns out to be about 3 times slower even though it does the same thing:
newtype Blank = Blank { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) }
escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) escapeChar bs = case BS.uncons bs of Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar))) Nothing -> Nothing
escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) escapedChar bs = case BS.uncons bs of Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar)) Nothing -> Nothing
fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString fastBlank = fastBlank' escapeChar
fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString fastBlank' blank = do mbs <- await case mbs of Just bs -> do let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank) yield cs fastBlank' newBlank Nothing -> return ()
I worry that if I go this approach, just the cost of the state-machine might mean I only break-even.
Is there any reason why this version should be slower?
Cheers,
-John
On Sun, 3 Apr 2016 at 23:11 John Ky
wrote: Hello Haskellers,
I’ve been trying to squeeze as much performance out of my code as possible and I’ve come to a point where can’t figure out what more I can do.
Here is some example code:
blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankEscapedChars = blankEscapedChars' ""
blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString blankEscapedChars' rs = do mbs <- await case mbs of Just bs -> do let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) yield ds blankEscapedChars' (BS.drop (BS.length ds) cs) Nothing -> when (BS.length rs > 0) (yield rs) where unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString)) unescapeByteString (wasEscaped, bs) = case BS.uncons bs of Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) Just (c, cs) -> Just (c, (True, cs)) Nothing -> Nothing
The above function blankEscapedChars will go find all \ characters and convert the following character to a _. For a 1 MB in memory JSON ByteString, it benches at about 6.6 ms
In all my code the basic strategy is the same. await for the next byte string, then use and unfoldrN to produce a new ByteString for yielding.
Anyone know of a way to go faster?
Cheers,
-John
participants (1)
-
John Ky