Hi Haskellers,
I just rewrote the code to a state-machine in the hope that I can eventually collapse several stages in a pipeline into one, but this simple state-machine version turns out to be about 3 times slower even though it does the same thing:
newtype Blank = Blank
{ blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
}
escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar)))
Nothing -> Nothing
escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
Nothing -> Nothing
fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar
fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
mbs <- await
case mbs of
Just bs -> do
let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank)
yield cs
fastBlank' newBlank
Nothing -> return ()
I worry that if I go this approach, just the cost of the state-machine might mean I only break-even.
Is there any reason why this version should be slower?
Cheers,
-John
Hello Haskellers,
I’ve been trying to squeeze as much performance out of my code as possible and I’ve come to a point where can’t figure out what more I can do.
Here is some example code:
blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankEscapedChars = blankEscapedChars' "" blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString blankEscapedChars' rs = do mbs <- await case mbs of Just bs -> do let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) yield ds blankEscapedChars' (BS.drop (BS.length ds) cs) Nothing -> when (BS.length rs > 0) (yield rs) where unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString)) unescapeByteString (wasEscaped, bs) = case BS.uncons bs of Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) Just (c, cs) -> Just (c, (True, cs)) Nothing -> Nothing
The above function
blankEscapedChars
will go find all\
characters and convert the following character to a_
. For a1 MB
in memory JSONByteString
, it benches at about6.6 ms
In all my code the basic strategy is the same.
await
for the next byte string, then use andunfoldrN
to produce a newByteString
for yielding.Anyone know of a way to go faster?
Cheers,
-John