
Hi Bjorn, hi list, the darcswatch instance I’m running is getting quite big, and it periodically slows down my server. I managed to get quite an improvement with this simple patch to my parsing: http://darcs.nomeata.de/cgi-bin/darcsweb.cgi?r=darcswatch;a=commitdiff;h=200... Since the new HTTP library, my code runs completely on ByteStrings, only the xhtml library expects me to feed strings. I tried to fix this and created a patch against xhtml-3000.2.0.1 to work internally with lazy ByteStrings. It’s API-compatible to the normal xhtml library, it just adds showHtml', renderHtml' and prettyHtml' that output lazy ByteStrings, and that has Html instances for strict any lazy ByteStrings. There were some speed and space improvements, but none horrific (at least for DarcsWatch, most of the time goes into parsing the repositories and mails, and into sorting that data). Unfortunately, I can’t use it in the live installation until I upgrade the machine from Debian etch to lenny, as the bundled bytestring library in ghc-6.6’ base is too old. Nevertheless, I’m sharing my patch here, maybe it’s useful for some, or maybe it can be the base for an official xhtml release with bytestrings inside. To speed things up even more one should probably create a type analogous to ShowS, i.e. (L.ByteString -> L.ByteString), that allows you to concatenate ByteString chunks cheaply. I also noted that the current version of bytestring implements "concatMap" in a way that is guaranteed to rip apart the string into very small chunks, even if the mapped function returns the same character most times (as it is the case for the Html escaping function). Therefore, I wrote this function: -- | More efficient variant of 'L.concatMap' concatMapL' :: (Char -> String) -> L.ByteString -> L.ByteString concatMapL' f s = go s where go s = let (unmodified, modified) = L.span (\c -> f c == [c]) s in case L.uncons modified of Nothing -> unmodified Just (c,rest) -> L.pack (f c) `L.append` go rest Does this make sense? Should it maybe replace the function in the library? Greetings, Joachim [1] http://darcswatch.nomeata.de/ -- Joachim "nomeata" Breitner mail: mail@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nomeata@debian.org

I think that your instance is too specific, although useful for the
particular case of escaping.
I've done my own implementation for fun:
concatMap' :: (Word8 -> L.ByteString) -> L.ByteString -> L.ByteString
concatMap' f s = L.unfoldr p x0
where x0 = (LI.Empty, s, 0, 0)
p (LI.Empty, LI.Empty, _, _) = Nothing
p (LI.Empty, c@(LI.Chunk s s'), _, ns) | ns==S.length s = p
(LI.Empty, s', 0, 0)
| otherwise = p
(f (S.index s ns), c, 0, ns+1)
p (c@(LI.Chunk s s'), bs, nf, ns) | nf==S.length s = p (s', bs, 0, ns)
| otherwise = Just
(S.index s nf, (c, bs, nf+1, ns))
It turns out to be both a lot slower (don't know why) and with greater
memory residence (because it's not lazy) than the built-in concatMap
for bytestrings in my synthetic tests. However, it produces a
bytestring with few chunks (due to a good implementation of unfoldr)
in an asymptotically optimal time. I also noticed that for a case like
unescaping, where 'f' produces a small string (5) and the source
bytestring is large (10mln), my version spends 24% in GC whereas the
standard version spends 68%.
A drawback is that the resulting bytestring is not lazy at all.
I wonder how one might optimize it. It looks like a function that a
compiler should optimize very well without my help, although I didn't
read the core etc.
2009/1/20 Joachim Breitner
Hi Bjorn, hi list,
the darcswatch instance I'm running is getting quite big, and it periodically slows down my server. I managed to get quite an improvement with this simple patch to my parsing: http://darcs.nomeata.de/cgi-bin/darcsweb.cgi?r=darcswatch;a=commitdiff;h=200...
Since the new HTTP library, my code runs completely on ByteStrings, only the xhtml library expects me to feed strings. I tried to fix this and created a patch against xhtml-3000.2.0.1 to work internally with lazy ByteStrings. It's API-compatible to the normal xhtml library, it just adds showHtml', renderHtml' and prettyHtml' that output lazy ByteStrings, and that has Html instances for strict any lazy ByteStrings.
There were some speed and space improvements, but none horrific (at least for DarcsWatch, most of the time goes into parsing the repositories and mails, and into sorting that data). Unfortunately, I can't use it in the live installation until I upgrade the machine from Debian etch to lenny, as the bundled bytestring library in ghc-6.6' base is too old.
Nevertheless, I'm sharing my patch here, maybe it's useful for some, or maybe it can be the base for an official xhtml release with bytestrings inside.
To speed things up even more one should probably create a type analogous to ShowS, i.e. (L.ByteString -> L.ByteString), that allows you to concatenate ByteString chunks cheaply.
I also noted that the current version of bytestring implements "concatMap" in a way that is guaranteed to rip apart the string into very small chunks, even if the mapped function returns the same character most times (as it is the case for the Html escaping function). Therefore, I wrote this function:
-- | More efficient variant of 'L.concatMap' concatMapL' :: (Char -> String) -> L.ByteString -> L.ByteString concatMapL' f s = go s where go s = let (unmodified, modified) = L.span (\c -> f c == [c]) s in case L.uncons modified of Nothing -> unmodified Just (c,rest) -> L.pack (f c) `L.append` go rest
Does this make sense? Should it maybe replace the function in the library?
Greetings, Joachim
[1] http://darcswatch.nomeata.de/
-- Joachim "nomeata" Breitner mail: mail@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nomeata@debian.org
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi,
I improved the implementation of concatMap:
concatMap' :: (Word8 -> L.ByteString) -> L.ByteString -> L.ByteString
concatMap' f s = L.unfoldr p x0
where x0 = (LI.Empty, s)
p (sf, s) = case L.uncons sf of
Just (c,sf') -> Just (c, (sf',s))
Nothing -> case L.uncons s of
Nothing -> Nothing
Just (c,s') -> p (f c, s')
I did a test that performs something like escaping in the following fashion:
main = putStrLn . show . L.foldr (+) 0 $ concatMap' f xs
where xs = L.pack $ concat $ replicate 100000 [1..10]
f 1 = esc
f x = L.pack [x]
esc = L.pack [1,2,3,4]
So, every 10th character is escaped.
My version works 4x faster than L.concatMap.
However, if 'f' returns larger strings, it quickly becomes dramatically slower.
So, I'd recommend it for escaping and wouldn't recomment it for anything else.
2009/1/20 Joachim Breitner
Hi Bjorn, hi list,
the darcswatch instance I'm running is getting quite big, and it periodically slows down my server. I managed to get quite an improvement with this simple patch to my parsing: http://darcs.nomeata.de/cgi-bin/darcsweb.cgi?r=darcswatch;a=commitdiff;h=200...
Since the new HTTP library, my code runs completely on ByteStrings, only the xhtml library expects me to feed strings. I tried to fix this and created a patch against xhtml-3000.2.0.1 to work internally with lazy ByteStrings. It's API-compatible to the normal xhtml library, it just adds showHtml', renderHtml' and prettyHtml' that output lazy ByteStrings, and that has Html instances for strict any lazy ByteStrings.
There were some speed and space improvements, but none horrific (at least for DarcsWatch, most of the time goes into parsing the repositories and mails, and into sorting that data). Unfortunately, I can't use it in the live installation until I upgrade the machine from Debian etch to lenny, as the bundled bytestring library in ghc-6.6' base is too old.
Nevertheless, I'm sharing my patch here, maybe it's useful for some, or maybe it can be the base for an official xhtml release with bytestrings inside.
To speed things up even more one should probably create a type analogous to ShowS, i.e. (L.ByteString -> L.ByteString), that allows you to concatenate ByteString chunks cheaply.
I also noted that the current version of bytestring implements "concatMap" in a way that is guaranteed to rip apart the string into very small chunks, even if the mapped function returns the same character most times (as it is the case for the Html escaping function). Therefore, I wrote this function:
-- | More efficient variant of 'L.concatMap' concatMapL' :: (Char -> String) -> L.ByteString -> L.ByteString concatMapL' f s = go s where go s = let (unmodified, modified) = L.span (\c -> f c == [c]) s in case L.uncons modified of Nothing -> unmodified Just (c,rest) -> L.pack (f c) `L.append` go rest
Does this make sense? Should it maybe replace the function in the library?
Greetings, Joachim
[1] http://darcswatch.nomeata.de/
-- Joachim "nomeata" Breitner mail: mail@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nomeata@debian.org
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Евгений Кирпичев Разработчик Яндекс.Маркета
participants (2)
-
Eugene Kirpichov
-
Joachim Breitner