Why is my mappend so slow?

Hello all, I wrote a Logger which, under certain conditions, prepends log-entries to a log and a Monoid instance of it. But as soon as I mappend two Loggers my performance drops by 50%. This even happens when I mappend mempty as shown below in --<2--. I understand that the system has to do *something*, but it seems to cost a bit much. Without the strictness annotation in --<1-- the performance degradation is even more dramatic (orders of magnitude). The profile tells me that more that 50% of the time is spent in mappend. COST CENTRE MODULE %time %alloc mappend.\ Logger 50.6 35.8 logCount'.f Logger 18.7 40.3 logCount' Logger 5.4 0.0 Why is that so, and can I do anything about it? I am willing to change the overall design if required. This is the code -- | A writer does the formatting newtype Wtr a log = Wtr {runWtr :: a -> log} -- | A looger is a writer plus an internal state data Logger a log = Lgr {runLogger :: a -> log -> (log, Logger a log)} instance Monoid (Logger a log) where mempty = Lgr (\_ l -> (l,mempty)) mappend lgr1 lgr2 = Lgr $ \a l -> let !(log1',!lgr1') = runLogger lgr1 a l --<1-- !(log2',!lgr2') = runLogger lgr2 a log1' --<1-- in (log2', mappend lgr1' lgr2') and this is how I test it -- | Count calls __s__ and write log when s has reached nxt and then every dn calls logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log logCount' dn nxt s wtr = Lgr f where f a l = if s == nxt then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) else (l, logCount' dn nxt (s+1) wtr) -- | Count calls and write log every dn calls logCount dn = logCount' dn dn 0 -- testLogger :: Logger Int Int [String] -> [String] testLogger lgr xs = fst $ foldl' f ([],lgr) xs where f (log', lgr') x = runLogger lgr' x log' ex_wtr :: Wtr (Int,a) [String] ex_wtr = Wtr $ \(x,_) -> ["Counted to " ++ (show x)] ex_wtr2 :: Wtr Int [String] ex_wtr2 = Wtr $ \x -> ["Counted to " ++ (show x)] ex_inputs :: [Int] ex_inputs = [1..10000000] ex_logger = mempty <> logCount 300000 ex_wtr <> mempty --<2-- -- ex_logger = logCount 300000 ex_wtr ex_main = do timeIt $ putStrLn $ ppShow $ testLogger ex_logger ex_inputs main = ex_main

My guess is that you have accumulating thunks inside your (Int,a) tuple. Be sure to force them (by using a strict pair type, bang patterns, or however else). On 11/19/2015 07:18 PM, martin wrote:
Hello all,
I wrote a Logger which, under certain conditions, prepends log-entries to a log and a Monoid instance of it. But as soon as I mappend two Loggers my performance drops by 50%. This even happens when I mappend mempty as shown below in --<2--. I understand that the system has to do *something*, but it seems to cost a bit much. Without the strictness annotation in --<1-- the performance degradation is even more dramatic (orders of magnitude).
The profile tells me that more that 50% of the time is spent in mappend.
COST CENTRE MODULE %time %alloc
mappend.\ Logger 50.6 35.8 logCount'.f Logger 18.7 40.3 logCount' Logger 5.4 0.0
Why is that so, and can I do anything about it? I am willing to change the overall design if required.
This is the code
-- | A writer does the formatting newtype Wtr a log = Wtr {runWtr :: a -> log}
-- | A looger is a writer plus an internal state data Logger a log = Lgr {runLogger :: a -> log -> (log, Logger a log)}
instance Monoid (Logger a log) where mempty = Lgr (\_ l -> (l,mempty)) mappend lgr1 lgr2 = Lgr $ \a l -> let !(log1',!lgr1') = runLogger lgr1 a l --<1-- !(log2',!lgr2') = runLogger lgr2 a log1' --<1-- in (log2', mappend lgr1' lgr2')
and this is how I test it
-- | Count calls __s__ and write log when s has reached nxt and then every dn calls logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log logCount' dn nxt s wtr = Lgr f where f a l = if s == nxt then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) else (l, logCount' dn nxt (s+1) wtr)
-- | Count calls and write log every dn calls logCount dn = logCount' dn dn 0
-- testLogger :: Logger Int Int [String] -> [String] testLogger lgr xs = fst $ foldl' f ([],lgr) xs where f (log', lgr') x = runLogger lgr' x log'
ex_wtr :: Wtr (Int,a) [String] ex_wtr = Wtr $ \(x,_) -> ["Counted to " ++ (show x)]
ex_wtr2 :: Wtr Int [String] ex_wtr2 = Wtr $ \x -> ["Counted to " ++ (show x)]
ex_inputs :: [Int] ex_inputs = [1..10000000]
ex_logger = mempty <> logCount 300000 ex_wtr <> mempty --<2-- -- ex_logger = logCount 300000 ex_wtr
ex_main = do timeIt $ putStrLn $ ppShow $ testLogger ex_logger ex_inputs
main = ex_main _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

I just tried both strict pairs and seq, and it didn't change anything. Also, wouldn't then THUNKS consume a lot of memory in my heap profile? I forgot to mention that this is not the case. Max heap is around 35k and the top-consumer is ARR_WORDS. THUNK is below 1k. I am going though 10,000,000 iterations and if anything would pile up, it would consume at least one byte per iteration, wouldn't it? But I can't see 10 MBytes anywhere. It looks as if the time is really spent on *computing* something. Am 11/19/2015 um 07:15 PM schrieb Roman Cheplyaka:
My guess is that you have accumulating thunks inside your (Int,a) tuple. Be sure to force them (by using a strict pair type, bang patterns, or however else).
On 11/19/2015 07:18 PM, martin wrote:
Hello all,
I wrote a Logger which, under certain conditions, prepends log-entries to a log and a Monoid instance of it. But as soon as I mappend two Loggers my performance drops by 50%. This even happens when I mappend mempty as shown below in --<2--. I understand that the system has to do *something*, but it seems to cost a bit much. Without the strictness annotation in --<1-- the performance degradation is even more dramatic (orders of magnitude).
The profile tells me that more that 50% of the time is spent in mappend.
COST CENTRE MODULE %time %alloc
mappend.\ Logger 50.6 35.8 logCount'.f Logger 18.7 40.3 logCount' Logger 5.4 0.0
Why is that so, and can I do anything about it? I am willing to change the overall design if required.
This is the code
-- | A writer does the formatting newtype Wtr a log = Wtr {runWtr :: a -> log}
-- | A looger is a writer plus an internal state data Logger a log = Lgr {runLogger :: a -> log -> (log, Logger a log)}
instance Monoid (Logger a log) where mempty = Lgr (\_ l -> (l,mempty)) mappend lgr1 lgr2 = Lgr $ \a l -> let !(log1',!lgr1') = runLogger lgr1 a l --<1-- !(log2',!lgr2') = runLogger lgr2 a log1' --<1-- in (log2', mappend lgr1' lgr2')
and this is how I test it
-- | Count calls __s__ and write log when s has reached nxt and then every dn calls logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log logCount' dn nxt s wtr = Lgr f where f a l = if s == nxt then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) else (l, logCount' dn nxt (s+1) wtr)
-- | Count calls and write log every dn calls logCount dn = logCount' dn dn 0
-- testLogger :: Logger Int Int [String] -> [String] testLogger lgr xs = fst $ foldl' f ([],lgr) xs where f (log', lgr') x = runLogger lgr' x log'
ex_wtr :: Wtr (Int,a) [String] ex_wtr = Wtr $ \(x,_) -> ["Counted to " ++ (show x)]
ex_wtr2 :: Wtr Int [String] ex_wtr2 = Wtr $ \x -> ["Counted to " ++ (show x)]
ex_inputs :: [Int] ex_inputs = [1..10000000]
ex_logger = mempty <> logCount 300000 ex_wtr <> mempty --<2-- -- ex_logger = logCount 300000 ex_wtr
ex_main = do timeIt $ putStrLn $ ppShow $ testLogger ex_logger ex_inputs
main = ex_main _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Maybe it is due to using lists and ++? Thats a well know inefficiency.
On Nov 19, 2015 9:06 PM, "martin"
I just tried both strict pairs and seq, and it didn't change anything. Also, wouldn't then THUNKS consume a lot of memory in my heap profile? I forgot to mention that this is not the case. Max heap is around 35k and the top-consumer is ARR_WORDS. THUNK is below 1k.
I am going though 10,000,000 iterations and if anything would pile up, it would consume at least one byte per iteration, wouldn't it? But I can't see 10 MBytes anywhere. It looks as if the time is really spent on *computing* something.
My guess is that you have accumulating thunks inside your (Int,a) tuple. Be sure to force them (by using a strict pair type, bang patterns, or however else).
On 11/19/2015 07:18 PM, martin wrote:
Hello all,
I wrote a Logger which, under certain conditions, prepends log-entries to a log and a Monoid instance of it. But as soon as I mappend two Loggers my performance drops by 50%. This even happens when I mappend mempty as shown below in --<2--. I understand that the system has to do *something*, but it seems to cost a bit much. Without the strictness annotation in --<1-- the performance degradation is even more dramatic (orders of magnitude).
The profile tells me that more that 50% of the time is spent in mappend.
COST CENTRE MODULE %time %alloc
mappend.\ Logger 50.6 35.8 logCount'.f Logger 18.7 40.3 logCount' Logger 5.4 0.0
Why is that so, and can I do anything about it? I am willing to change
Am 11/19/2015 um 07:15 PM schrieb Roman Cheplyaka: the overall design if required.
This is the code
-- | A writer does the formatting newtype Wtr a log = Wtr {runWtr :: a -> log}
-- | A looger is a writer plus an internal state data Logger a log = Lgr {runLogger :: a -> log -> (log, Logger a log)}
instance Monoid (Logger a log) where mempty = Lgr (\_ l -> (l,mempty)) mappend lgr1 lgr2 = Lgr $ \a l -> let !(log1',!lgr1') =
runLogger lgr1 a l --<1--
!(log2',!lgr2') =
runLogger lgr2 a log1' --<1--
in (log2', mappend lgr1'
lgr2')
and this is how I test it
-- | Count calls __s__ and write log when s has reached nxt and then
every dn calls
logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log logCount' dn nxt s wtr = Lgr f where f a l = if s == nxt then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) else (l, logCount' dn nxt (s+1) wtr)
-- | Count calls and write log every dn calls logCount dn = logCount' dn dn 0
-- testLogger :: Logger Int Int [String] -> [String] testLogger lgr xs = fst $ foldl' f ([],lgr) xs where f (log', lgr') x = runLogger lgr' x log'
ex_wtr :: Wtr (Int,a) [String] ex_wtr = Wtr $ \(x,_) -> ["Counted to " ++ (show x)]
ex_wtr2 :: Wtr Int [String] ex_wtr2 = Wtr $ \x -> ["Counted to " ++ (show x)]
ex_inputs :: [Int] ex_inputs = [1..10000000]
ex_logger = mempty <> logCount 300000 ex_wtr <> mempty --<2-- -- ex_logger = logCount 300000 ex_wtr
ex_main = do timeIt $ putStrLn $ ppShow $ testLogger ex_logger ex_inputs
main = ex_main _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

(++) is only used in the Wtrs themselves, but these are only called when I am actually logging something, which is only a handful of lines. Adding an mempty Logger does not change the number of lines written but does change the execution time. So I don't think this is it. I have the feeling the slowdown originates from mappend being called each time an x is consumed, i.e. the Logger which results from mappending individual Loggers gets constructed over and over again. With this design it has to be this way, because my Loggers carry an internal state (the criteria when to log and when not to) which can change. However in most cases the state does not change. I guess my design does "not know" this and blindly re-executes mappend over and over. But I don't know how design it differently. Am 11/19/2015 um 11:54 PM schrieb Atze van der Ploeg:
Maybe it is due to using lists and ++? Thats a well know inefficiency.
On Nov 19, 2015 9:06 PM, "martin"
mailto:martin.drautzburg@web.de> wrote: I just tried both strict pairs and seq, and it didn't change anything. Also, wouldn't then THUNKS consume a lot of memory in my heap profile? I forgot to mention that this is not the case. Max heap is around 35k and the top-consumer is ARR_WORDS. THUNK is below 1k.
I am going though 10,000,000 iterations and if anything would pile up, it would consume at least one byte per iteration, wouldn't it? But I can't see 10 MBytes anywhere. It looks as if the time is really spent on *computing* something.
Am 11/19/2015 um 07:15 PM schrieb Roman Cheplyaka: > My guess is that you have accumulating thunks inside your (Int,a) tuple. > Be sure to force them (by using a strict pair type, bang patterns, or > however else). > > On 11/19/2015 07:18 PM, martin wrote: >> Hello all, >> >> I wrote a Logger which, under certain conditions, prepends log-entries to a log and a Monoid instance of it. But as soon >> as I mappend two Loggers my performance drops by 50%. This even happens when I mappend mempty as shown below in --<2--. >> I understand that the system has to do *something*, but it seems to cost a bit much. Without the strictness annotation >> in --<1-- the performance degradation is even more dramatic (orders of magnitude). >> >> The profile tells me that more that 50% of the time is spent in mappend. >> >> COST CENTRE MODULE %time %alloc >> >> mappend.\ Logger 50.6 35.8 >> logCount'.f Logger 18.7 40.3 >> logCount' Logger 5.4 0.0 >> >> Why is that so, and can I do anything about it? I am willing to change the overall design if required. >> >> >> This is the code >> >> -- | A writer does the formatting >> newtype Wtr a log = Wtr {runWtr :: a -> log} >> >> -- | A looger is a writer plus an internal state >> data Logger a log = Lgr {runLogger :: a -> log -> (log, Logger a log)} >> >> instance Monoid (Logger a log) where >> mempty = Lgr (\_ l -> (l,mempty)) >> mappend lgr1 lgr2 = Lgr $ \a l -> let !(log1',!lgr1') = runLogger lgr1 a l --<1-- >> !(log2',!lgr2') = runLogger lgr2 a log1' --<1-- >> in (log2', mappend lgr1' lgr2') >> >> >> and this is how I test it >> >> -- | Count calls __s__ and write log when s has reached nxt and then every dn calls >> logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log >> logCount' dn nxt s wtr = Lgr f >> where >> f a l = if s == nxt >> then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) >> else (l, logCount' dn nxt (s+1) wtr) >> >> >> -- | Count calls and write log every dn calls >> logCount dn = logCount' dn dn 0 >> >> >> -- testLogger :: Logger Int Int [String] -> [String] >> testLogger lgr xs = fst $ foldl' f ([],lgr) xs >> where >> f (log', lgr') x = runLogger lgr' x log' >> >> ex_wtr :: Wtr (Int,a) [String] >> ex_wtr = Wtr $ \(x,_) -> ["Counted to " ++ (show x)] >> >> ex_wtr2 :: Wtr Int [String] >> ex_wtr2 = Wtr $ \x -> ["Counted to " ++ (show x)] >> >> ex_inputs :: [Int] >> ex_inputs = [1..10000000] >> >> ex_logger = mempty <> logCount 300000 ex_wtr <> mempty --<2-- >> -- ex_logger = logCount 300000 ex_wtr >> >> >> ex_main = do >> timeIt $ putStrLn $ ppShow $ testLogger ex_logger ex_inputs >> >> main = ex_main >> _______________________________________________ >> Haskell-Cafe mailing list >> Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe >> > > > > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

On Fri, Nov 20, 2015 at 10:18:46AM +0100, martin wrote:
(++) is only used in the Wtrs themselves, but these are only called when I am actually logging something, which is only a handful of lines. Adding an mempty Logger does not change the number of lines written but does change the execution time. So I don't think this is it.
Surely every call logs *something* even if it's []?

Am 11/20/2015 um 10:31 AM schrieb Tom Ellis:
On Fri, Nov 20, 2015 at 10:18:46AM +0100, martin wrote:
(++) is only used in the Wtrs themselves, but these are only called when I am actually logging something, which is only a handful of lines. Adding an mempty Logger does not change the number of lines written but does change the execution time. So I don't think this is it.
Surely every call logs *something* even if it's []?
Not really. If nothing gets logged, then my loggers return the original log. There is no (++) involved, not even a ([] ++ log). The cost should only be the cost of checking the condition. logCount' :: Monoid log => Int -> Int -> Int -> Wtr (Int,a) log -> Logger a log logCount' dn nxt s wtr = Lgr f where f a l = if s == nxt then (runWtr wtr (s,a) <> l, logCount' dn (nxt+dn) (s+1) wtr) else (l, logCount' dn nxt (s+1) wtr)

On 11/20/2015 12:19 PM, martin wrote:
Am 11/20/2015 um 10:31 AM schrieb Tom Ellis:
On Fri, Nov 20, 2015 at 10:18:46AM +0100, martin wrote:
(++) is only used in the Wtrs themselves, but these are only called when I am actually logging something, which is only a handful of lines. Adding an mempty Logger does not change the number of lines written but does change the execution time. So I don't think this is it.
Surely every call logs *something* even if it's []?
Not really. If nothing gets logged, then my loggers return the original log. There is no (++) involved, not even a ([] ++ log). The cost should only be the cost of checking the condition.
Even for mempty, you're still recursively mappending the "tails" of the loggers. How about introducing a designated constructor for mempty to avoid that recursive mappend? (The same constructor will be used when a logger is "done" and is known not to produce any further output.) Roman

Am 11/20/2015 um 11:33 AM schrieb Roman Cheplyaka:
On 11/20/2015 12:19 PM, martin wrote:
Am 11/20/2015 um 10:31 AM schrieb Tom Ellis:
On Fri, Nov 20, 2015 at 10:18:46AM +0100, martin wrote:
(++) is only used in the Wtrs themselves, but these are only called when I am actually logging something, which is only a handful of lines. Adding an mempty Logger does not change the number of lines written but does change the execution time. So I don't think this is it.
Surely every call logs *something* even if it's []?
Not really. If nothing gets logged, then my loggers return the original log. There is no (++) involved, not even a ([] ++ log). The cost should only be the cost of checking the condition.
Even for mempty, you're still recursively mappending the "tails" of the loggers. How about introducing a designated constructor for mempty to avoid that recursive mappend? (The same constructor will be used when a logger is "done" and is known not to produce any further output.)
This is probably the root cause, and your sugestion would probably solve the slowness of (<> mempty) and (mempty <>). But for a Logger which just produces no output "for now" it still wouldn't help, would it? E.g. if I want to log every n invocations, then a Logger will produce output at invocation n and then no output for invocations n+1 .. 2n -1, and produce output at 2n again. It can never become an mempty. This is actually the original problem which triggered it all. But my current design is overly costly. I cannnot seem to express "if the condition does not hold, do nothing" propery.
participants (4)
-
Atze van der Ploeg
-
martin
-
Roman Cheplyaka
-
Tom Ellis