
Hi, Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code? It's basic text processing. It takes in a list of Lines where each Line is a list of Words and intersperses " " between them then concatenates them into a longer String. Note that there is a recursive call and the ++ operator. Thanks Mark -- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)

Am Samstag, den 06.02.2010, 23:12 +1030 schrieb Mark Spezzano:
Hi,
Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code?
It's basic text processing. It takes in a list of Lines where each Line is a list of Words and intersperses " " between them then concatenates them into a longer String. Note that there is a recursive call and the ++ operator.
Thanks
Mark
-- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)
You should use the existing library functions and leave the optimisations to their implementor: import Data.List joinLines :: [[String]] -> String joinLines = intercalate "\n" . map (intercalate " ") Now you can easily switch to the faster ByteString library by simply changing the import statement.

On Sat, Feb 06, 2010 at 11:12:40PM +1030, Mark Spezzano wrote:
-- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)
Why not just joinLines = unlines . map unwords This should be as fast as you may get using lists of lists of lists of Chars :). -- Felipe.

Am Samstag 06 Februar 2010 13:42:40 schrieb Mark Spezzano:
Hi,
Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code?
It's basic text processing. It takes in a list of Lines where each Line is a list of Words and intersperses " " between them then concatenates them into a longer String. Note that there is a recursive call and the ++ operator.
Thanks
Mark
-- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)
joinLines = init . unlines . map unwords joinLines = concat . intersperse "\n" . map unwords joinLines = intercalate "\n" . map unwords joinLines = intercalate "\n" . map (intercalate " ") it should be pretty good already, if that's a performance bottleneck, you might need to switch to (Lazy) ByteStrings. I don't think ShowS or difference lists would be any faster.

As other people have mentioned, you are duplicating library
functionality. But nobody has actually talked about the performance
characteristics of your code.
Fortunately for you, the calls to (++) in your recursion are
right-associative, so you don't have an asymptotic problem where your
program gets slower and slower for large inputs; it should stay
linear.
But you are wasting some work. In particular, (concat (intersperse "
" l)) produces a String, and then (++) duplicates all of the cons
cells in that string as it rebuilds it so that the tail connects with
the next string.
So there is a potential benefit to using a difference list, albeit
only by around a 2x factor.
-- ryan
On Sat, Feb 6, 2010 at 4:42 AM, Mark Spezzano
Hi,
Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code?
It's basic text processing. It takes in a list of Lines where each Line is a list of Words and intersperses " " between them then concatenates them into a longer String. Note that there is a recursive call and the ++ operator.
Thanks
Mark
-- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Am Samstag, den 06.02.2010, 10:28 -0800 schrieb Ryan Ingram:
As other people have mentioned, you are duplicating library functionality. But nobody has actually talked about the performance characteristics of your code.
Fortunately for you, the calls to (++) in your recursion are right-associative, so you don't have an asymptotic problem where your program gets slower and slower for large inputs; it should stay linear.
But you are wasting some work. In particular, (concat (intersperse " " l)) produces a String, and then (++) duplicates all of the cons cells in that string as it rebuilds it so that the tail connects with the next string.
By applying rewrite rules from the standard libraries, GHC figures out that (concat xs ++ ys) == (foldr (++) ys xs) for every xs, ys. So, the rule joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls) becomes joinLines (l:ls) = foldr (++) ('\n':joinLines ls) (intersperse " " l) However, the problem remains the same: For every line you have to construct a list (intersperse " " l) that is discarded immediately. But if you implement function intersperse as intersperse _ [] = [] intersperse sep (h:xs) = h : concatMap (\x -> [sep,x]) xs then you can derive the following implementation by applying the rules for list fusion: joinLines :: [[String]] -> String joinLines [[]] = "" joinLines [h:xs] = h ++ foldr (\x y -> ' ': x ++ y) "" xs joinLines ([]:ls) = '\n':jl ls joinLines ((h:xs):ls) = h ++ foldr (\x y -> ' ': x ++ y) ('\n' : joinLines ls) xs This version takes 10 seconds where the original version as well as the idiomatic (unlines . map unwords) takes 15 seconds. Unfortunately, you have to do that optimisation by hand; GHC doesn't figure it out by itself.
So there is a potential benefit to using a difference list, albeit only by around a 2x factor.
-- ryan
On Sat, Feb 6, 2010 at 4:42 AM, Mark Spezzano
wrote: Hi,
Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code?
It's basic text processing. It takes in a list of Lines where each Line is a list of Words and intersperses " " between them then concatenates them into a longer String. Note that there is a recursive call and the ++ operator.
Thanks
Mark
-- Function: joinLines -- Joins the Words within Lines together with whitespace and newline characters -- Argument: Lines to pad with whitespace and newlines -- Evaluate: The processed and concatenated String joinLines :: [Line] -> String joinLines (l:[]) = concat (intersperse " " l) joinLines (l:ls) = (concat (intersperse " " l)) ++ ('\n':joinLines ls)

On Sat, Feb 6, 2010 at 1:42 PM, Mark Spezzano
Just wondering whether I can use ShowS or tupling or Difference Lists to speed up the following code?...
In case you do want to use a difference list, you could also use a DString[1]. A DString is just a newtype wrapper around a difference list. It has an instance for IsString[2] so you can create them by writing string literals. regards, Bas [1] http://hackage.haskell.org/package/dstring [2] http://hackage.haskell.org/packages/archive/base/4.2.0.0/doc/html/Data-Strin...
participants (6)
-
Bas van Dijk
-
Daniel Fischer
-
Felipe Lessa
-
Holger Siegel
-
Mark Spezzano
-
Ryan Ingram