
Hello, I wrote the following split function for Strings: splitS :: String -> String -> [String] splitS a b = splitA a b where z = length b - 1 splitA [] _ = [""] splitA (c:cs) (d:ds) | c == d && fst s == ds = "" : splitA (snd s) b | otherwise = (c : head r) : tail r where r = splitA cs b s = splitAt z cs example: splitS "Test;Hello;+123.5;+ 7" ";+" -> ["Test;Hallo","123.5"," 7"] How could it be optimized on speed? Does there exist already an optimized split for the conventional String type? -- Thanks in advance for your answer. h.

I think you want
Text.Regex. splitRegex
or something very much like it.
http://haskell.org/hoogle/?q=String-%3E%5BString%5D
2007/3/1, h.
Hello,
I wrote the following split function for Strings:
splitS :: String -> String -> [String] splitS a b = splitA a b where z = length b - 1 splitA [] _ = [""] splitA (c:cs) (d:ds) | c == d && fst s == ds = "" : splitA (snd s) b | otherwise = (c : head r) : tail r where r = splitA cs b s = splitAt z cs
example: splitS "Test;Hello;+123.5;+ 7" ";+" -> ["Test;Hallo","123.5"," 7"]
How could it be optimized on speed? Does there exist already an optimized split for the conventional String type?
-- Thanks in advance for your answer. h.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

"h." said:
splitS :: String -> String -> [String] splitS a b = splitA a b where z = length b - 1 splitA [] _ = [""] splitA (c:cs) (d:ds) | c == d && fst s == ds = "" : splitA (snd s) b | otherwise = (c : head r) : tail r where r = splitA cs b s = splitAt z cs
How could it be optimized on speed?
[Brief pause while I do pennance for considering optimisation without profiling]... At first glance, your implementation is probably not too bad. Consider the worst case, for some n and m: splitS (replicate n '.') (replicate m '.' ++ "!") I think your splitS is O(nm) in the worst case, and it's hard to imagine improving on that [1]. In more typical cases, where the delimiter doesn't have a repeated prefix, you'll get closer to O(n). Further, your splitS is about as lazy as it could be, so you won't do any more work than necessary if the output is only partially consumed. So unless I've missed something, you've just got the constant factor to work with. You could perhaps fuse a loop or two, to reduce the number of times sub-lists are traversed and relinked, although your compiler may already be doing this for you. For example, you could combine the use of splitAt with the string comparison (==) to arrive at something like this:
splitS [] _ = [[]] splitS s [] = map return s splitS a@(x:xs) b = maybe d c t where d = (x : head r) : tail r c s = [] : splitS s b t = takePrefix a b r = splitS xs b
takePrefix (x:xs) (y:ys) | x == y = takePrefix xs ys takePrefix s [] = Just s takePrefix _ _ = Nothing
Often, hand-coded loop fusion reduces clarity of expression, but in this case I think it is an improvement, because takePrefix is a potentially useful abstraction in its own right. There may be more you can do, but that's a start. The other question you might like to ask yourself is whether it can be written more clearly, for example without using explicit recursion. [1] - unless you write code to analyse the delimiter for repeated prefixes before proceeding to split the string. Doing that, you could probably achieve O(n + f(m)), for some f(m).

Thanks a lot. I hope I can learn from your lines and ideas used here to improve future code in quality. -- Best regards h.
participants (3)
-
h.
-
Matthew Brecknell
-
Thomas Hartman