Longest common prefix of a list of lists

Hi, For one of my toy project I had to find the longest common prefix of a list of strings. I figured, that this function should only work on list-structure, meaning it should have type
longestCommonPrefix :: [[a]] -> [a]
Sadly the best I could come up with was using explicit recursion. It works as far as I tested it, but I have the feeling that there's a better way to do it. Here is the code:
longestCommonPrefix :: Eq a => [[a]] -> [a] longestCommonPrefix [] = [] longestCommonPrefix xss | any null xss = [] | otherwise = loop xss [] where loop ::Eq b => [[b]] -> [b] -> [b] loop xss acc = let xs = concatMap (take 1) xss in if any (\x -> x /= head xs) (tail xs) then reverse acc else loop (map tail xss) (head xs : acc)

On Friday 29 April 2011 16:07:00, philipp siegmantel wrote:
Hi,
For one of my toy project I had to find the longest common prefix of a list of strings. I figured, that this function should only work on list-structure, meaning it should have type
longestCommonPrefix :: [[a]] -> [a]
Sadly the best I could come up with was using explicit recursion. It works as far as I tested it, but I have the feeling that there's a better way to do it.
Here is the code:
longestCommonPrefix :: Eq a => [[a]] -> [a] longestCommonPrefix [] = [] longestCommonPrefix xss | any null xss = []
| otherwise = loop xss []
where loop ::Eq b => [[b]] -> [b] -> [b]
loop xss acc =
let xs = concatMap (take 1) xss in if any (\x -> x /= head xs) (tail xs)
then reverse acc else loop (map tail xss) (head xs : acc)
A different point of view: given at least two lists, lists@(list1:more@(list2:_)), the longest common prefix of lists is the common prefix of list1 and the longest common prefix of more, so longestCommonPrefix :: Eq a => [[a]] -> [a] longestCommonPrefix [] = [] -- what else? longestCommonPrefix lists = foldr1 commonPrefix lists -- to get the common prefix of two lists, we use explicit recursion, I don't see an elegant way to avoid that. commonPrefix :: Eq a => [a] -> [a] -> [a] commonPrefix (x:xs) (y:ys) | x == y = x : commonPrefix xs ys commonPrefix _ _ = [] produces the result incrementally.

I don't know for sure if the following is elegant or not, but: a = "aaabbbbbbcccccc" b = "aaabbbccccccd" c = "aaabbbbbbbbbccccffffdd" takeCommonPrefix xs = flip take xs. length . filter (`isPrefixOf`xs). drop 1. inits *Main> foldl1 takeCommomPrefix [a,b,c] "aaabbb" *Main> foldl1 takeCommonPrefix [a,b,c,""] "" May be the use of 'length' is a bit ugly. :-) Hallo Daniel Fischer, je schreef op 29-04-11 16:32:
-- to get the common prefix of two lists, we use explicit recursion, I don't see an elegant way to avoid that.
-- Met vriendelijke groet, =@@i

On Friday 29 April 2011 20:17:27, Aai wrote:
I don't know for sure if the following is elegant or not, but:
In a way, yes. It's a composition of library functions, so you delegate the explicit recursion to those. There are a few points where you can increase elegance and performance here, see inline comments. But it's very inefficient and the suggestions don't change that.
a = "aaabbbbbbcccccc" b = "aaabbbccccccd" c = "aaabbbbbbbbbccccffffdd"
takeCommonPrefix xs = flip take xs. length . filter (`isPrefixOf`xs) . drop 1. inits
Don't use filter here, as soon as you hit an initial segment which isn't a prefix of xs, no later one will be, so it's a job for takeWhile. Also, you already have the common prefixes, so why count and take from xs? Just take the longest: takeCommonPrefix xs = last . takeWhile (`isPrefixOf` xs) . inits
*Main> foldl1 takeCommomPrefix [a,b,c] "aaabbb"
*Main> foldl1 takeCommonPrefix [a,b,c,""] ""
May be the use of 'length' is a bit ugly. :-)
Yes, and unneeded. But perforance-wise, that is not what hurts. The trouble is, if xs and ys have a longest common prefix of length p, you check whether (take k ys) is a prefix of xs for k <- [0 .. p+1], so that is an O(p^2) algorithm. Also, you can't start delivering until the longest common prefix has been determined, which may cause high memory requirements and bad performance. Consider a = replicate 1000000 'a' b = replicate 1000001 'a' c = "abacab" foldl1 takeCommonPrefix [a,b,c] = takeCommonPrefix ab c where ab = takeCommonPrefix a b If the common prefix is delivered incrementally, all we need is the first two characters to determine the result. That's fast and needs very little memory. If the common prefix has to be entirely determined before it can be delivered, you need to have two one-million characters long Strings in memory at the same time, which needs a couple dozen MB, and to determine the longest common prefix of a and b by the above method, you need about 5*10^11 Char comparisons, so it'll need a bit of time too.
Hallo Daniel Fischer, je schreef op 29-04-11 16:32:
-- to get the common prefix of two lists, we use explicit recursion, I don't see an elegant way to avoid that.

On Apr 29, 2011, at 10:32 AM, Daniel Fischer wrote:
-- to get the common prefix of two lists, we use explicit recursion, I don't see an elegant way to avoid that.
commonPrefix :: Eq a => [a] -> [a] -> [a] commonPrefix (x:xs) (y:ys) | x == y = x : commonPrefix xs ys commonPrefix _ _ = []
produces the result incrementally.
Personally, that's the definition I'd prefer. It's simple, clear, and fast. If I were forced to hide the recursion, I would probably write this:
commonPrefix a b = map fst (takeWhile (uncurry (==)) (zip a b))
modulo usage of whatever combination of (.) and/or ($) is in style today. -- James

-- Below is a solution that doesn't use length. -- Documentation for each function and trial runs -- are shown in a comment block at the end. import Data.List.Ordered(isSortedBy) import Safe(headMay) import Maybe(fromJust) a = "ababaaaa" b = "ababcccc" c = "ab" d = "" safeTail :: [a] -> [a] safeTail [] = [] safeTail xs = tail xs maybeTranspose :: [[a]] -> [[Maybe a]] maybeTranspose [] = [] maybeTranspose xs | all null xs = [] | otherwise = map headMay xs : maybeTranspose (map safeTail xs) commonPrefix :: Eq a => [[a]] -> [a] commonPrefix = map fromJust . fmap (head . snd) . takeWhile fst . fmap (\a -> ((isSortedBy (==) a), a)) . maybeTranspose {- Trial runs:
commonPrefix [a,b] => "abab" commonPrefix [a,b,c] => "ab" commonPrefix [a,b,c,d] => ""
Functions: 'safeTail' only returns the tail of the list of a non-empty list or an empty list 'maybeTranspose' is just like List.transpose except it returns the biggest matrix, not the smallest eg. transpose [[1,2],[],[4,5]] => [[1,4],[2,5]] maybeTranspose [[1,2][][4,5]] => [[Just 1,Nothing,Just 4], [Just 2,Nothing,Just 5], [Nothing,Nothing,Nothing]] 'commonPrefix' transposes the input lists , extracts the first n lists in the transposition where all the elements are equal, extracts the element from each and returns a concatenation of them. eg. Given input ["ab", "abab", "aba"] 1. Transpose [[Just 'a',Just 'a',Just 'a'], [Just 'b',Just 'b',Just 'b'], [Nothing,Just 'a',Just 'a'], [Nothing,Just 'b',Nothing]] 2. Tag lists with equal elements [(True,[Just 'a',Just 'a',Just 'a']), (True,[Just 'b',Just 'b',Just 'b']), (False,[Nothing,Just 'a',Just 'a']), (False,[Nothing,Just 'b',Nothing])] 3. Take the first n lists with equal elements [(True,[Just 'a',Just 'a',Just 'a']), (True,[Just 'b',Just 'b',Just 'b'])] 4. Grab the equal element from each [Just 'a',Just 'b'] 5. Concatenate "ab" I didn't like using 'fromJust' but I don't see how it can throw an exception in this case. Working with the (Bool, [Maybe a]) tuples directly could probably have been avoided by using Arrows, but it wasn't obvious to me. -}
participants (6)
-
Aai
-
aditya siram
-
Christopher Done
-
Daniel Fischer
-
James Cook
-
philipp siegmantel