
Compare these two lines in ghci (if you have at least 4 GB of memory):
let a = replicate (2^26) 2 in minimum [[1],a,a] let a = replicate (2^26) 2 in minimum [a,a,[1]]
The first finishes much faster than the second. This comes up for example in isomorphism testing of graphs embedded in surfaces, which is much easier than the general case: A choice of a directed edge determines a walk of the vertices and edges of the graph, from which the graph can be recovered. A minimal walk can be used to determine a canonical form for the graph, making for a nice Haskell one-liner:
normal graph = unwalk $ minimum $ map (walk graph) $ edges graph
However, this code suffers from the same issue as the toy ghci lines above. Even though the walks are lazy, the minimum function wastefully explores them. It's clear how to fix this, comparing lazy lists: Play rounds of "Survivor", peeling off the heads of the lists at each iteration. One inevitably evaluates all of each minimal list, if there is more than one. This is familiar. For my example, the automorphisms of a graph show up in the complexity of any isomorphism algorithm; they manifest themselves here as the set of minimal walks. What I'm wondering, however, is if there is a way to code "minimum" efficiently in general,
minimum :: Ord a => [a] -> a
where one knows absolutely nothing further about the type "a", but one believes that lazy evaluation will run afoul of the above issue. It would seem that this would require compiler support, allowing code to access approximations to lazy data generalizing the head of a lazy list. I'm reminded of working with power series by working mod x^n for various n. Here, I'd like a bounded version of "compare", that returned EQ for data that agreed to a specified lazy evaluation depth. Did I miss that class? Is there a construct in GHC that would allow me to write "minimum" efficiently for lazy data?

On Wed, Nov 19, 2008 at 8:06 PM, Dave Bayer
What I'm wondering, however, is if there is a way to code "minimum" efficiently in general,
minimum :: Ord a => [a] -> a
where one knows absolutely nothing further about the type "a", but one believes that lazy evaluation will run afoul of the above issue.
It would seem that this would require compiler support, allowing code to access approximations to lazy data generalizing the head of a lazy list. I'm reminded of working with power series by working mod x^n for various n. Here, I'd like a bounded version of "compare", that returned EQ for data that agreed to a specified lazy evaluation depth.
Did I miss that class? Is there a construct in GHC that would allow me to write "minimum" efficiently for lazy data?
One possibility would be to add minimum and maximum to Ord with the
appropriate default definitions, similar to Monoid's mconcat.
--
Dave Menendez

On Wednesday 19 November 2008 11:38:07 pm David Menendez wrote:
One possibility would be to add minimum and maximum to Ord with the appropriate default definitions, similar to Monoid's mconcat.
This is probably the most sensible way. However, first seeing this, I wanted to see if I could do it with RULES, like so: -- this of course fails for Ords where a <= b, b <=a, but a and b are -- somehow distinguishable. min' :: Ord a => [a] -> [a] -> [a] min' as@(a:at) bs@(b:bt) | a < b = as | b < a = bs | otherwise = a : min' at bt -- arbitrary choice here minimum' :: Ord a => [[a]] -> [a] minimum' = foldl1 min' {-# RULES "min-list" min = min' #-} {-# RULES "minimum-list" minimum = minimum' #-} However, trying this gives me complaints about how Ord a cannot be inferred from Ord [a], so min and minimum aren't providing the right information for min' and minimum'. Perhaps it's unreasonable to expect this to work? Anyhow, barring that problem, that's probably the best hope one has of inserting a fancy procedure for lazy data without reworking the core libraries. This also has the issue of not really solving the problem all the way down. For instance, I think: let a = replicate (2^20) 2 in minimum [[[a]], [[a]], [[1]]] still shows bad behavior. So you need an algorithm more clever than the one I've come up with. :) -- Dan

On Thu, Nov 20, 2008 at 12:18 AM, Dan Doel
On Wednesday 19 November 2008 11:38:07 pm David Menendez wrote:
One possibility would be to add minimum and maximum to Ord with the appropriate default definitions, similar to Monoid's mconcat.
This is probably the most sensible way.
Now that I've thought about it more, the problem is that min and max are insufficiently lazy for lists. Here's a list clone that has the desired properties: data List a = Nil | a :< List a deriving (Show, Eq) instance Ord a => Ord (List a) where compare Nil Nil = EQ compare Nil _ = LT compare _ Nil = GT compare (a :< as) (b :< bs) = case compare a b of LT -> LT EQ -> compare as bs GT -> GT min (a :< as) (b :< bs) = case compare a b of LT -> a :< as EQ -> a :< min as bs GT -> b :< bs min _ _ = Nil head' (a :< as) = a head' _ = error "head': empty List" Thus, *Main> head' $ min (2 :< undefined) (2 :< undefined) 2 *Main> minimum [2 :< undefined, 2 :< undefined, 1 :< Nil] 1 :< Nil The tricky part is that derived instances of Ord do not make min (and max) lazy in this way, so you have to write your own. There are also possible space issues, since "min a b" will end up re-creating much of "a" or "b" if they share a large common prefix. <snip>
This also has the issue of not really solving the problem all the way down. For instance, I think:
let a = replicate (2^20) 2 in minimum [[[a]], [[a]], [[1]]]
still shows bad behavior. So you need an algorithm more clever than the one I've come up with. :)
Yeah, my solution falls down there, too.
*Main> minimum [(2 :< undefined) :< Nil, (2 :< undefined) :< Nil, (1
:< Nil) :< Nil]
*** Exception: Prelude.undefined
I don't know if there's a good, general solution here.
--
Dave Menendez

On Nov 20, 2008, at 1:27 AM, David Menendez wrote:
There are also possible space issues, since "min a b" will end up re- creating much of "a" or "b" if they share a large common prefix.
Thanks! I had put off writing the version I needed until seeing your code, which inspired me:
Compute the least list in a list of equal length lazy lists
least :: forall a. Ord a => [[a]] -> [a]
least ((x : xs) : xss) = w : least wss where (w, wss) = foldl' f (x, [xs]) xss
f :: (a, [[a]]) -> [a] -> (a, [[a]]) f v@(y, yss) (z : zs) = case compare y z of LT -> v EQ -> (y, zs : yss) GT -> (z, [zs]) f v _ = v
least _ = []
In my applications, there are usually many instances of the minimum, so peeling off a reference copy isn't all that wasteful. I just want to avoid evaluating everything else. So to summarize the responses, there's no GHC language support for introspecting lazy structures, allowing one to write a generic bounded compare that only evaluates lazy structures to a specified depth. One can however write a class, and solve this problem type-by-type with a common interface. I can see how providing language support for this could be controversial. Functions would remain referentially transparent within a given compilation, but their values would depend non-portably on the compiler. (I use a "code is indented, comments are flush" literate preprocessor (http://hackage.haskell.org/trac/ghc/ticket/2776 ) so I don't have to look at punctuation for either code or comments; syntax coloring makes it obvious which is which. The 1980's email quote symbols > in the default literate Haskell reminds me of being introduced to an IBM 360 in college, and being told I was working with virtual card images. I gasped. I remember punched cards, and it isn't an experience I want to relive. I also remember 1980's email, so my first reaction to literate Haskell was "You've got to be kidding!" Ahh, but better to propose a fix than to gripe.)

Dave Bayer
So to summarize the responses, there's no GHC language support for introspecting lazy structures, allowing one to write a generic bounded compare that only evaluates lazy structures to a specified depth. One can however write a class, and solve this problem type-by-type with a common interface.
You might like how Lazy SmallCheck does it (then again, maybe not) http://www.cs.york.ac.uk/fp/smallcheck/smallcheck.pdf (Section 4.6) It might also help to use discriminators instead of Ord for comparison: http://portal.acm.org/citation.cfm?id=1411220 -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig 2008-11-20 Universal Children's Day http://unicef.org/ 1948-12-10 Universal Declaration of Human Rights http://everyhumanhasrights.org
participants (4)
-
Chung-chieh Shan
-
Dan Doel
-
Dave Bayer
-
David Menendez