
Peter Green wrote:
Heinrich, thanks for some great help and food for thought!
My pleasure. :)
Thanks for describing the background of this problem in detail! I was mainly asking because I'm always looking for interesting Haskell topics that can be turned into a tutorial of sorts, and this problem makes a great example.
Another interesting problem is starting from a file of single wagers and trying to compress them (i.e. inverse of 'explosion'. I believe this problem is analogous to Set Cover and therefore NP-Complete. Heuristics are the order of the day here.
Interesting indeed! What kind of heuristics do you employ there? It seems quite difficult to me, somehow.
[...] main = interact (unlines . toCSV . sortExplode . fromCSV . lines)
Thank you *very* much for this code! I will try to get my head around it. I understand the broad outline you posted elsewhere, but will take me a while to fully grasp the above as I'm only up to ~p200 in Real World Haskell :).
As for performance of your code above on my file of compressed wagers which expands to 3.7M single wagers:
(My original version posted here and using vanilla sort) 541 MB total memory in use (5 MB lost due to fragmentation) INIT time 0.00s ( 0.01s elapsed) MUT time 13.98s ( 15.82s elapsed) GC time 8.69s ( 9.64s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 22.67s ( 25.47s elapsed)
(Your much improved version) 10 MB total memory in use (1 MB lost due to fragmentation) INIT time 0.00s ( 0.00s elapsed) MUT time 7.61s ( 9.38s elapsed) GC time 3.48s ( 3.58s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 11.08s ( 12.95s elapsed)
Very impressive and thanks again!
Category theory saves the day. ;) The code is based on three observations: 1) Sorting and exploding can be interleaved to create version that can stream results in an on-line fashion. 2) There is a standard formulation of this pattern in terms of folds and unfolds (= catamorphisms and anamorphisms). 3) Lazy evaluation can be used to make this look like an off-line algorithm. To my surprise, I don't have good references for these, but maybe the following can help a bit. Something similar to 1) can be found in Graham Hutton. The countdown problem. http://www.cs.nott.ac.uk/~gmh/bib.html#countdown and Richard Bird. A program to solve Sudoku. http://www.cs.tufts.edu/~nr/comp150fp/archive/richard-bird/sudoku.pdf The standard reference for 2) is Meijer et al. Functional programming with bananas, lenses, envelopes and barbed wire. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125 and 3) is folklore, maybe John Hughes. Why functional programming matters. http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html could serve as an introduction to lazy evaluation.
Note that there's also the possibility of not expanding the compressed wagers at all, and perform set operations directly. For instance, it is straightforward to intersect two such sets of size n in O(n^2) time. Since n ~ 5000 , n^2 is about the same ballpark as the exploded single wager combinations.
Not sure I quite understand you here. In my mind, set elements *are* single combinations. It is possible for two quite different-looking files of compressed wagers to contain exactly the same single wager elements. So I'm not sure how to compare without some notion of explosion to single combinations.
What I mean is that there are two ways to represent sets of wagers: * A list of single combinations -> [Row a] * A list of compressed wagers -> [Row [a]] To perform set operations, you first convert the latter into the former. But the idea is that there might be a way of performing set operations without doing any conversion. Namely, the compressed wagers are cartesian products which behave well under intersection: the intersection of two compressed wagers is again a compressed wager intersectC :: Eq a => Row [a] -> Row [a] -> Row [a] intersectC [] [] = [] intersectC (xs:xrow) (ys:yrow) = intersect xs ys : intersectC xrow yrow In formulas: intersect (cartesian x) (cartesian y) = cartesian (intersectC x y) for compressed wagers x, y :: Row [a] with intersect = list intersection, for instance Data.List.intersect cartesian = converts a compressed wager to a list of single combinations This allows you to intersect two lists of compressed wagers directly intersect' :: [Row [a]] -> [Row [a]] -> [Row [a]] intersect' xs ys = filter (not . null) $ [intersectC x y | x<-xs, y<-ys] Unfortunately, the result will have quadratic size; but if you have significantly less compressed wagers than single combinations, this may be worth a try. Not to mention that you might be able to compress the result again. Mathematically, this exploits the equations (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D) "The intersection of two rectangles is another rectangle" and (A ∪ B) ∩ (C ∪ D) = (A ∩ C) ∪ (A ∩ D) ∪ (B ∩ C) ∪ (B ∩ D) where A,B,C,D are sets. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com