A better way?

Keith Sheppard

21 Feb 2009 21 Feb '09

9:35 p.m.

Hello, I'm new to haskell and still getting used to working with lazy evaluation. I created a little function to calculate column widths for a 2D list of strings (a table) by iterating through the list and accumulating a max value for each column. My initial implementation ran out memory for tables with many rows because of lazy evaluation and here is how I dealt with it: {- | for a table, calculate the max width in characters for each column -} maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths [] = [] maxTableColumnWidths table = maxTableColumnWidthsInternal table [] maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int] maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues maxTableColumnWidthsInternal (row:tableTail) prevMaxValues | seqList prevMaxValues = undefined | otherwise = maxTableColumnWidthsInternal tableTail (maxRowFieldWidths row prevMaxValues) -- this little function is for making the list strict... otherwise -- we run out of memory seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail maxRowFieldWidths :: [String] -> [Int] -> [Int] maxRowFieldWidths row prevMaxValues = let colLengths = map length row lengthOfRow = length row lengthOfPrevMax = length prevMaxValues maxPrefixList = zipWith max colLengths prevMaxValues in if lengthOfRow == lengthOfPrevMax then maxPrefixList else if lengthOfRow > lengthOfPrevMax then maxPrefixList ++ (drop lengthOfPrevMax colLengths) else maxPrefixList ++ (drop lengthOfRow prevMaxValues) This works but it isn't very pretty (maybe also inefficient?). Is there a better way to deal with this kind of memory issue? Thanks! Keith

Show replies by date

Thomas Davie

21 Feb 21 Feb

10:29 p.m.

How about this: maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = map (maximum . map length) Bob On 21 Feb 2009, at 22:35, Keith Sheppard wrote:

...

Hello,

I'm new to haskell and still getting used to working with lazy evaluation. I created a little function to calculate column widths for a 2D list of strings (a table) by iterating through the list and accumulating a max value for each column. My initial implementation ran out memory for tables with many rows because of lazy evaluation and here is how I dealt with it:

{- | for a table, calculate the max width in characters for each column -} maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths [] = [] maxTableColumnWidths table = maxTableColumnWidthsInternal table []

maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int] maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues maxTableColumnWidthsInternal (row:tableTail) prevMaxValues | seqList prevMaxValues = undefined | otherwise = maxTableColumnWidthsInternal tableTail (maxRowFieldWidths row prevMaxValues)

-- this little function is for making the list strict... otherwise -- we run out of memory seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

maxRowFieldWidths :: [String] -> [Int] -> [Int] maxRowFieldWidths row prevMaxValues = let colLengths = map length row lengthOfRow = length row lengthOfPrevMax = length prevMaxValues maxPrefixList = zipWith max colLengths prevMaxValues in if lengthOfRow == lengthOfPrevMax then maxPrefixList else if lengthOfRow > lengthOfPrevMax then maxPrefixList ++ (drop lengthOfPrevMax colLengths) else maxPrefixList ++ (drop lengthOfRow prevMaxValues)

This works but it isn't very pretty (maybe also inefficient?). Is there a better way to deal with this kind of memory issue?

Thanks! Keith _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Keith Sheppard

10:53 p.m.

Thank you for the reply, but it looks like this is a different function. Here is the interactive output of my function vs. the one you give (my code is in Table/IO.hs): Prelude> :load Table.IO *Table.IO> maxTableColumnWidths (replicate 10 ["hello", "world"]) [5,5] *Table.IO> let maxTableColumnWidths2 = map (maximum . map length) *Table.IO> maxTableColumnWidths2 (replicate 10 ["hello", "world"]) [5,5,5,5,5,5,5,5,5,5] -Keith On Sat, Feb 21, 2009 at 5:29 PM, Thomas Davie wrote:

...

How about this:

maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = map (maximum . map length)

Bob

On 21 Feb 2009, at 22:35, Keith Sheppard wrote:

...
Hello,

I'm new to haskell and still getting used to working with lazy evaluation. I created a little function to calculate column widths for a 2D list of strings (a table) by iterating through the list and accumulating a max value for each column. My initial implementation ran out memory for tables with many rows because of lazy evaluation and here is how I dealt with it:

{- | for a table, calculate the max width in characters for each column -} maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths [] = [] maxTableColumnWidths table = maxTableColumnWidthsInternal table []

maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int] maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues maxTableColumnWidthsInternal (row:tableTail) prevMaxValues | seqList prevMaxValues = undefined | otherwise = maxTableColumnWidthsInternal tableTail (maxRowFieldWidths row prevMaxValues)

-- this little function is for making the list strict... otherwise -- we run out of memory seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

maxRowFieldWidths :: [String] -> [Int] -> [Int] maxRowFieldWidths row prevMaxValues = let colLengths = map length row lengthOfRow = length row lengthOfPrevMax = length prevMaxValues maxPrefixList = zipWith max colLengths prevMaxValues in if lengthOfRow == lengthOfPrevMax then maxPrefixList else if lengthOfRow > lengthOfPrevMax then maxPrefixList ++ (drop lengthOfPrevMax colLengths) else maxPrefixList ++ (drop lengthOfRow prevMaxValues)

This works but it isn't very pretty (maybe also inefficient?). Is there a better way to deal with this kind of memory issue?

Thanks! Keith _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Daniel Fischer

10:58 p.m.

Am Samstag, 21. Februar 2009 23:29 schrieb Thomas Davie:

...

How about this:

maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = map (maximum . map length)

That's not what he needs, maxTableColumnWidths = map (maximum . map length) . transpose would be it. But I'm afraid that wouldn't solve his memory problems. Regarding the memory problems: String is rather a memory hog anyway. Keith, have you considered using ByteStrings? That might solve your memory problems with a straightforward algorithm. Also, if the rows can contain many columns, it is wasteful to calculate the length of prevMaxValues for every row. You could either have that as a parameter, or use a custom zipWith: zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs Then maxRowFieldWidths would become maxRowFieldWidths row prev = zipWithD max (map length row) prev or, pointfree: maxRowFieldWidths = zipWithD max . map length seqList can also be written as seqList = foldr seq False That would make maxTableColumnWidths = foldr ((seqList .) . zipWithD max) [] I'm not sure if that is strict enough, though. Also, you might try maxTCWs = foldr seq [] . foldr (zipWithD max) [] , but I expect that to be a bad memory citizen.

...

Bob

Cheers, Daniel

Daniel Fischer

11:16 p.m.

Am Samstag, 21. Februar 2009 23:58 schrieb Daniel Fischer:

...

Am Samstag, 21. Februar 2009 23:29 schrieb Thomas Davie:

...
How about this:

maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = map (maximum . map length)

That's not what he needs,

maxTableColumnWidths = map (maximum . map length) . transpose

would be it. But I'm afraid that wouldn't solve his memory problems.

Regarding the memory problems: String is rather a memory hog anyway. Keith, have you considered using ByteStrings? That might solve your memory problems with a straightforward algorithm.

Also, if the rows can contain many columns, it is wasteful to calculate the length of prevMaxValues for every row. You could either have that as a parameter, or use a custom zipWith:

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs

Then maxRowFieldWidths would become

maxRowFieldWidths row prev = zipWithD max (map length row) prev

or, pointfree:

maxRowFieldWidths = zipWithD max . map length

seqList can also be written as

seqList = foldr seq False

That would make

maxTableColumnWidths = foldr ((seqList .) . zipWithD max) []

Ouch! Of course not seqList, but an analogous function that returns the list itself. evalList xs | seqList xs = undefined | otherwise = xs maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []

...

I'm not sure if that is strict enough, though.

Also, you might try

maxTCWs = foldr seq [] . foldr (zipWithD max) []

Oops! No good either. I must be too tired :( maxTCWs = evalList . foldr (zipWithD max) []

...

, but I expect that to be a bad memory citizen.

...
Bob

Cheers, Daniel

Daniel Fischer

11:21 p.m.

Am Sonntag, 22. Februar 2009 00:16 schrieb Daniel Fischer:

...

maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []

Oops again: maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length)

...

maxTCWs = evalList . foldr (zipWithD max) []

maxTCWs = evalList . foldr (zipWithD max) [] . map (map length)

...

...
Cheers, Daniel

Keith Sheppard

22 Feb 22 Feb

12:32 a.m.

ghci still is not happy if we have many rows... Prelude> :load Table.IO [1 of 1] Compiling Table.IO ( Table/IO.hs, interpreted ) Ok, modules loaded: Table.IO. *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length) *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map length) *Table.IO> maxTableColumnWidths (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow *Table.IO> maxTCWs (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow I hadn't thought of using ByteStrings since I don't know what they are :-). I'll have to look into it, but I'm assuming that ByteStrings will give some constant time/space improvement? I think it won't help with my first problem though since what's happening is that the lazy function calls are piling up too deep (at least thats what I think is happening). Thank you Keith On Sat, Feb 21, 2009 at 6:21 PM, Daniel Fischer wrote:

...

Am Sonntag, 22. Februar 2009 00:16 schrieb Daniel Fischer:

...
maxTableColumnWidths = foldr ((evalList .) . zipWithD max) []

Oops again:

maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length)

...
...
maxTCWs = evalList . foldr (zipWithD max) []

maxTCWs = evalList . foldr (zipWithD max) [] . map (map length)

...
...
Cheers, Daniel

Daniel Fischer

1:03 a.m.

Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:

...

ghci still is not happy if we have many rows...

Prelude> :load Table.IO [1 of 1] Compiling Table.IO ( Table/IO.hs, interpreted ) Ok, modules loaded: Table.IO. *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length) *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map length) *Table.IO> maxTableColumnWidths (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the end) and it works for large tables. If the rows are short, it actually is faster (here) than your version, but if the rows are long , e.g. maxTableColumnWidths (replicate 2000 (replicate 1000 "what?")) your version is faster than import Data.List (foldl') seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail evalList xs | seqList xs = undefined | otherwise = xs zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs maxTableColumnWidths = foldl' ((evalList .) . zipWithD max) [] . map (map length)

...

*Table.IO> maxTCWs (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

That doesn't surprise me in the least. One has to force the list in each step.

...

I hadn't thought of using ByteStrings since I don't know what they are

ByteStrings are basically byte arrays. Thus if you're dealing exclusively with characters in the range 0-255, you need only one byte per character, while with Strings, you need four bytes for each character itself, plus several machine words for pointers (list cell to Char, list cell to next), I don't remember, but I think it amounts to 12 or 20 bytes per character.

...

:-). I'll have to look into it, but I'm assuming that ByteStrings will

give some constant time/space improvement?

Can be several orders of magnitude faster, and as said above, they use far less memory (but they're useful only for long enough strings, having a multitude of short ByteStrings floating around doesn't do any good).

...

I think it won't help with my first problem though since what's happening is that the lazy function calls are piling up too deep (at least thats what I think is happening).

Yes, that's basically what *** Exception: stack overflow means :)

...

Thank you Keith

Cheers, Daniel

Patrick LeBoutillier

2:32 a.m.

Hi, I changed my code to : maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = foldl' rowMax zeros where rowMax = zipWith (\m f -> max (length f) m) zeros = 0 : zeros but it still blows the stack. I don't understand why. Doesn't foldl' force the evaluation of each call to rowMax? If so then I don't see what causes the stack to get so big... unless I'm looking in the wrong place... Can anyone shed some light? Patrick On Sat, Feb 21, 2009 at 8:03 PM, Daniel Fischer wrote:

...

Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:

...
ghci still is not happy if we have many rows...

Prelude> :load Table.IO [1 of 1] Compiling Table.IO ( Table/IO.hs, interpreted ) Ok, modules loaded: Table.IO. *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length) *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map length) *Table.IO> maxTableColumnWidths (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the end) and it works for large tables. If the rows are short, it actually is faster (here) than your version, but if the rows are long , e.g.

maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))

your version is faster than

import Data.List (foldl')

seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

evalList xs | seqList xs = undefined | otherwise = xs

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs

maxTableColumnWidths = foldl' ((evalList .) . zipWithD max) [] . map (map length)

...
*Table.IO> maxTCWs (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

That doesn't surprise me in the least. One has to force the list in each step.

...
I hadn't thought of using ByteStrings since I don't know what they are

ByteStrings are basically byte arrays. Thus if you're dealing exclusively with characters in the range 0-255, you need only one byte per character, while with Strings, you need four bytes for each character itself, plus several machine words for pointers (list cell to Char, list cell to next), I don't remember, but I think it amounts to 12 or 20 bytes per character.

...
:-). I'll have to look into it, but I'm assuming that ByteStrings will

give some constant time/space improvement?

Can be several orders of magnitude faster, and as said above, they use far less memory (but they're useful only for long enough strings, having a multitude of short ByteStrings floating around doesn't do any good).

...
I think it won't help with my first problem though since what's happening is that the lazy function calls are piling up too deep (at least thats what I think is happening).

Yes, that's basically what *** Exception: stack overflow means :)

...
Thank you Keith

Cheers, Daniel

_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Heinrich Apfelmus

2:34 p.m.

Patrick LeBoutillier wrote:

...

Hi,

I changed my code to :

maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths = foldl' rowMax zeros where rowMax = zipWith (\m f -> max (length f) m) zeros = 0 : zeros

but it still blows the stack. I don't understand why. Doesn't foldl' force the evaluation of each call to rowMax? If so then I don't see what causes the stack to get so big... unless I'm looking in the wrong place...

Can anyone shed some light?

It does force the evaluation of each call to rowMax but only to weak head normal form. In other words, it will only evaluate the list to either a cons cell or an empty list blah ---> blah : blah \-> [] See also http://en.wikibooks.org/wiki/Haskell/Graph_reduction Regards, apfelmus -- http://apfelmus.nfshost.com

Keith Sheppard

2:38 a.m.

Thanks, this code looks cleaner than what I had. I like how zipWithD fixes the whole issue of uneven rows... -Keith On Sat, Feb 21, 2009 at 8:03 PM, Daniel Fischer wrote:

...

Am Sonntag, 22. Februar 2009 01:32 schrieb Keith Sheppard:

...
ghci still is not happy if we have many rows...

Prelude> :load Table.IO [1 of 1] Compiling Table.IO ( Table/IO.hs, interpreted ) Ok, modules loaded: Table.IO. *Table.IO> let maxTableColumnWidths = foldr ((evalList .) . zipWithD max) [] . map (map length) *Table.IO> let maxTCWs = evalList . foldr (zipWithD max) [] . map (map length) *Table.IO> maxTableColumnWidths (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the end) and it works for large tables. If the rows are short, it actually is faster (here) than your version, but if the rows are long , e.g.

maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))

your version is faster than

import Data.List (foldl')

seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

evalList xs | seqList xs = undefined | otherwise = xs

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs

maxTableColumnWidths = foldl' ((evalList .) . zipWithD max) [] . map (map length)

...
*Table.IO> maxTCWs (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow

That doesn't surprise me in the least. One has to force the list in each step.

...
I hadn't thought of using ByteStrings since I don't know what they are

ByteStrings are basically byte arrays. Thus if you're dealing exclusively with characters in the range 0-255, you need only one byte per character, while with Strings, you need four bytes for each character itself, plus several machine words for pointers (list cell to Char, list cell to next), I don't remember, but I think it amounts to 12 or 20 bytes per character.

...
:-). I'll have to look into it, but I'm assuming that ByteStrings will

give some constant time/space improvement?

Can be several orders of magnitude faster, and as said above, they use far less memory (but they're useful only for long enough strings, having a multitude of short ByteStrings floating around doesn't do any good).

...
I think it won't help with my first problem though since what's happening is that the lazy function calls are piling up too deep (at least thats what I think is happening).

Yes, that's basically what *** Exception: stack overflow means :)

...
Thank you Keith

Cheers, Daniel

Heinrich Apfelmus

2:30 p.m.

Daniel Fischer wrote:

...

Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the end) and it works for large tables. If the rows are short, it actually is faster (here) than your version, but if the rows are long , e.g.

maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))

your version is faster than

import Data.List (foldl')

seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

evalList xs | seqList xs = undefined | otherwise = xs

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs

maxTableColumnWidths = foldl' ((evalList .) . zipWithD max) [] . map (map length)

Nice! I'd use bang patterns in favor of the now outdated | x `seq` False = undefined pattern though. Actually, I'd use import Control.Parallel.Strategies maxWidths = foldl' (\xs ys -> zipWithD max xs ys `using` rnf) [] . map (map length) The module quite useful for controlling evaluation, even when no parallelism is involved. Regards, apfelmus -- http://apfelmus.nfshost.com

Daniel Fischer

3:17 p.m.

Am Sonntag, 22. Februar 2009 15:30 schrieb Heinrich Apfelmus:

...

Daniel Fischer wrote:

...
Yes, foldr was the wrong choice. make it foldl' (don't forget the prime at the end) and it works for large tables. If the rows are short, it actually is faster (here) than your version, but if the rows are long , e.g.

maxTableColumnWidths (replicate 2000 (replicate 1000 "what?"))

your version is faster than

import Data.List (foldl')

seqList [] = False seqList (head:tail)

| head `seq` False = undefined | otherwise = seqList tail

evalList xs

| seqList xs = undefined | otherwise = xs

zipWithD :: (a -> a -> a) -> [a] -> [a] -> [a] zipWithD f (x:xt) (y:yt) = f x y:zipWithD f xt yt zipWithD _ [] ys = ys zipWithD _ xs [] = xs

maxTableColumnWidths = foldl' ((evalList .) . zipWithD max) [] . map (map length)

Nice! I'd use bang patterns in favor of the now outdated

| x `seq` False = undefined

pattern though.

I would, too, but they're not yet portable, are they? Which implementations other than GHC currently support them? And since seqList was already there, I used that.

...

Actually, I'd use

import Control.Parallel.Strategies

maxWidths = foldl' (\xs ys -> zipWithD max xs ys `using` rnf) [] . map (map length)

The module quite useful for controlling evaluation, even when no parallelism is involved.

Winner!

...

Regards, apfelmus

Cheers, Daniel

Heinrich Apfelmus

23 Feb 23 Feb

10:54 a.m.

Daniel Fischer wrote:

...

...
I'd use bang patterns in favor of the now outdated

| x `seq` False = undefined

pattern though.

I would, too, but they're not yet portable, are they? Which implementations other than GHC currently support them?

No, they're not portable yet, but they're really convenient. :D The implementations other than GHC will probably pick up -XBangPatterns as well. Regards, apfelmus -- http://apfelmus.nfshost.com

Patrick LeBoutillier

21 Feb 21 Feb

10:54 p.m.

Hi, I came up with this, which seems to work well if all your rows have the same number of fields: maxRowFieldWidths :: [[String]] -> [Int] maxRowFieldWidths rows = foldr rowMax zeros rows where rowMax fields maxes = zipWith (\f m -> max (length f) m) fields maxes zeros = 0 : zeros foldr is the key here, which allows you to "accumulate" the results of computations. In this case the accumulator is the list of all the maximums found so far. The initial value for the accumulator is an (infinite) list made up on only zeros. Haskell will generate only as much as it needs. Patrick On Sat, Feb 21, 2009 at 4:35 PM, Keith Sheppard wrote:

...

Hello,

I'm new to haskell and still getting used to working with lazy evaluation. I created a little function to calculate column widths for a 2D list of strings (a table) by iterating through the list and accumulating a max value for each column. My initial implementation ran out memory for tables with many rows because of lazy evaluation and here is how I dealt with it:

{- | for a table, calculate the max width in characters for each column -} maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths [] = [] maxTableColumnWidths table = maxTableColumnWidthsInternal table []

maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int] maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues maxTableColumnWidthsInternal (row:tableTail) prevMaxValues | seqList prevMaxValues = undefined | otherwise = maxTableColumnWidthsInternal tableTail (maxRowFieldWidths row prevMaxValues)

-- this little function is for making the list strict... otherwise -- we run out of memory seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

maxRowFieldWidths :: [String] -> [Int] -> [Int] maxRowFieldWidths row prevMaxValues = let colLengths = map length row lengthOfRow = length row lengthOfPrevMax = length prevMaxValues maxPrefixList = zipWith max colLengths prevMaxValues in if lengthOfRow == lengthOfPrevMax then maxPrefixList else if lengthOfRow > lengthOfPrevMax then maxPrefixList ++ (drop lengthOfPrevMax colLengths) else maxPrefixList ++ (drop lengthOfRow prevMaxValues)

This works but it isn't very pretty (maybe also inefficient?). Is there a better way to deal with this kind of memory issue?

Thanks! Keith _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

Keith Sheppard

11:52 p.m.

Hello Patrick, I renamed your function maxTableColumnWidths to match up with my naming. It seems to work except that it suffers from the same problem as my 1st implementation when you feed it a table that has many rows: *Table.IO> maxTableColumnWidths (replicate 1000000 ["hello", "world"]) *** Exception: stack overflow Thanks, Keith On Sat, Feb 21, 2009 at 5:54 PM, Patrick LeBoutillier wrote:

...

Hi,

I came up with this, which seems to work well if all your rows have the same number of fields:

maxRowFieldWidths :: [[String]] -> [Int] maxRowFieldWidths rows = foldr rowMax zeros rows where rowMax fields maxes = zipWith (\f m -> max (length f) m) fields maxes zeros = 0 : zeros

foldr is the key here, which allows you to "accumulate" the results of computations. In this case the accumulator is the list of all the maximums found so far. The initial value for the accumulator is an (infinite) list made up on only zeros. Haskell will generate only as much as it needs.

Patrick

On Sat, Feb 21, 2009 at 4:35 PM, Keith Sheppard wrote:

...
Hello,

I'm new to haskell and still getting used to working with lazy evaluation. I created a little function to calculate column widths for a 2D list of strings (a table) by iterating through the list and accumulating a max value for each column. My initial implementation ran out memory for tables with many rows because of lazy evaluation and here is how I dealt with it:

{- | for a table, calculate the max width in characters for each column -} maxTableColumnWidths :: [[String]] -> [Int] maxTableColumnWidths [] = [] maxTableColumnWidths table = maxTableColumnWidthsInternal table []

maxTableColumnWidthsInternal :: [[String]] -> [Int] -> [Int] maxTableColumnWidthsInternal [] prevMaxValues = prevMaxValues maxTableColumnWidthsInternal (row:tableTail) prevMaxValues | seqList prevMaxValues = undefined | otherwise = maxTableColumnWidthsInternal tableTail (maxRowFieldWidths row prevMaxValues)

-- this little function is for making the list strict... otherwise -- we run out of memory seqList [] = False seqList (head:tail) | head `seq` False = undefined | otherwise = seqList tail

maxRowFieldWidths :: [String] -> [Int] -> [Int] maxRowFieldWidths row prevMaxValues = let colLengths = map length row lengthOfRow = length row lengthOfPrevMax = length prevMaxValues maxPrefixList = zipWith max colLengths prevMaxValues in if lengthOfRow == lengthOfPrevMax then maxPrefixList else if lengthOfRow > lengthOfPrevMax then maxPrefixList ++ (drop lengthOfPrevMax colLengths) else maxPrefixList ++ (drop lengthOfRow prevMaxValues)

This works but it isn't very pretty (maybe also inefficient?). Is there a better way to deal with this kind of memory issue?

Thanks! Keith _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

-- ===================== Patrick LeBoutillier Rosemère, Québec, Canada

5983

Age (days ago)

5985

Last active (days ago)

List overview

Download

15 comments

5 participants

participants (5)

Daniel Fischer
Heinrich Apfelmus
Keith Sheppard
Patrick LeBoutillier
Thomas Davie

A better way?

Patrick LeBoutillier

Patrick LeBoutillier

tags

participants (5)