HughesPJ vs. Wadler-Leijen

I've been trying to get my head around the Wadler-Leijen pretty printing combinators for a while now and am having some trouble. Specifically, I have trouble getting them to pick optimal line breaks. The existing combinators like 'sep' (and everything built from it) merge all elements with <$> and then 'group' the whole thing, with the result that they either all go on one line, or get one line each. This is quite ugly for large lists of small elements. The other alternative is 'fillSep', which does a separate 'group' on each element. Unfortunately, it then tends to make very bad line wrapping decisions, e.g. you get:

    Rec { hi = "there" }
    Rec { hi = "there", hi = "there"
    , hi = "there" }
    Rec { lab = "short", label = [ 0, 1, 2
    , 3, 4, 5
    , 6, 7, 8
    , 9, 10
    , 11, 12 ] }

No matter how many fancy 'group's and 'nest's and whatnot I threw in, it was always a choice between forcing a break on every element, which looks ugly for many small elements, or trying to fit more onto one line and having it wrap in the wrong place and wind up scrunched up against the right margin. Here's my latest attempt:

    import Prelude hiding ((<$>), (<>))
    import Text.PrettyPrint.Leijen (Doc, (<+>), (<$>), (<$$>), (<>))
    import qualified Text.PrettyPrint.Leijen as PP

    list :: (Pretty a) => [a] -> Doc
    list = commas PP.lbracket PP.rbracket . map format

    commas :: Doc -> Doc -> [Doc] -> Doc
    commas left right xs = PP.group $ left <+> punctuate (\x -> PP.group (x <$$> PP.comma <> PP.space)) xs <$> right

    -- Local helper: apply f to every element except the last, then concatenate.
    punctuate :: (Doc -> Doc) -> [Doc] -> Doc
    punctuate _ [] = PP.empty
    punctuate _ [x] = x
    punctuate f (x:xs) = f x <> punctuate f xs

    record :: Doc -> [(String, Doc)] -> Doc
    record title fields = PP.group $ PP.hang 2 $ title <$> (commas PP.lbrace PP.rbrace (map f fields))
        where f (label, field) = PP.hang 2 $ PP.group $ PP.text label <+> PP.equals <$> field

But the thing is, the HughesPJ-using Language.Haskell.Pretty in haskell-src gets the line wrapping just right. So I investigated how it works, and it's very simple; here's the reduced version:

    import Prelude hiding ((<>))
    import Text.PrettyPrint.HughesPJ (Doc, (<>))
    import qualified Text.PrettyPrint.HughesPJ as PP

    class Pretty a where
        format :: a -> Doc

    list :: (Pretty a) => [a] -> Doc
    list = bracket_list . PP.punctuate PP.comma . map format

    fsep' :: [Doc] -> Doc
    fsep' [] = PP.empty
    fsep' (d:ds) = PP.nest 2 (PP.fsep (PP.nest (-2) d:ds))

    bracket_list :: [Doc] -> Doc
    bracket_list = PP.brackets . PP.fsep

    brace_list :: [Doc] -> Doc
    brace_list = PP.braces . PP.fsep

    record :: Doc -> [(String, Doc)] -> Doc
    record title fields = title <> (brace_list (map field fields))
        where field (name, val) = fsep' [PP.text name, PP.equals, val]

This formats records like so:

    Rec{hi = "there"}
    Rec{hi = "there" hi = "there" hi = "there"}
    Rec{label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30] label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}
    Rec{lab = "short" label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}

Much better, even if it's not my preferred style! Of course WL doesn't have fsep or the negative nest craziness (I don't even know what it's doing there), but it has the more general 'group'. However, no matter how complicated I got with WL, it just never came out right. In contrast, a very simple HughesPJ implementation gets it right. The thing is, when trying to figure out which pretty print library to use, the consensus is that WL is just all-around better, even though HughesPJ is somewhat standard (but there are 6 WL variants on hackage, and no (?) HughesPJ ones). So am I just using it wrong?
If I translate the HughesPJ one over directly into WL, here's what I get:

    Rec{hi = "there"}
    Rec{hi = "there", hi = "there", hi = "there"}
    Rec{label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}
    Rec{lab = "short", label = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}

So is WL really all it's cracked up to be? Am I using it wrong? I was going to suggest some consolidation in the pretty printing library packages, but now I'm not even sure which style should "win"...
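To make the sep/fillSep difference concrete, here is a toy model of the Wadler-Leijen core in plain Haskell. It is a sketch, not wl-pprint: the constructors and helper names (`Doc`, `flatten`, `render`, `joinWith`, `numbers`) are my own, simplified from the union/flatten/fits scheme in Wadler's "A prettier printer", with no `align`, `column` or ribbon, and it assumes GHC 8.4+ (for `Semigroup` in the Prelude). `sep` wraps a single `group` around all breaks, so the renderer can only pick all-flat or all-broken; `fillSep` groups each break on its own, and each group is decided greedily by checking only whether the text up to the next newline fits. Load it in GHCi and try `putStrLn (render 20 (fillSep numbers))`.

```haskell
-- Toy model of the Wadler-Leijen core (NOT the wl-pprint API), just big
-- enough to show how the placement of `group` changes line breaking.
data Doc
  = Empty
  | Text String   -- must not contain '\n'
  | Line          -- a newline, or a single space once flattened by group
  | Cat Doc Doc
  | Nest Int Doc  -- extra indentation for lines started inside the doc
  | Union Doc Doc -- flattened alternative first, original second

instance Semigroup Doc where
  (<>) = Cat

text :: String -> Doc
text = Text

nest :: Int -> Doc -> Doc
nest = Nest

-- The "everything on one line" version of a document.
flatten :: Doc -> Doc
flatten Line = Text " "
flatten (Cat a b) = Cat (flatten a) (flatten b)
flatten (Nest i a) = Nest i (flatten a)
flatten (Union a _) = a
flatten d = d

-- Offer the flat layout first; the renderer uses it only if it fits.
group :: Doc -> Doc
group d = Union (flatten d) d

data Tok = TText String | TLine Int

-- Greedy renderer: at each Union, keep the flat alternative if the text up
-- to the next newline fits in the remaining width. No backtracking.
render :: Int -> Doc -> String
render w d = layout (best 0 [(0, d)])
  where
    best :: Int -> [(Int, Doc)] -> [Tok]
    best _ [] = []
    best k ((i, x) : rest) = case x of
      Empty -> best k rest
      Text s -> TText s : best (k + length s) rest
      Line -> TLine i : best i rest
      Cat a b -> best k ((i, a) : (i, b) : rest)
      Nest j a -> best k ((i + j, a) : rest)
      Union a b ->
        let flat = best k ((i, a) : rest)
         in if fits (w - k) flat then flat else best k ((i, b) : rest)

    fits :: Int -> [Tok] -> Bool
    fits n _ | n < 0 = False
    fits _ [] = True
    fits n (TText s : ts) = fits (n - length s) ts
    fits _ (TLine _ : _) = True

    layout :: [Tok] -> String
    layout [] = ""
    layout (TText s : ts) = s ++ layout ts
    layout (TLine i : ts) = '\n' : replicate i ' ' ++ layout ts

joinWith :: Doc -> [Doc] -> Doc
joinWith _ [] = Empty
joinWith s ds = foldr1 (\x y -> x <> s <> y) ds

-- Like WL's sep: a single group around all breaks, so all-or-nothing.
sep :: [Doc] -> Doc
sep = group . joinWith Line

-- Like WL's fillSep: every break is its own group, so lines are filled.
fillSep :: [Doc] -> Doc
fillSep = joinWith (group Line)

numbers :: [Doc]
numbers = map (text . show) [1 .. 12 :: Int]
```

At width 20, `sep numbers` puts each of the twelve numbers on its own line, while `fillSep numbers` gives `1 2 3 4 5 6 7 8 9 10` followed by `11 12`. In this model `nest` only indents lines that start inside it, so `nest 2 (fillSep ...)` already gives a hanging indent for the wrapped lines; my guess is that the `nest (-2)` on the first element of `fsep'` compensates for HughesPJ's `nest` also moving the first line, but I have not checked that against the HughesPJ source.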

A quick suggestion - does setting the ribbon_frac to something like 0.8 improve things? The Show instance for wl-pprint's Doc uses 0.4 which I've found too low. This means you'll have to write your own display function using `renderPretty`...

On 20 March 2012 20:24, Stephen Tetley wrote:
A quick suggestion - does setting the ribbon_frac to something like 0.8 improve things?
The Show instance for wl-pprint's Doc uses 0.4 which I've found too low.
This means you'll have to write your own display function using `renderPretty`...
I also found a few spacing/indentation related bugs in WL when I was writing wl-pprint-text; does it work better for you?

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com

Hi Ivan

I haven't found any bugs in WL, however I do find the API somewhat confusing regarding line breaking (I would need to consult the manual to tell you the difference between linebreak, softline etc.). This is likely my failing rather than WL as usually I want formatting - "I know the layout" - rather than pretty printing - "the fit function finds the best layout".

I think there is room in the design space for a library whose API "favors" formatting rather than pretty-printing. I.e it has line printing that cannot be undone by `group`, or the combinators that use group are given more long-winded makes to make them secondary. I've bits and bobs on the go to do this, but nothing near a concrete library.
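On the linebreak/softline question: the WL documentation describes four related documents. `line` and `linebreak` always break unless an enclosing `group` flattens them, in which case `line` becomes a space and `linebreak` becomes nothing; `softline` and `softbreak` are defined as `group line` and `group linebreak`. The toy model below encodes just that. The `Line Bool` constructor, `render` and the other helpers are my own simplifications (no indentation, GHC 8.4+), not wl-pprint's source. It also shows concretely that every one of these breaks can be undone by an enclosing `group`.

```haskell
-- Toy model (not wl-pprint's code): a line carries a flag saying what it
-- turns into when an enclosing `group` flattens it.
data Doc
  = Empty
  | Text String
  | Line Bool      -- True: vanishes when flattened; False: becomes a space
  | Cat Doc Doc
  | Union Doc Doc  -- flattened alternative first, original second

instance Semigroup Doc where
  (<>) = Cat

text :: String -> Doc
text = Text

flatten :: Doc -> Doc
flatten (Line True) = Empty
flatten (Line False) = Text " "
flatten (Cat a b) = Cat (flatten a) (flatten b)
flatten (Union a _) = a
flatten d = d

group :: Doc -> Doc
group d = Union (flatten d) d

-- The four line-breaking documents, following the WL documentation.
line, linebreak, softline, softbreak :: Doc
line = Line False            -- newline, or a space if flattened by group
linebreak = Line True        -- newline, or nothing if flattened by group
softline = group line        -- space if the rest of the line fits, else newline
softbreak = group linebreak  -- nothing if the rest of the line fits, else newline

data Tok = TText String | TLine

-- Greedy renderer with no indentation support, to keep the sketch short.
render :: Int -> Doc -> String
render w d = concatMap out (best 0 [d])
  where
    best :: Int -> [Doc] -> [Tok]
    best _ [] = []
    best k (x : rest) = case x of
      Empty -> best k rest
      Text s -> TText s : best (k + length s) rest
      Line _ -> TLine : best 0 rest
      Cat a b -> best k (a : b : rest)
      Union a b ->
        let flat = best k (a : rest)
         in if fits (w - k) flat then flat else best k (b : rest)

    fits :: Int -> [Tok] -> Bool
    fits n _ | n < 0 = False
    fits _ [] = True
    fits n (TText s : ts) = fits (n - length s) ts
    fits _ (TLine : _) = True

    out (TText s) = s
    out TLine = "\n"
```

For example, `render 80 (text "a" <> softbreak <> text "b")` gives `ab` and the `softline` version gives `a b`; at width 1 both fall back to `a`, a newline, then `b`. A bare `line` without any `group` always breaks.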

Ahem - there was a severe typo in my last message. Usually I wouldn't spam the list to repair my failings, but the edit distance on the error in that message was so large it made no sense at all.
printing that cannot be undone by `group`, or the combinators that use group are given more long-winded **names** to make them secondary. I've bits and bobs on the go to do this, but nothing near a concrete library.
Apologies to all. (Funny how I can spot typos after the fact...)

On Tue, Mar 20, 2012 at 6:52 AM, Stephen Tetley wrote:
Hi Ivan
I haven't found any bugs in WL, however I do find the API somewhat confusing regarding line breaking (I would need to consult the manual to tell you the difference between linebreak, softline etc.). This is likely my failing rather than WL as usually I want formatting - "I know the layout" - rather than pretty printing - "the fit function finds the best layout".
Yeah, the 'group' combinator is at the center of it, but it took me some fiddling around to get a feel for how it works... and I still don't have a feel for how it behaves when nested (as it is pervasively if you use fillSep and the like). Maybe it's elegantly minimal, but it doesn't seem that intuitive, unless someday I come to a realization that clears it all up. The thing is, I don't think there is a fit function that finds the best layout; I think it simply does what the composition of groups tells it to do.
I think there is room in the design space for a library whose API "favors" formatting rather than pretty-printing. I.e it has line printing that cannot be undone by `group`, or the combinators that use group are given more long-winded makes to make them secondary. I've bits and bobs on the go to do this, but nothing near a concrete library.
What I think I would like is some way to express a hierarchy of line breaks. So if I'm formatting a list, there's a break before/after each comma and they are all equally good breaks. But then if I nest and format a list of lists, the outer breaks are considered better breaks than the inner ones. This would preserve the hierarchical structure of the data by trying to break on the largest chunks first, and control indentation too. In fact, HughesPJ's fsep (or maybe it's the 'best' in renderStyle) seems to get that right all on its own.

I also have a personal style that's hard to reconcile with the pprint combinators, namely that lists that fit on one line don't have spaces around the brackets:

    [1, 2, 3]

but ones that must be wrapped do, and the close bracket lines up with the open one:

    [ 1, 2
    , 3, 4
    ]

One other thing I've thought about is to have a pretty printer have the option of returning a list of Docs, in increasing detail. Then a smart viewer (perhaps HTML + JS) could let you expand things by clicking on them. Or maybe it would be more practical to just teach vim or emacs the output syntax and let folding take care of it.
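The bracket style described here can at least be expressed in a Wadler-style model if a document may carry a separate flattened form. The sketch below adds a hypothetical `FlatAlt` constructor (similar in spirit to `flatAlt` in the later prettyprinter package, but this is my own toy code, not that library or wl-pprint). The whole list is one `group` whose flat form is `[1, 2, 3]`; when that does not fit, the opening bracket gets a trailing space, each `, ` is preceded by a `softbreak` so elements fill lines greedily, and the closing bracket goes on its own line. It assumes the list starts at the current indentation (there is no nesting here); elsewhere the closing bracket would need something like WL's `align`. GHC 8.4+ is assumed.

```haskell
-- Toy model (not wl-pprint): FlatAlt picks between a normal layout and a
-- different layout used when an enclosing `group` flattens it.
data Doc
  = Empty
  | Text String
  | Line              -- a newline; a space if flattened directly
  | FlatAlt Doc Doc   -- first when laid out normally, second when flattened
  | Cat Doc Doc
  | Union Doc Doc     -- flattened alternative first, original second

instance Semigroup Doc where
  (<>) = Cat

text :: String -> Doc
text = Text

flatten :: Doc -> Doc
flatten Line = Text " "
flatten (FlatAlt _ f) = flatten f
flatten (Cat a b) = Cat (flatten a) (flatten b)
flatten (Union a _) = a
flatten d = d

group :: Doc -> Doc
group d = Union (flatten d) d

linebreak, softbreak :: Doc
linebreak = FlatAlt Line Empty   -- newline, or nothing when flattened
softbreak = group linebreak      -- nothing if the rest fits, else newline

data Tok = TText String | TLine

-- Greedy renderer without indentation: the list is assumed to start at
-- column 0, so "]" after a newline lines up with "[".
render :: Int -> Doc -> String
render w d = concatMap out (best 0 [d])
  where
    best :: Int -> [Doc] -> [Tok]
    best _ [] = []
    best k (x : rest) = case x of
      Empty -> best k rest
      Text s -> TText s : best (k + length s) rest
      Line -> TLine : best 0 rest
      FlatAlt a _ -> best k (a : rest)
      Cat a b -> best k (a : b : rest)
      Union a b ->
        let flat = best k (a : rest)
         in if fits (w - k) flat then flat else best k (b : rest)

    fits :: Int -> [Tok] -> Bool
    fits n _ | n < 0 = False
    fits _ [] = True
    fits n (TText s : ts) = fits (n - length s) ts
    fits _ (TLine : _) = True

    out (TText s) = s
    out TLine = "\n"

joinWith :: Doc -> [Doc] -> Doc
joinWith _ [] = Empty
joinWith s ds = foldr1 (\x y -> x <> s <> y) ds

-- "[1, 2, 3]" when it fits; otherwise "[ " plus leading commas, filling
-- each line greedily, with "]" on its own line.
bracketList :: [Doc] -> Doc
bracketList [] = text "[]"
bracketList ds =
  group $
    text "["
      <> FlatAlt (text " ") Empty
      <> joinWith (softbreak <> text ", ") ds
      <> linebreak
      <> text "]"
```

With four elements, width 80 gives `[1, 2, 3, 4]`, and width 7 gives the three lines `[ 1, 2`, `, 3, 4` and `]`, the same shape as the example above. Each `softbreak` is still decided greedily from the current line alone, so I would not expect this to give the hierarchy of breaks asked for above when lists are nested.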

On Tue, Mar 20, 2012 at 2:24 AM, Stephen Tetley wrote:
A quick suggestion - does setting the ribbon_frac to something like 0.8 improve things?
Nope. The ribbon (IMO both an undescriptive name and underdocumented) only constrains the number of non-indent characters per line. So it puts the line breaks in different places, but the underlying problem of it not knowing where lines should be broken remains.
The Show instance for wl-pprint's Doc uses 0.4 which I've found too low.
It's off the subject, but I always thought 'ribbon' was odd as the single knob available. I never really saw a rationale for why it's so important. The old Hughes-PJ paper says it looks nice to have a "ribbon" of text snaking across the page, but I think it looks nice to preserve vertical space by filling lines as much as possible. Difference of opinion, I guess.
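The point about the ribbon can be made precise with a small helper. As far as I can tell from wl-pprint's `renderPretty :: Float -> Int -> Doc -> SimpleDoc`, a flat alternative is accepted only if it fits both in the rest of the page and in the rest of the ribbon, and the ribbon only counts characters written after the current indentation. The function names below are hypothetical, and this is my reconstruction of that rule rather than wl-pprint's code (it uses `Double` where `renderPretty` takes a `Float`).

```haskell
-- The ribbon in characters: the fraction of the page width that
-- non-indentation text may use on any one line (clamped to [0, width]).
ribbonWidth :: Int -> Double -> Int
ribbonWidth pageWidth ribbonFrac =
  max 0 (min pageWidth (round (fromIntegral pageWidth * ribbonFrac)))

-- Characters still available on the current line, given the current
-- indentation and column. Both limits apply, but the ribbon only counts
-- the (column - indent) characters already written after the indentation.
remainingWidth :: Int -> Double -> Int -> Int -> Int
remainingWidth pageWidth ribbonFrac indent column =
  min (pageWidth - column) (ribbonWidth pageWidth ribbonFrac - (column - indent))
```

For example, at width 80 with a ribbon of 0.4 a line may hold at most 32 non-indentation characters, even at indentation 40, and raising the fraction to 0.8 allows 64. Either way the ribbon only changes how much fits before a break, not which breaks are preferred.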
participants (3)
- Evan Laforge
- Ivan Lazar Miljenovic
- Stephen Tetley