PrefixMap: code review request

Hi, I'm a newish Haskell hacker with way too much experience hacking Lisp. At first, I found my Haskell code looking very lisp-y. I think my code is becoming more idiomatic haskell. I would be very grateful to anyone who would take a glance at the code below and point out any un-idiomatic usages that jump out. It's a small module from a program which looks for palindromes in a list of words. Thanks very much. Cheers, David \begin{code} {-# OPTIONS -O2 -optc-O3 #-} module PrefixMap (PrefixMap,fromDistinctAscPairList,searchMap) where import Data.List import qualified Data.Map as Map import Test.HUnit \end{code} The PrefixMap datastructure implements a Prefix Tree which allows a key/value relationship. \begin{code} data PrefixMap k v = Node (Maybe v) (Map.Map k (PrefixMap k v)) deriving (Show) \end{code} A PrefixMap is built from an alphabet enumerating the possible constituents of keys and a list of pairs of keys and objects. A key is a string of elements of the alphabet. The list must be distinct and in ascending order. The constraint is not checked. \begin{code} fromDistinctAscPairList :: Ord k => [k]->[([k],v)]->PrefixMap k v fromDistinctAscPairList alphabet pairList = build alphabet Nothing (partList pairList alphabet) partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result build :: Ord k => [k]->(Maybe v)->[(k,[([k],v)])]->(PrefixMap k v) build alphabet val pairs = Node val $ Map.fromDistinctAscList treePairs where treePairs = [(c,mkITree l)|(c,l)<-pairs] mkITree l = build alphabet x (partList l' alphabet) where (x,l') = findNode $ snipKeys l snipKeys :: Ord k => [([k],v)]->[([k],v)] snipKeys l = [(k,v) | (_:k,v) <- l] findNode :: Ord k => [([k], v)] -> (Maybe v, [([k], v)]) findNode l = if null suffix then (Nothing,l) else ((Just $ snd.head $ suffix),prefix++(tail suffix)) where (prefix,suffix) = span (not.null.fst) l \end{code} searchMap applies a function to each object in the PrefixTree that is on the path specified by the key and the subtree below it and returns a list of the results. \begin{code} searchMap :: Ord k => (v -> vv) -> [k] -> PrefixMap k v -> [vv] searchMap f [] t = walk f t [] searchMap f (k:ks) (Node v al) = maybe rest ((:rest) . f) v where rest = maybe [] (searchMap f ks) (Map.lookup k al) walk :: (a -> b) -> PrefixMap k a -> [b] -> [b] walk f (Node Nothing al) z = Map.fold (walk f) z al walk f (Node (Just x) al) z = Map.fold (walk f) (f x:z) al test1 = TestCase (do input <- readFile "words.txt" let dict = words input pairs = zip dict dict alpha = ['a'..'z'] ftree = fromDistinctAscPairList alpha pairs fAnswer = searchMap id "assert" ftree rtree = fromDistinctAscPairList alpha $ sort $ zip (map reverse dict) dict rAnswer = searchMap id "tressa" rtree assertEqual "forward search" ["as","ass","assertedly","asserted", "asserters","asserter","asserting", "assertions","assertion","assertively", "assertivenesses","assertiveness", "assertive","assertors","assertor", "asserts","assert"] fAnswer assertEqual "reverse search" ["reassert","overassert","assert"] rAnswer ) tests = TestList [TestLabel "Tree Test" test1] \end{code} \end{document} -------------------------------- David F. Place mailto:d@vidplace.com

On Feb 27, 2006, at 2:30 PM, David F.Place wrote:
Hi,
I'm a newish Haskell hacker with way too much experience hacking Lisp. At first, I found my Haskell code looking very lisp-y. I think my code is becoming more idiomatic haskell. I would be very grateful to anyone who would take a glance at the code below and point out any un-idiomatic usages that jump out. It's a small module from a program which looks for palindromes in a list of words. Thanks very much.
[snip]
partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result
I don't think I've ever seen nested "where"s before ;-) I'd probably avoid that; it's really hard to read. If your function is sufficiently complicated that it needs its own "where" clause, you should probably just promote it to the top level. If it is truly internal, you can avoid exporting it with the module export list. [snip]
searchMap :: Ord k => (v -> vv) -> [k] -> PrefixMap k v -> [vv]
Humm... double "v" seems like a pretty poor choice for a type variable name. [snip] Just a couple of general comments: -- you don't seem to like horizontal whitespace much. I know, I know, whitespace placement can be a highly personal thing, but I find most haskellers usually use a bit more horizontal whitespace, particularly in function signatures. The arrow is almost always written with surrounding spaces. I personally like space after commas in tuples and lists. Several of your list comprehensions would also be easier to read with a bit of whitespace. I also tend to like '=' signs lined up in a column for lets, pattern function definitions and wheres. -- Nested tuple and lists types are pretty hard to read. In your code [([k],v)] appears a lot. Consider defining a type alias for it. Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

On Feb 27, 2006, at 3:11 PM, Robert Dockins wrote:
-- Nested tuple and lists types are pretty hard to read. In your code [([k],v)] appears a lot. Consider defining a type alias for it.
Funny, of course as an inveterate lisp hacker, I am completely insensitive to nesting depth. :-) Your suggestion cleans up the code quite nicely. -------------------------------- David F. Place mailto:d@vidplace.com

David F.Place wrote: [snip]
partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result
If the above code is put into a non-literate file as: partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result there is a parse error (using ghc) at the line beginning with result'. This binding doesn't line up with anything. Also the second 'where' is dangerously close to the column started by the 'f' after the first 'where' (may not be noticeable in this email due to whatever font it is being displayed in but it's only indented by one space) which makes it a bit tricky to read. I suggest: partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result or: partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result because always starting an 'if' construct on a new line ensures that you will never ever have any problems with it's layout (especially helpful for people used to C) when you use it in a 'do' block. Also, both the above variants ensure that your code can be edited using variable width fonts, any tabbing regime you like (as long as leading whitespace only has tabs and never spaces), and will be immune to identifier renamings. The golden rule for 'safe' layout is always to start a layout block on the next line after 'do' 'where' 'of' 'let' and to try to resist the tempation to save a line by squashing the first binding onto the same line as the 'let' etc. The second variant has the added advantage that the horizontal indentation is kept under control (eg in the above all indenting is 3 spaces further in) whereas when people write things like if p == q then let a = 4 b = if a ==4 then let q = 2 s = 56 after only 3 indents the code is already half way across the screen (not to mention the fact that the above layout is completely brittle and can be destroyed by a simple search-and-replace op to change 'q' to 'hello') Of course all the above is just easy-to-talk-about technicalities of layout and you were really interested in getting feedback about idiomatic usage - hopefully someone else will give some feedback about that (I'm too lazy...) :-) Regards, Brian.

On Feb 27, 2006, at 5:54 PM, Brian Hulley wrote:
there is a parse error (using ghc) at the line beginning with result'. This binding doesn't line up with anything. Also the second 'where' is dangerously close to the column started by the 'f' after the first 'where' (may not be noticeable in this email due to whatever font it is being displayed in but it's only indented by one space) which makes it a bit tricky to read.
Whoops, that's noise in the transmission. In my original file and my original email, it is indented correctly. As for other indentation issues, I always use whatever emacs suggests. Is that not a good strategy? -------------------------------- David F. Place mailto:d@vidplace.com

David F. Place wrote:
On Feb 27, 2006, at 5:54 PM, Brian Hulley wrote:
there is a parse error (using ghc) at the line beginning with result'. This binding doesn't line up with anything. Also the second 'where' is dangerously close to the column started by the 'f' after the first 'where' (may not be noticeable in this email due to whatever font it is being displayed in but it's only indented by one space) which makes it a bit tricky to read.
Whoops, that's noise in the transmission. In my original file and my original email, it is indented correctly. As for other indentation issues, I always use whatever emacs suggests. Is that not a good strategy?
I always think it's a bit like income tax! Over the centuries the rules have got more and more complicated and instead of simplifying everything, computers have allowed all this mess to survive which ultimately makes life difficult for us poor humans because no-one knows any more what's going on (except a computer which has its own agenda for humanity...) Whoever thought up the original Haskell layout rule assumed that people would be happy using a single fixed width font, tabs set to 8 spaces, and didn't care about the brittleness of the code (in the face of identifier renamings) it allowed one to write. A like-minded person has then created an emacs mode which supports this. It is probably also possible to create an emacs macro to safely rename identifiers by first parsing the code, doing the renaming in the AST, then pretty-printing the code back into the file. However if you voluntarily use only a restricted subset of all possible layouts, you end up with non-brittle code that can be edited in any editor (eg someone else's editor who doesn't understand emacs:-) ) and where safely renaming identifiers is just a simple text-based search-and-replace operation. Of course having said all this, many people have strong personal views about different ways of laying out code, whether to use tabs or not, etc etc. I was just using your example as an excuse for some "consciousness-raising" about safe vs brittle layouts, but of course at the end of the day everyone has to decide for themselves. For example you might find that the aesthetics or readability of a 'brittle' layout, or the ease of editing using the particular emacs mode, outweighs the disadvantages of its being brittle. As Dr Flox said on Star Trek Enterprise to T'Pol "Infinite diversity in infinite combinations" !!! :-) Best regards, Brian.

Brian Hulley wrote:
Whoever thought up the original Haskell layout rule assumed that people would be happy using a single fixed width font, tabs set to 8 spaces, and didn't care about the brittleness of the code (in the face of identifier renamings) it allowed one to write.
Are you complaining that Haskell permits you to write code with these problems, or that it requires you to? The latter is not true. Instead of keyword clause1 clause2 you can always write keyword clause1 clause2 or keyword { clause1 ; clause2 } Both styles are insensitive to tab width and renaming of identifiers. The off-side rule only comes into play when you don't include an explicit {, so you can suppress it entirely if you prefer. If you have a different layout rule in mind I'd be interested in hearing it, but I think Haskell's is quite good overall. Lisp has similar indentation problems despite the lack of a layout rule, since people commonly write (foo (...) (...)) Renaming foo can't confuse the compiler, but I've never had a Haskell layout error change the meaning of my program. At worst it causes the program to be rejected. I do edit my source code in a fixed-width font, but it's a very pretty font (Bitstream Vera Sans Mono). It's a small price to pay for the convenience of being able to use 2D layout, even in languages where it's not significant, and in comments. -- Ben

Ben Rudiak-Gould wrote:
Brian Hulley wrote:
Whoever thought up the original Haskell layout rule assumed that people would be happy using a single fixed width font, tabs set to 8 spaces, and didn't care about the brittleness of the code (in the face of identifier renamings) it allowed one to write.
Are you complaining that Haskell permits you to write code with these problems, or that it requires you to? The latter is not true. Instead [snip]
Just that it allows you to, because this means other people's code (which you may be editing) can be brittle.
If you have a different layout rule in mind I'd be interested in hearing it, but I think Haskell's is quite good overall.
Here is my proposed layout rule: 1) All layout keywords (where, of, let, do) must either be followed by a single element of the corresponding block type, and explicit block introduced by '{', or a layout block whose first line starts on the *next* line and whose indentation is accomplished *only* by tabs In particular, this allows: let a = 56 in a*a and let a = 56 b = 78 in a*b but not let a = 56 b = 78 or let a = 56; b = 78 c = 90 I would also make it that explicit braces are not allowed to switch off the layout rule (ie they can be used within a layout), multiline strings would not be permitted, and multiline comments would not be permitted (pragmas could easily be used just by using --#) (I'd have a special keyword eg '{}module' instead of 'module' at the top of a file to switch off layout for the whole file if required, but making the presence of the layout rule depend on whether or not there are surrounding braces makes life *way* too complicated imho) This would give the following advantages: 1) When you see a ';' you could immediately tell which block it belongs to by looking backwards till the next '{' 2) Variable width fonts can be used, or different font faces to represent different sorts of identifier eg class names, tycons, value constructors, operators like `seq` as opposed to seq etc 3) Using only tabs ensures that vertical alignment goes to the same position on the screen regardless of the font and tabs could even have different widths just like in a wordprocessor 4) Any keypress has a localised effect on the parse tree of the buffer as a whole ( { " no longer kill everything which follows and there would be no {- ) 5) It paves the way for a much more immersive editing environment, but I can't say more about this at the moment because I haven't finished writing it yet and it will be a commercial product :-))) Using my self-imposed layout rule I'm currently editing all my Haskell code in a standard text editor using tabs set to 4 spaces and a variable width font and have no problems. Regards, Brian.

Brian Hulley wrote:
Here is my proposed layout rule:
1) All layout keywords (where, of, let, do) must either be followed by a single element of the corresponding block type, and explicit block introduced by '{', or a layout block whose first line starts on the *next* line
I wouldn't have much trouble adapting to that.
and whose indentation is accomplished *only* by tabs
You can't be serious. This would cause far more problems than the current rule.
I would also make it that explicit braces are not allowed to switch off the layout rule (ie they can be used within a layout),
I don't understand. What does "used within a layout" mean?
multiline strings would not be permitted,
They aren't now, except with \ escapes. A stray " will be caught on the same line unless the line happens to end with \ and the next line happens to begin with \, which is exceedingly unusual.
and multiline comments would not be permitted (pragmas could easily be used just by using --#)
But --# doesn't introduce a comment. And this would make UNPACK pragmas rather inconvenient to use.
1) When you see a ';' you could immediately tell which block it belongs to by looking backwards till the next '{'
I guess that might be helpful, but it doesn't seem easier than looking left to the beginning of the current line and then up to the first less-indented line.
2) Variable width fonts can be used,
They can be used now, if you adhere to a certain style, but not everyone likes that style. I wrote in C++ with a variable width font and tabs at one time, but eventually went back to fixed width. One reason was that I couldn't use comment layout conventions that tend (in my experience) to improve readability more than monospacing hurts it. Another reason was that glyph widths appropriate to natural languages didn't work all that well for source code. Spaces are much more important in source code than in natural language, for example. A proportional font designed for source code would be nice, but I haven't found one yet. Stroustrup used a mixture of proportional and monospaced glyphs in _The C++ Programming Language_ and it worked well.
or different font faces to represent different sorts of identifier eg class names, tycons, value constructors, operators like `seq` as opposed to seq etc
Lots of editors do this with monospaced fonts; I think it's orthogonal to the layout issue.
3) Using only tabs ensures that vertical alignment goes to the same position on the screen regardless of the font and tabs could even have different widths just like in a wordprocessor
Requiring tabs is a really bad idea. Just forget it. Seriously.
4) Any keypress has a localised effect on the parse tree of the buffer as a whole ( { " no longer kill everything which follows and there would be no {- )
I don't understand why this is an advantage. If you have an editor that highlights comments in green, then large sections of the program will flash green while you type a {- -} comment, which might be annoying, but it also means you'll never forget to close the comment, so the practical benefit of forbidding {- -}, as opposed to simply not typing it yourself, seems nil.
5) It paves the way for a much more immersive editing environment, but I can't say more about this at the moment because I haven't finished writing it yet and it will be a commercial product :-)))
I guess everything has been leading up to this, but my reaction is that it renders the whole debate irrelevant. The only reason layout exists in the first place is to make source code look good in ordinary text editors. If you have a high-level source code editor that manipulates the AST, then you don't need layout, or tabs, or any of that silly ASCII stuff. The only time you need to worry about layout is when interoperating with implementations that use the concrete syntax, and then there's nothing to stop you from exporting in any style you like. And when importing, there's no reason to place restrictions on Haskell's layout rule, because the visual layout you display in the editor need have no connection to the layout of the imported file.
Using my self-imposed layout rule I'm currently editing all my Haskell code in a standard text editor using tabs set to 4 spaces and a variable width font and have no problems.
Which is the best argument for keeping the current rule! If it were changed as you propose, then someday Hugh Briley would come along and complain that Haskell's layout syntax squandered screen space---but he *wouldn't* be able to program in his preferred style, because it would no longer be allowed. Religious freedom is a good thing. {- Ben -}

Ben Rudiak-Gould wrote:
Brian Hulley wrote:
Here is my proposed layout rule:
1) All layout keywords (where, of, let, do) must either be followed by a single element of the corresponding block type, and explicit block introduced by '{', or a layout block whose first line starts on the *next* line
I wouldn't have much trouble adapting to that.
and whose indentation is accomplished *only* by tabs
You can't be serious. This would cause far more problems than the current rule.
Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) times?
I would also make it that explicit braces are not allowed to switch off the layout rule (ie they can be used within a layout),
I don't understand. What does "used within a layout" mean?
I meant that {;} would be used just like any other construct that has to respect the layout rule so you could write let a = let { b = 6; z = 77; h = 99; p = 100} in b+z+h + p etc but not: let a = let { b = 6; z = 77; h = 99; -- this binding would be part of the outermost 'let' p = 100} in b+z+h + p
multiline strings would not be permitted,
They aren't now, except with \ escapes. A stray " will be caught on the same line unless the line happens to end with \ and the next line happens to begin with \, which is exceedingly unusual.
and multiline comments would not be permitted (pragmas could easily be used just by using --#)
But --# doesn't introduce a comment. And this would make UNPACK pragmas rather inconvenient to use.
-- # but I hadn't thought about UNPACK... The motivation in both points is to make it easy for an editor to determine which lines need to be re-parsed based on the number of leading tabs alone.
1) When you see a ';' you could immediately tell which block it belongs to by looking backwards till the next '{'
I guess that might be helpful, but it doesn't seem easier than looking left to the beginning of the current line and then up to the first less-indented line.
There was an example posted on another thread where someone had got into confusion by using ; after a let binding in a do construct with an explicit brace after the 'do' but not after the 'let' (sorry I can't find it again). Also the current layout rule uses the notion of an implicit opening brace which is a to be regarded as a real opening brace as far as ';' in concerned but an unreal non-existent opening brace as far as '}' is concerned. Thus I think it is a real mix-up.
2) Variable width fonts can be used,
They can be used now, if you adhere to a certain style, but not everyone likes that style. I wrote in C++ with a variable width font and tabs at one time, but eventually went back to fixed width. One reason was that I couldn't use comment layout conventions that tend (in my experience) to improve readability more than monospacing hurts it. Another reason was that glyph widths appropriate to natural languages didn't work all that well for source code. Spaces are much more important in source code than in natural language, for example. A proportional font designed for source code would be nice, but I haven't found one yet. Stroustrup used a mixture of proportional and monospaced glyphs in _The C++ Programming Language_ and it worked well.
or different font faces to represent different sorts of identifier eg class names, tycons, value constructors, operators like `seq` as opposed to seq etc
Lots of editors do this with monospaced fonts; I think it's orthogonal to the layout issue.
For example on Windows Trebuchet MS is a very nice font, also Verdana, both of which are not monospaced. But yes I agree it's not a major issue and I just see the option of being able to use them as a nice side-effect.
3) Using only tabs ensures that vertical alignment goes to the same position on the screen regardless of the font and tabs could even have different widths just like in a wordprocessor
Requiring tabs is a really bad idea. Just forget it. Seriously.
I'm really puzled here. I've been using tabs to indent my C++ code for at least 10 years and don't see the problem. The only problem would be if someone mixed tabs with spaces. Since it has to be either tabs only or spaces only I'd choose tabs only to save keystrokes. I suppose though it is always going to be a matter of personal taste...
4) Any keypress has a localised effect on the parse tree of the buffer as a whole ( { " no longer kill everything which follows and there would be no {- )
I don't understand why this is an advantage. If you have an editor that highlights comments in green, then large sections of the program will flash green while you type a {- -} comment, which might be annoying, but it also means you'll never forget to close the comment, so the practical benefit of forbidding {- -}, as opposed to simply not typing it yourself, seems nil.
But it makes it much easier for the editor to determine where to start re-parsing from (see below). If you allow {- everything becomes a lot more complicated and who needs them anyway? In Visual C++ for example, you just select a block of text and type Control-K Control-C to put single line comments at the beginning of each line in the block. I think (apart from UNPACK) multi-line comments are just a left-over from very old days before single line comments were invented and editors had these simple macros built-in.
5) It paves the way for a much more immersive editing environment, but I can't say more about this at the moment because I haven't finished writing it yet and it will be a commercial product :-)))
I guess everything has been leading up to this, but my reaction is that it renders the whole debate irrelevant. The only reason layout exists in the first place is to make source code look good in ordinary text editors. If you have a high-level source code editor that manipulates the AST, then you don't need layout, or tabs, or any of that silly ASCII stuff. The only time you need to worry about layout is when interoperating with implementations that use the concrete syntax, and then there's nothing to stop you from exporting in any style you like. And when importing, there's no reason to place restrictions on Haskell's layout rule, because the visual layout you display in the editor need have no connection to the layout of the imported file.
Both 4) and 5) are because it is very much faster to type raw ASCII into an editor than to have to click on all kinds of boxes with the mouse. It is also surprisingly difficult to find an intuitive way of navigating a tree of boxes using only an ASCII keyboard, because of the conflict between the logical parent/child or sibling/sibling relationship and the way these could be laid out on the screen in 2d. Eg the right arrow could mean "go to the next sibling" but the next sibling may have to be laid out underneath the current node, so it all becomes very confusing. I don't think it is possible to lay out code in such a way that all parent/child relationships correspond to a vertical relationship on the screen, but possibly my thinking is too influenced by how programs are usually laid out. Thus I don't think an editor that forced people to work directly with the AST would ever catch on. Years ago I read about a Microsoft Project on "intentional programming" which was about manipulating an arbitrary AST directly but afaik nothing came of it since it was just too painful to use. I also read about some research to do with deriving programs interactively from a proof, where clicking on boxes etc comes into its own, but I don't think there is yet any interactive proof system that comes close to being able to derive "real world" software. Certainly I'd be very interested to know if there is. Currently all the ASCII editors I know of only do keyword highlighting, or occasional ("wait a second while I'm updating the buffer") identifier highlighting. What I'm trying to get however is complete grammatical highlighting and type checking that is instantaneous as the user types code, so this means that the highlighter/type checker needs a complete AST (with 'gap' nodes to represent spans of incomplete/bad syntax) to work from. However it is way too expensive to re-parse the whole of a big buffer after every keypress (I tried it with a parser written in C++ to parse a variant of ML and even with the optimized build and as many algorithmic optimizations as I could think of it was just far too slow, and I wasn't even trying to highlight duplicate identifiers or do type inference) Thus to get a fast responsive editing environment which needs to maintain a parse of the whole buffer to provide effective grammatical highlighting and not just trivial keyword highlighting it seems (to me) to be essential to be able to limit the effect of any keystroke especially when the user is just typing text from left to right but where there may be more stuff after the cursor eg if the user has gone back to editing a function at the top of a file. Things like {- would mean that all the parse trees for everything after it would have to be discarded. Also, flashing of highlighting on this scale could be very annoying for a user, so I'd rather just delete this particular possibility of the user getting annoyed when using my software :-) thus my hopeless attempts to convince everyone that {- is bad news all round :-))) Of course you're right that for loading and saving files I could do Haskell -> My representation -> Haskell but then someone reading a tutorial on Haskell would find that my editor (or by the above arguments, any similar rich-feedback editor) didn't accept all the examples...
Using my self-imposed layout rule I'm currently editing all my Haskell code in a standard text editor using tabs set to 4 spaces and a variable width font and have no problems.
Which is the best argument for keeping the current rule! If it were changed as you propose, then someday Hugh Briley would come along and complain that Haskell's layout syntax squandered screen space---but he *wouldn't* be able to program in his preferred style, because it would no longer be allowed. Religious freedom is a good thing.
Freedom has many dimensions, some interdependent: Simplifying the syntax by using a simpler layout rule would make it possible for people to create very efficient incremental parsers and in turn develop more responsive development environments for consensus Haskell, which in turn might lead to more people understanding and therefore using the language, more libraries, more possibilities for individual people to earn a living themselves by programming, more people thinking things out for themselves instead of absorbing corporate propaganda, thus fairer laws, justice, liberty, and *true human freedom* for all!!! :-) Regards, Brian. -------------------------------------------------------- "... flee from the Hall of Learning. This Hall is dangerous in its perfidious beauty, is needed but for thy probation. Beware, Lanoo, lest dazzled by illusive radiance thy soul should linger and be caught in its deceptive light." -- Voice of the Silence stanza 33

On Wed, 1 Mar 2006, Brian Hulley wrote:
Ben Rudiak-Gould wrote:
Brian Hulley wrote:
Here is my proposed layout rule:
<snip>
and whose indentation is accomplished *only* by tabs
You can't be serious. This would cause far more problems than the current rule.
Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) times?
Not when it prevents me from ever exhibiting the slightest shred of style in my code. I use that control for readability purposes in my code. -- flippa@flippac.org "My religion says so" explains your beliefs. But it doesn't explain why I should hold them as well, let alone be restricted by them.

BH> > Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) BH> > times? PC> Not when it prevents me from ever exhibiting the slightest shred of style PC> in my code. I use that control for readability purposes in my code.
[snip]
BH> I'm really puzled here. I've been using tabs to indent my C++ code for at BH> least 10 years and don't see the problem. At least two reasons: 1. C++ doesn't care about any whitespace (except to separate tokens). Haskell cares about leading whitespace (which it is clear you are thinking a lot about...) but 2. as Philippa mentioned, Haskell programmers care a ton about inter-line, inter-word layout/alignment, for example, lining up = signs and arguments to functions in pattern matches, etc. C++ does not invite this style of declarative programming so it is not surprising that it wasn't an issue: aside from the indentation, I rarely type fancy whitespace inside a giving line of C++ code to align elements with those on a preceding line. In Haskell, this unofficial layout "style" doesn't affect the machine-parsing of the code, but rather the human-parsing of the code. (In fact, it's one of my favorite things about Haskell.) If you want to see what can be accomplished with variable width fonts and complex layouts (not just beginning of lines but rather inter-line, inter-word alignment) you should checkout lhs2TeX. They accomplish all their magic with spaces. BH> The only problem would be if BH> someone mixed tabs with spaces. Since it has to be either tabs only or BH> spaces only I'd choose tabs only to save keystrokes. BTW, tab doesn't type the tab character (at least in emacs and I think vim) but instead moves the left edge of the current line by adding or deleted spaces (or trying to ident the right amount). This usually means you don't have to type 4 or 8 spaces. (And anyway, I would just hold the key down if I had to type more than one spacebar, etc.)
[snip] For example on Windows Trebuchet MS is a very nice font, also Verdana, both of which are not monospaced. But yes I agree it's not a major issue and I just see the option of being able to use them as a nice side-effect.
Very few programmers I know would go to variable width fonts just to use some Microsoft font to edit code... (BTW I like Trebuchet and Verdana too.) To each his/her own! Cheers, Jared. -- http://www.updike.org/~jared/ reverse ")-:"

"Brian Hulley"
You can't be serious. This would cause far more problems than the current rule.
Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) times?
What you type depends on your editor. I hit tab, and the editor inserts an appropriate number of spaces. (I thought all editors did this now?)
There was an example posted on another thread where someone had got into confusion by using ; after a let binding in a do construct with an explicit brace after the 'do' but not after the 'let' (sorry I can't find it again).
If you allow {- everything becomes a lot more complicated and who needs them anyway?
Multi line comments are nice for commenting out blocks of code. It is much less intrusive, in particular if you're using version control.
back to editing a function at the top of a file. Things like {- would mean that all the parse trees for everything after it would have to be discarded. Also, flashing of highlighting on this scale could be very annoying for a user, so I'd rather just delete this particular possibility of the user getting annoyed when using my software :-)
Couldn't your editor just be a little bit smarter? E.g. count the {-s and -}s, and only comment-hilight them if there are two of them? Retain a history of old parse trees, so that it is quick to return to a previous one?
Haskell, which in turn might lead to more people understanding and therefore using the language, more libraries, more possibilities for
You forget one thing: "Avoid success at all costs" :-) -k -- If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde wrote:
Multi line comments are nice for commenting out blocks of code.
They're also nice for comments within a line. E.g. haskell-src-exts contains the declaration data HsQualConDecl = HsQualConDecl SrcLoc {- forall -} [HsName] {- . -} HsContext {- => -} HsConDecl Probably half of my uses of {- -} begin and end on the same line. -- Ben

On Wed, 2006-03-01 at 01:36 +0000, Brian Hulley wrote:
Currently all the ASCII editors I know of only do keyword highlighting, or occasional ("wait a second while I'm updating the buffer") identifier highlighting.
hIDE and Visual Haskell use the ghc lexer and get near-instantaneous syntax highlighting. Because they use a proper lexer they get fully accurate highlighting, not your ordinary "fairly close" regex-based highlighting. They also only re-lex the bits needed. They keep the lexer state for the start of each line and when a line changes, start re-lexing from the beginning of that line and keep going until the lexer ends up in the same state as a previously saved state on a line. I may be wrong but I think there is an optimisation to not lex beyond the end of the current screen until it is scrolled. This means that even when someone types "{-"m you never have to re-lex & re-highlight more than a screen full.
What I'm trying to get however is complete grammatical highlighting and type checking that is instantaneous as the user types code, so this means that the highlighter/type checker needs a complete AST (with 'gap' nodes to represent spans of incomplete/bad syntax) to work from.
With hIDE and Visual Haskell we have found it sufficient to do a complete parse rather than do it incrementally. We wait for a second or so after the user has stopped typing (since highlighting errors as you're actually typing would just be annoying) and then run the ghc front end. This is sufficiently fast (except perhaps on very large modules).
However it is way too expensive to re-parse the whole of a big buffer after every keypress (I tried it with a parser written in C++ to parse a variant of ML and even with the optimized build and as many algorithmic optimizations as I could think of it was just far too slow, and I wasn't even trying to highlight duplicate identifiers or do type inference)
It may be possible to do some more caching to speed things up without going to a full incremental parser. For example the editor could maintain a buffer of lexed symbols and have a traditional parser use that. It may also be possible to just re-parse parts of the file.
Thus to get a fast responsive editing environment which needs to maintain a parse of the whole buffer to provide effective grammatical highlighting and not just trivial keyword highlighting it seems (to me) to be essential to be able to limit the effect of any keystroke especially when the user is just typing text from left to right but where there may be more stuff after the cursor eg if the user has gone back to editing a function at the top of a file. Things like {- would mean that all the parse trees for everything after it would have to be discarded. Also, flashing of highlighting on this scale could be very annoying for a user, so I'd rather just delete this particular possibility of the user getting annoyed when using my software :-) thus my hopeless attempts to convince everyone that {- is bad news all round :-)))
As I mentioned, it is possible to limit the effect of a {- to a screen full of re-lexing. I grant you that it's likely to do worse things to your incremental parser. Duncan

Duncan Coutts wrote:
hIDE and Visual Haskell use the ghc lexer and get near-instantaneous syntax highlighting.
Hmm... I just installed Visual Haskell 0.1, and when I type in the editor, CPU usage rises to about 70% and there's a noticeable delay before each character appears on the screen. This is a very short module (~100 lines) and a Pentium M 1600 CPU. Am I doing something wrong? -- Ben

On Wed, 2006-03-01 at 22:58 +0000, Ben Rudiak-Gould wrote:
Duncan Coutts wrote:
hIDE and Visual Haskell use the ghc lexer and get near-instantaneous syntax highlighting.
Hmm... I just installed Visual Haskell 0.1, and when I type in the editor, CPU usage rises to about 70% and there's a noticeable delay before each character appears on the screen. This is a very short module (~100 lines) and a Pentium M 1600 CPU. Am I doing something wrong?
I can't say too much about the internals of VH since I've not see the code, only the description. Perhaps that's because they're starting the parser immediately after every keystroke and/or not killing the parser when the user types another key. I've been using hIDE on a Pentium M 1600 laptop and on the size of modules I've tried so far it's quick. The syntax highlighting updates immediately and the type checker shows up errors a second or so after I stop typing (which is because we wait about that long before starting the parser). Duncan

I wrote:
I just installed Visual Haskell 0.1, and when I type in the editor, CPU usage rises to about 70% and there's a noticeable delay before each character appears on the screen.
This is no longer happening, so I guess I ran afoul of a bug. -- Ben

On Wednesday 01 March 2006 02:36, Brian Hulley wrote:
Ben Rudiak-Gould wrote:
Brian Hulley wrote:
Here is my proposed layout rule:
1) All layout keywords (where, of, let, do) must either be followed by a single element of the corresponding block type, and explicit block introduced by '{', or a layout block whose first line starts on the *next* line
I wouldn't have much trouble adapting to that.
and whose indentation is accomplished *only* by tabs
You can't be serious. This would cause far more problems than the current rule.
Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) times?
What kind of editor are you using? Notepad? I am used to hitting TAB key and get the correct number of spaces, according to how I configured my editor (NEdit) for the current language mode. TAB characters in program text should be forbidden by law. As well as editors that by default insert a tab char instead of spaces. Ben

Benjamin Franksen wrote:
[snip] I am used to hitting TAB key and get the correct number of spaces, according to how I configured my editor (NEdit) for the current language mode.
The only thing then is what happens when you type backspace or left arrow to get back out to a previous indentation? If the TAB character inserts spaces, there's no problem going from left to right but it would seem more complicated to go back out again ie without having to type backspace 4 times and try to hope when outdenting more that I haven't typed backspace 23 times instead of 24 times by mistake thus not getting to the column I expected. This is my only reason for wanting to keep tab characters in the text, and certainly it does give some disadvantages when trying to line up '|' '=' etc vertically - at the moment I admit my layouts do end up a bit contrived as I have to use more newlines to ensure I can use tabs only to accomplish the line-up... So any solutions welcome :-) Regards, Brian. -------------------------------------------------------- "... flee from the Hall of Learning. This Hall is dangerous in its perfidious beauty, is needed but for thy probation. Beware, Lanoo, lest dazzled by illusive radiance thy soul should linger and be caught in its deceptive light." -- Voice of the Silence stanza 33

On Wednesday 01 March 2006 13:35, Brian Hulley wrote:
Benjamin Franksen wrote:
[snip] I am used to hitting TAB key and get the correct number of spaces, according to how I configured my editor (NEdit) for the current language mode.
The only thing then is what happens when you type backspace or left arrow to get back out to a previous indentation? If the TAB character inserts spaces, there's no problem going from left to right but it would seem more complicated to go back out again ie without having to type backspace 4 times and try to hope when outdenting more that I haven't typed backspace 23 times instead of 24 times by mistake thus not getting to the column I expected.
With NEdit, hitting backspace /right after/ hitting the tab key deletes all the whitespace that were inserted, be it a tab character or multiple spaces. (This works also if the line was auto-indented to the same indentation depth as the previous one. That is, hit enter and then backspace, and you are at previous indentation level minus one.) If, however, you press any other key (e.g. any arrow keys), subsequent backspace will only delete a single space. Other behaviors can be easily implemented by writing a macro and binding it to the backspace key. The same is most probably true for emacs. The upshot is: Any decent modern text editor allows to map keys like tab and backspace to almost any action desired, depending on context, language mode, whatever. Ben

On Wed, 1 Mar 2006 12:35:44 -0000, "Brian Hulley"
The only thing then is what happens when you type backspace or left arrow to get back out to a previous indentation?
The Borland IDEs have long supported various "smart" indentation features, which can each be individually turned on or off (see the third one for the answer to your specific question): * Auto indent mode - Positions the cursor under the first nonblank character of the preceding nonblank line when you press ENTER in the Code Editor. * Smart tab - Tabs to the first non-whitespace character in the preceding line. If "Use tab character" is enabled, this option is off. * Backspace unindents - Aligns the insertion point to the previous indentation level (outdents it) when you press BACKSPACE, if the cursor is on the first nonblank character of a line. There are a number of other tab-related options as well. Steve Schafer Fenestra Technologies Corp. http://www.fenestra.com/

Brian Hulley wrote: [snip]
So any solutions welcome :-)
Thank to everyone who replied to my queries about this whole layout issue. One other thing I've been wanting to ask (not to change! :-)) for a while is: how is the following acceptable according to the rules in the Haskell98 report where "where" is one of the lexemes, which when followed by a line more indented than the line the layout-starting-lexeme is on, should start an implicit block: module M where data T = ..... -- not indented! According to my understanding of the layout algorithm, the above code would have to be written: module M where data T = .... Can anyone shed some light on what the formal rule is that allows the first (and very useful) way of laying out code to be ok? Thanks, Brian.

Layout only applies when something is less indented than previous lines, I believe... e.g.
do c <- getContents "filename" putStrLn "blah"
or
do x <- getContents "filename" putStrLn "ok"
works fine but
do c <- blahAction putStrLn "blah"
obviously won't work
Jared.
On 3/2/06, Brian Hulley
Brian Hulley wrote: [snip]
So any solutions welcome :-)
Thank to everyone who replied to my queries about this whole layout issue.
One other thing I've been wanting to ask (not to change! :-)) for a while is: how is the following acceptable according to the rules in the Haskell98 report where "where" is one of the lexemes, which when followed by a line more indented than the line the layout-starting-lexeme is on, should start an implicit block:
module M where data T = ..... -- not indented!
According to my understanding of the layout algorithm, the above code would have to be written:
module M where data T = ....
Can anyone shed some light on what the formal rule is that allows the first (and very useful) way of laying out code to be ok?
Thanks, Brian.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- http://www.updike.org/~jared/ reverse ")-:"

Brian Hulley wrote:
Brian Hulley wrote: One other thing I've been wanting to ask (not to change! :-)) for a while is: how is the following acceptable according to the rules in the Haskell98 report where "where" is one of the lexemes, which when followed by a line more indented than the line the layout-starting-lexeme is on, should start an implicit block:
module M where data T = ..... -- not indented!
According to my understanding of the layout algorithm, the above code would have to be written:
module M where data T = ....
Can anyone shed some light on what the formal rule is that allows the first (and very useful) way of laying out code to be ok?
The solution (as someone pointed out to me in an email) is that the layout block only *finishes* when the current indentation is *less* than the indentation of the lines in the layout block (rather than *starting* only when the current indentation is *more* than the indentation of the line containing the "where" etc). However I think there is an error in the description of this in section 2.7 of the Haskell98 report, which states: "If the indentation of the non-brace lexeme immediately following a where, let, do or of is less than or equal to the current indentation level, then instead of starting a layout, an empty list "{}" is inserted, and layout processing occurs for the current level ..." I dispute the "or equal" in the above statement, since it seems to be clearly in contradiction to what is actually being done. Regards, Brian.

Am Freitag, 3. März 2006 19:21 schrieb Brian Hulley:
Brian Hulley wrote:
Brian Hulley wrote: One other thing I've been wanting to ask (not to change! :-)) for a while is: how is the following acceptable according to the rules in the Haskell98 report where "where" is one of the lexemes, which when followed by a line more indented than the line the layout-starting-lexeme is on, should start an implicit block:
module M where data T = ..... -- not indented!
According to my understanding of the layout algorithm, the above code would have to be written:
module M where data T = ....
Can anyone shed some light on what the formal rule is that allows the first (and very useful) way of laying out code to be ok?
The solution (as someone pointed out to me in an email) is that the layout block only *finishes* when the current indentation is *less* than the indentation of the lines in the layout block (rather than *starting* only when the current indentation is *more* than the indentation of the line containing the "where" etc).
However I think there is an error in the description of this in section 2.7 of the Haskell98 report, which states:
"If the indentation of the non-brace lexeme immediately following a where, let, do or of is less than or equal to the current indentation level, then instead of starting a layout, an empty list "{}" is inserted, and layout processing occurs for the current level ..."
I dispute the "or equal" in the above statement, since it seems to be clearly in contradiction to what is actually being done.
Regards, Brian.
AFAICT, the description in the report is correct, *except for the 'where' in module LayOut where*. Consider module LayOut where fun x y = bum x y + y 4 where bum x y = y x a) the module-where is at indentation level 0, accepted here, but nowhere else, even if I indent fun and bum, fun's where must be indented further than fun itself. b) bum's definition is top-level now, but in module LayOut where fun x y = bum x y + y 4 where bum x y = y x it is local (bum is indented more than fun, but less than where), in perfect accord with the report. Even module LayOut ( fun, bum) where fun x y = bum x y + y 4 where bum x y = y x is accepted. So my guess is that layout-processing is applied only to the module-body, not to the module head and probably that should be mentioned in the report. BTW, when I read about layout in the report, this irritated me, too, so thanks for asking. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Daniel Fischer wrote:
Am Freitag, 3. März 2006 19:21 schrieb Brian Hulley:
Brian Hulley wrote:
Brian Hulley wrote: [snip] AFAICT, the description in the report is correct, *except for the 'where' in module LayOut where*. [snip] So my guess is that layout-processing is applied only to the module-body, not to the module head and probably that should be mentioned in the report.
Thanks - that's quite a relief because my incremental parser absolutely relies on the indentation of a child block to be more than that of it's parent in the AST... Perhaps a future incarnation of Haskell could just omit the keyword "where" in the module head to avoid all this confusion. Also, all the tutorials (and book) I've read only mention the layout rule in passing somewhere deep inside the text and usually give a rather unsatisfactory hand-waving description that omits to mention the special case for "where" in the module head and/or the need for the sub-block to be indented more than the parent block, so I think depending on what tutorials people have read, putting this together with the module "where", a lot of confusion is floating about... Perhaps a wiki page is indicated? Regards, Brian.

Brian Hulley wrote:
However I think there is an error in the description of this in section 2.7 of the Haskell98 report, which states:
"If the indentation of the non-brace lexeme immediately following a where, let, do or of is less than or equal to the current indentation level, then instead of starting a layout, an empty list "{}" is inserted, and layout processing occurs for the current level ..."
I dispute the "or equal" in the above statement, since it seems to be clearly in contradiction to what is actually being done.
Section 2.7 does say that it is an informal description, so although it is correct, it is not complete. In the case of the module header, the question is really "what is the current indentation level?" (that we must be strictly greater than). The answer can be found in the formal definition of the layout rule in section 9.3. At the beginning of the module, there is _no_ current indentation level - thus the fourth equation of L applies. Regards, Malcolm

Am Montag, 6. März 2006 12:30 schrieb Malcolm Wallace:
Brian Hulley wrote:
However I think there is an error in the description of this in section 2.7 of the Haskell98 report, which states:
"If the indentation of the non-brace lexeme immediately following a where, let, do or of is less than or equal to the current indentation level, then instead of starting a layout, an empty list "{}" is inserted, and layout processing occurs for the current level ..."
I dispute the "or equal" in the above statement, since it seems to be clearly in contradiction to what is actually being done.
Section 2.7 does say that it is an informal description, so although it is correct, it is not complete. In the case of the module header, the question is really "what is the current indentation level?" (that we must be strictly greater than). The answer can be found in the formal definition of the layout rule in section 9.3. At the beginning of the module, there is _no_ current indentation level - thus the fourth equation of L applies.
Regards, Malcolm
I think, the third from last equation of L applies, since "If the first lexeme of a module is _not_ { or module, then it is preceded by {n} where n is the indentation of the lexeme.", so we start L with L ('module':ts) []. Another thing that irritates me: in section 9.5, we have the production body -> { impdecls; topdecls } | { impdecls } | { topdecls } The first line seems to suggest that import declaraions were admissible also after topdecls, but any attempt to place an impdecl after a topdecl leads --fortunately-- to a parse error in hugs and ghc, shouldn't the production be body -> { impdecls }; { topdecls } ? Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Daniel Fischer
At the beginning of the module, there is _no_ current indentation level - thus the fourth equation of L applies.
I think, the third from last equation of L applies, since "If the first lexeme of a module is _not_ { or module, then it is preceded by {n} where n is the indentation of the lexeme.", so we start L with L ('module':ts) [].
Indeed, and thus, when we get to the end of the first 'where' token, the stack of indentation contexts is still empty. Hence my remark about the fourth equation.
body -> { impdecls; topdecls } | { impdecls } | { topdecls }
The first line seems to suggest that import declaraions were admissible also after topdecls, but any attempt to place an impdecl after a topdecl leads --fortunately-- to a parse error in hugs and ghc, shouldn't the production be
body -> { impdecls }; { topdecls } ?
I think you have mis-read the brace characters as if they were the EBNF meta symbols for repetition. They do in fact mean the literal brace symbol, which may be explicitly present in the source, or inserted by the layout rule. Thus, topdecls must follow impdecls, and be at the same indentation level if layout matters. Regards, Malcolm

Am Montag, 6. März 2006 16:52 schrieb Malcolm Wallace:
Daniel Fischer
wrote: At the beginning of the module, there is _no_ current indentation level - thus the fourth equation of L applies.
I think, the third from last equation of L applies, since "If the first lexeme of a module is _not_ { or module, then it is preceded by {n} where n is the indentation of the lexeme.", so we start L with L ('module':ts) [].
Indeed, and thus, when we get to the end of the first 'where' token, the stack of indentation contexts is still empty. Hence my remark about the fourth equation.
Aha, I read 'At the beginning of the module' as 'at the very beginning', whereas you meant 'At the beginning, after the module-where', sorry to have misunderstood.
body -> { impdecls; topdecls }
| { impdecls } | { topdecls }
The first line seems to suggest that import declaraions were admissible also after topdecls, but any attempt to place an impdecl after a topdecl leads --fortunately-- to a parse error in hugs and ghc, shouldn't the production be
body -> { impdecls }; { topdecls } ?
I think you have mis-read the brace characters as if they were the EBNF meta symbols for repetition. They do in fact mean the literal brace symbol, which may be explicitly present in the source, or inserted by the layout rule. Thus, topdecls must follow impdecls, and be at the same indentation level if layout matters.
Ah, damn, fonts are too similar in my browser. And since I've never used explicit braces at the top level, I didn't expect literal brace-characters there.
Regards, Malcolm
Thanks, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Malcolm Wallace wrote:
Brian Hulley wrote:
However I think there is an error in the description of this in section 2.7 of the Haskell98 report, which states:
"If the indentation of the non-brace lexeme immediately following a where, let, do or of is less than or equal to the current indentation level, then instead of starting a layout, an empty list "{}" is inserted, and layout processing occurs for the current level ..."
I dispute the "or equal" in the above statement, since it seems to be clearly in contradiction to what is actually being done.
Section 2.7 does say that it is an informal description, so although it is correct, it is not complete. In the case of the module header, the question is really "what is the current indentation level?" (that we must be strictly greater than). The answer can be found in the formal definition of the layout rule in section 9.3. At the beginning of the module, there is _no_ current indentation level - thus the fourth equation of L applies.
Thanks. However I do think the fact that there is a special case for the module head would merit a mention in section 2.7, because at the moment it's a bit like looking at a stack of chocolate cookies and defining the top one to be vanilla - it works but who'd ever have thought of it for themselves just looking at the visual indentation on the screen? On the subject of 9.3, I'm puzzled by: "For the purposes of the layout rule, Unicode characters in a source program are considered to be of the same, fixed, width as an ASCII character. However, to avoid visual confusion, programmers should avoid writing programs in which the meaning of implicit layout depends on the width of non-space characters." Surely almost all Haskell programs rely on the width of every non-space character to be the same as the width of a space (ie monospaced font where one character == one glyph) as in let a = 3 b = 5 I'd suggest that the word "non-space" should be replaced by "multi-glyph" and perhaps there could be a recommendation to avoid the use of multi-glyph characters in the first place (otherwise an editor would have to be smart enough to maintain the correct multi-glyph spaces in the columns under them...) Regards, Brian.

Am Mittwoch, 1. März 2006 11:57 schrieb Benjamin Franksen:
TAB characters in program text should be forbidden by law. As well as editors that by default insert a tab char instead of spaces.
As founding member of the church of "The only good Tabbing involves Michaela", I wholeheartedly agree. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Benjamin Franksen wrote:
TAB characters in program text should be forbidden by law.
Well... they are quite useful for lining things up if you're using a proportional font, and I don't think proportionally-spaced code is a bad idea. I want them to be optional. But it would be nice if parsers would warn about (or even reject) programs whose meaning depends on tab width. -- Ben

On 28/02/06, Brian Hulley
Why? Surely typing one tab is better than having to hit the spacebar 4 (or 8) times?
I'm really puzled here. I've been using tabs to indent my C++ code for at least 10 years and don't see the problem. The only problem would be if someone mixed tabs with spaces. Since it has to be either tabs only or spaces only I'd choose tabs only to save keystrokes. I suppose though it is always going to be a matter of personal taste...
It's easy to configure most editors (vim and emacs included of course) to treat multiple spaces as if they were tabs, but to only save spaces into your file. This is what I do, as it ensures that the way that the code looks to me in my editor is exactly what it looks like to the compiler. Quite often, if it looks better, I will align things past a tab stop with a few extra spaces (which only has to be done once, if your editor will start the next line at the same indentation as the previous). - Cale

On 27/02/06, David F. Place
On Feb 27, 2006, at 5:54 PM, Brian Hulley wrote:
there is a parse error (using ghc) at the line beginning with result'. This binding doesn't line up with anything. Also the second 'where' is dangerously close to the column started by the 'f' after the first 'where' (may not be noticeable in this email due to whatever font it is being displayed in but it's only indented by one space) which makes it a bit tricky to read.
Whoops, that's noise in the transmission. In my original file and my original email, it is indented correctly. As for other indentation issues, I always use whatever emacs suggests. Is that not a good strategy?
I find that I often really dislike emacs' choices for indenting -- so much so that I had to turn off smart indenting altogether to avoid holding down the spacebar all the time (of course it remaps tab). The simple indenter gives all the possible sane selections for lining things up and lets you indent further if you want. I'd normally go with the first of Brian's suggestions for laying out an if expression (condition on one line, then and else aligned at a deeper depth right after it), unless the condition was really long, in which case I might switch to the second to get a little more space on that first line. I don't personally worry about tab width, as long as things line up in any given function definition. I usually use 4 spaces when not thinking about it, but sometimes 3, or even 2, or far more, if it makes things look better. Most editors are quite good at putting the cursor at the same indent level on the following line, and this is all the help you tend to need in this regard when editing a file. (Random musing: It would be neat to have an editor which dropped subtle vertical bars under the start of blocks, to make it easier to visually apply the layout rule, and see when you haven't indented far enough, and the block is broken.) Always tell your editor to leave tabs out of your files. There are very few Haskell programmers who actually leave tabs in their source, and mixing spaces and tabs can be deadly. You can set this in emacs by adding (setq-default indent-tabs-mode nil) to your .emacs if you haven't already done so. If you also use vim, set expandtab is the line you want. :) - Cale

David F.Place wrote:
partList :: Ord k => [([k],v)]->[k]->[(k,[([k],v)])] partList pairs alphabet = reverse . fst $ foldl' f ([],pairs) alphabet where f (result,pairs) l = (result',rest) where (part,rest) = span ((==l) . head . fst) pairs result' = if null part then result else (l,part):result
I would write something like: ... where f (result, pairs) l = case span ((==l) . head . fst) pairs of ([], rest) -> ( result, rest) (part, rest) -> ((l, part):result, rest) -- Lennart

David F.Place wrote:
The PrefixMap datastructure implements a Prefix Tree which allows a key/value relationship.
\begin{code}
data PrefixMap k v = Node (Maybe v) (Map.Map k (PrefixMap k v)) deriving (Show) \end{code}
You may compare your code with Keith's implemenation of a Trie. http://article.gmane.org/gmane.comp.lang.haskell.libraries/2571 Find attached a modified version (that I never tested, though). Christian

On Feb 28, 2006, at 8:33 AM, Christian Maeder wrote:
You may compare your code with Keith's implemenation of a Trie. http://article.gmane.org/gmane.comp.lang.haskell.libraries/2571
Thanks for the pointer, I searched for "Prefix Tree" which is an alternative name for trie so I didn't find that implementation. Perhaps, as you suggest in your code, it's time for Data.Trie. -------------------------------- David F. Place mailto:d@vidplace.com
participants (16)
-
Ben Rudiak-Gould
-
Benjamin Franksen
-
Brian Hulley
-
Cale Gibbard
-
Christian Maeder
-
Daniel Fischer
-
David F. Place
-
David F.Place
-
Duncan Coutts
-
Jared Updike
-
Ketil Malde
-
Lennart Augustsson
-
Malcolm Wallace
-
Philippa Cowderoy
-
Robert Dockins
-
Steve Schafer