
I propose that Haskell's layout rule be changed in the following simple way:

  * We identify a set of "layout-unsafe" Unicode characters which may
    occupy something other than one column in some fixed-width fonts.
    This would include (among other things) combining characters and
    full-width CJK characters. Explicit Unicode escape sequences, if any,
    should also count as layout-unsafe. Anything doubtful should be
    layout-unsafe.

  * A special unknown-column value is added to the set of possible
    column positions.

  * All characters following any layout-unsafe character on a source
    line are taken to be at position unknown-column.

  * Any time a layout decision requires comparing two column positions
    and one or both of them is unknown-column, the lexer will abort with
    a helpful error message.

If TAB is treated as layout-unsafe (as it should be) then this rule change will break some existing code, but only code that deserves to be broken. If TAB is treated specially as it currently is, this change should not break any existing code.

More importantly, the change is safe in the sense that any program which is correct under the new rule has the same meaning as it did under the old rule. This is true regardless of what characters end up in the set layout-unsafe.

-- Ben
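
P.S. As a rough illustration of the rule proposed above, here is a minimal Haskell sketch of the column bookkeeping. The names Column, layoutUnsafe, columns and compareColumns are invented for this example and do not come from any existing lexer. In particular, layoutUnsafe is a deliberately crude placeholder that treats everything outside printable ASCII (TAB included) as layout-unsafe; the real set is left open by the proposal.

```haskell
-- A column position: a known 1-based column, or the special
-- unknown-column value.
data Column = Known Int | Unknown
  deriving (Eq, Show)

-- Placeholder classifier (an assumption for this sketch): only printable
-- ASCII is considered safe, following "anything doubtful should be
-- layout-unsafe". TAB is therefore unsafe here.
layoutUnsafe :: Char -> Bool
layoutUnsafe c = not (c >= ' ' && c <= '~')

-- Assign a column to every character of a single source line.
-- The layout-unsafe character itself still has a known column;
-- every character after it on the line is at Unknown.
columns :: String -> [(Char, Column)]
columns = go (Known 1)
  where
    go _   []       = []
    go col (c : cs) = (c, col) : go (next col c) cs

    next (Known n) c | not (layoutUnsafe c) = Known (n + 1)
    next _         _                        = Unknown

-- Compare two column positions for a layout decision. If either one is
-- Unknown, the comparison is refused with an error message instead of
-- guessing.
compareColumns :: Column -> Column -> Either String Ordering
compareColumns (Known a) (Known b) = Right (compare a b)
compareColumns _         _         =
  Left "layout: column is unknown because a layout-unsafe character \
       \appears earlier on the line"
```

With these definitions, columns "a\tb" places 'a' and the TAB at known columns 1 and 2 and 'b' at Unknown, so any layout decision that depends on the position of 'b' is rejected rather than silently given a font-dependent meaning.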