Re[2]: Comment Syntax

3 Feb 2006

Hello John,

Friday, February 03, 2006, 3:39:38 AM, you wrote:
...
...
Got a unicode-compliant compiler?
...
Log:
  Add support for UTF-8 source files

  GHC finally has support for full Unicode in source files.  Source
  files are now assumed to be UTF-8 encoded, and the full range of
  Unicode characters can be used, with classifications recognised using
  the implementation from Data.Char.  This incidentally means that only
  the stage2 compiler will recognise Unicode in source files, because I
  was too lazy to port the unicode classifier code into libcompat.

  Additionally, the following synonyms for keywords are now recognised:

    forall symbol         (U+2200)        forall
    right arrow           (U+2192)        ->
    left arrow            (U+2190)        <-
    horizontal ellipsis   (U+22EF)        ..

  There are probably more things we could add here.
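
To make the synonym table concrete, here is a minimal sketch of source code that uses three of the symbols. It assumes a present-day GHC, where these synonyms are enabled by the UnicodeSyntax extension (the log above does not mention a flag), and a source file saved as UTF-8. The function names are made up for illustration.

```haskell
{-# LANGUAGE UnicodeSyntax, ExplicitForAll #-}

-- U+2200 stands in for "forall", U+2192 for "->".
constant :: ∀ a b. a → b → a
constant x _ = x

-- U+2190 stands in for "<-" in a list comprehension ...
doubled :: [Int] → [Int]
doubled xs = [ 2 * x | x ← xs ]

-- ... and in do-notation.
sumBoth :: Maybe Int → Maybe Int → Maybe Int
sumBoth mx my = do
  x ← mx
  y ← my
  pure (x + y)
```

Loaded into GHCi, `doubled [1,2,3]` evaluates to `[2,4,6]`, exactly as it would with the ASCII spellings.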
JM> sure do :)

JM> but it currently doesn't recognize any unicode characters as possible
JM> operators.

have you read this? :)

  This will break some source files if Latin-1 characters are being
  used.  In most cases this should result in a UTF-8 decoding error.
  Later on if we want to support more encodings (perhaps with a pragma
  to specify the encoding), I plan to do it by recoding into UTF-8
  before parsing.
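
The recoding step mentioned in the log could look roughly like the sketch below for the Latin-1 case. This is not GHC's code, only an illustration under the assumption that the input is a list of raw bytes; `latin1ToUtf8` is a hypothetical name. It relies on the fact that every Latin-1 byte is the Unicode code point of the same value.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Recode Latin-1 bytes as UTF-8 (hypothetical helper, not part of GHC).
-- Bytes below 0x80 are plain ASCII and pass through unchanged; every
-- other byte becomes a two-byte UTF-8 sequence.
latin1ToUtf8 :: [Word8] -> [Word8]
latin1ToUtf8 = concatMap recode
  where
    recode b
      | b < 0x80  = [b]
      | otherwise = [ 0xC0 .|. (b `shiftR` 6)   -- lead byte: 110000xx
                    , 0x80 .|. (b .&. 0x3F) ]   -- continuation: 10xxxxxx
```

For example, the Latin-1 byte 0xE9 (e with acute accent) becomes the pair 0xC3 0xA9. Left as it is, 0xE9 looks like the lead byte of a three-byte UTF-8 sequence, which is why an unconverted Latin-1 file usually fails to decode.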
...
Internally, there were some pretty big changes:
- FastStrings are now stored in UTF-8
- Z-encoding has been moved right to the back end.  Previously we
      used to Z-encode every identifier on the way in for simplicity,
      and only decode when we needed to show something to the user.
      Instead, we now keep every string in its UTF-8 encoding, and
      Z-encode right before printing it out.  To avoid Z-encoding the
      same string multiple times, the Z-encoding is cached inside the
      FastString the first time it is requested.
      This speeds up the compiler - I've measured some definite
      improvement in parsing at least, and I expect compilations
      overall to be faster too.  It also cleans up a lot of cruft from
      the OccName interface.  Z-encoding is nicely hidden inside the
      Outputable instance for Names & OccNames now.
- StringBuffers are UTF-8 too, and are now represented as
      ForeignPtrs.
- I've put together some test cases, not by any means exhaustive,
      but there are some interesting UTF-8 decoding error cases that
      aren't obvious.  Also, take a look at unicode001.hs for a demo.
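
The "encode late, cache the result" idea from the Z-encoding item can be sketched as follows. This is only an illustration, not the FastString implementation: the `Name` type and its fields are hypothetical, the symbol table is a small subset of the Z-encoding, and the cache here is simply a lazy record field, which Haskell evaluates on first use and then shares.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)
import Numeric (showHex)

-- Z-encode one character (subset of the real table, for illustration).
zEncodeChar :: Char -> String
zEncodeChar c = case c of
    'z'  -> "zz"
    'Z'  -> "ZZ"
    '_'  -> "zu"
    '.'  -> "zi"
    '+'  -> "zp"
    '-'  -> "zm"
    '='  -> "ze"
    '<'  -> "zl"
    '>'  -> "zg"
    '\'' -> "zq"
    _ | isAsciiLower c || isAsciiUpper c || isDigit c -> [c]
      | otherwise -> 'z' : padded (showHex (ord c) "U")
  where
    -- The hex escape always starts with a digit, so it cannot be
    -- confused with the two-letter codes above.
    padded hex@(h : _) | isDigit h = hex
    padded hex = '0' : hex

-- Hypothetical stand-in for a name that carries its own cached encoding.
data Name = Name
  { nameText :: String  -- the identifier as written (UTF-8 in the compiler)
  , nameZ    :: String  -- lazy field: computed on first demand, then reused
  }

mkName :: String -> Name
mkName s = Name { nameText = s, nameZ = concatMap zEncodeChar s }
```

With these definitions `nameZ (mkName "foo_bar")` gives `"foozubar"`, and a name whose encoding is never asked for never pays for it.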
-- 
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

Bulat Ziganshin