Re: [Haskell-cafe] [perl #129843] [LTA] Indexing on a Str throws generic “out of range” message which is less than awesome (“hello”[2])

On 08.05.2017 at 23:00, Brandon Allbery wrote:
On Mon, May 8, 2017 at 4:49 PM, Joachim Durchholz <jo@durchholz.org> wrote:
If the mental model for Perl6 strings is "array of characters" though
Perl has never had that mental model, is my point.
Right, I should have written "is supposed to evolve to" instead of "is". Array of characters may be a useful abstraction to have in Perl6, or not (see below).
It's generally imported by folks who come from languages where strings *are* "arrays of characters" --- and where that model has a strong tendency to cause problems. (See Python 3's struggles with Unicode as an example. And C/C++, well, don't even get me started.)
Some of these struggles originate from equating bytes with characters. Since Perl6 is more or less a clean slate, it can avoid these.
Other struggles originate from the structure of Unicode: it defines multiple levels of sequences, each useful for different tasks:
- code points
- graphemes
- characters (various normalizations exist)
- word parts (for line breaking)
- words
- sentences
- paragraphs
and possibly a few more.
Ideally, developers will be able to use the same API structure at each level, maybe with the exception of the grapheme level, where Perl6 has its native representation (the better the API, the fewer such implementation details are visible and relevant to the programmer).
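To make the difference between the lower levels concrete, here is a minimal sketch in Python (not Perl6, and using only the standard library). The helper `approx_graphemes` is a hypothetical name of mine, and it only approximates grapheme segmentation by attaching combining marks to the preceding code point; real segmentation follows the Unicode rules in UAX #29 and covers many more cases.

```python
import unicodedata


def approx_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Rough approximation only: proper grapheme segmentation (UAX #29)
    also handles emoji sequences, Hangul syllables, and more.
    """
    clusters = []
    for ch in text:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "café" spelled with 'e' followed by COMBINING ACUTE ACCENT (U+0301)
word = "cafe\u0301"

levels = {
    "bytes (UTF-8)": len(word.encode("utf-8")),
    "code points": len(word),
    "code points after NFC": len(unicodedata.normalize("NFC", word)),
    "approx. graphemes": len(approx_graphemes(word)),
}

for name, count in levels.items():
    print(f"{name}: {count}")
```

The same four visible characters give a different length at each level, which is why a generic "index into the string" operation has to say which level it means.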
Bytes stopped being the basis of characters even *before* Unicode. C and C++ are still struggling to understand that.
I think you're being unfair to them. The issues are actually well understood in the C++ arena, as demonstrated by the ICU library. It's just that language evolution is constrained by legacy, plus possibly short-sighted decisions by compiler makers. Also, C++ (by necessity) evolves more slowly than Unicode. Under these conditions, Unicode support in a library is actually preferable to anything inside the language; it's enough if the language can interoperate with the library.
participants (1)
-
Joachim Durchholz