
On Fri, Feb 19, 2021 at 09:03:44PM -0500, Viktor Dukhovni wrote:
On Fri, Feb 19, 2021 at 06:05:12PM -0700, amindfv--- via Haskell-Cafe wrote:
Does there exist a Haskell library or function for getting grapheme lengths of String/Text values?
Depends on your definition of "grapheme length" :-) If you're OK with counting NFC code points, then the answer is yes, via the "text-icu" package.
    $ cabal repl -z -v0 \
        --repl-options "-package=text-icu" \
        --repl-options "-package=text" \
        --repl-options -XOverloadedStrings
    λ> import qualified Data.Text as T
    λ> import Data.Text.ICU.Normalize
    λ> length $ T.unpack $ normalize NFC "ä"
    1
    λ> length $ T.unpack $ normalize NFD "ä"
    2
    λ> length $ T.unpack $ normalize NFC $ normalize NFD "ä"
    1
Thanks. Unfortunately this doesn't work well for graphemes which don't have a 1-code-point equivalent, like:

    length $ T.unpack $ normalize NFC $ normalize NFD "❤️" == 2
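To see where the 2 comes from, it may help to list the code points. Below is a minimal sketch using only base, assuming the emoji above is spelled U+2764 followed by U+FE0F (VARIATION SELECTOR-16). The names `describe` and `heart` are made up for illustration.

```haskell
import Data.Char (GeneralCategory, generalCategory, ord, toUpper)
import Numeric (showHex)

-- Hypothetical helper: render one character as "U+XXXX" together with
-- its Unicode general category.
describe :: Char -> (String, GeneralCategory)
describe c = ("U+" ++ pad (map toUpper (showHex (ord c) "")), generalCategory c)
  where
    -- Left-pad the hex digits to at least four places.
    pad s = replicate (4 - length s) '0' ++ s

-- Assumption: the emoji is U+2764 followed by U+FE0F. It is written with
-- escapes so the file's encoding cannot change what is being measured.
heart :: String
heart = "\x2764\xFE0F"
```

Loaded into GHCi, `map describe heart` should give `[("U+2764",OtherSymbol),("U+FE0F",NonSpacingMark)]`. The second code point is a variation selector, and there is no precomposed character for the pair, so NFC has nothing to combine and the length stays at 2.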
With the "Data.Text.ICU.Char" module, it may be possible to determine grapheme boundaries:
https://hackage.haskell.org/package/text-icu-0.7.0.1/docs/Data-Text-ICU-Char...
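In the meantime, a rough approximation is possible without ICU: treat every code point whose general category is a combining mark (Mn, Mc or Me, which covers variation selectors such as U+FE0F) as extending the previous character. This is only a sketch under that simplifying assumption, not the Unicode grapheme cluster algorithm (UAX #29): it ignores ZWJ emoji sequences, regional-indicator flags, Hangul jamo and so on. The names `isExtending` and `approxGraphemeLength` are hypothetical.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- True for code points that, in this simplified model, attach to the
-- preceding character instead of starting a new grapheme.
isExtending :: Char -> Bool
isExtending c = case generalCategory c of
  NonSpacingMark       -> True   -- e.g. U+0308 COMBINING DIAERESIS, U+FE0F
  SpacingCombiningMark -> True
  EnclosingMark        -> True
  _                    -> False

-- Hypothetical name. The first code point always starts a cluster; after
-- that, only non-extending code points start a new one.
approxGraphemeLength :: String -> Int
approxGraphemeLength []       = 0
approxGraphemeLength (_ : cs) = 1 + length (filter (not . isExtending) cs)
```

With this, `approxGraphemeLength "a\x0308"` and `approxGraphemeLength "\x2764\xFE0F"` both come out as 1, without any normalization step. For real segmentation, ICU's character break iterator is likely the better tool; if I recall correctly, text-icu exposes it as `breakCharacter` in `Data.Text.ICU`.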
I'll look into this and report back.

Tom