On 28 March 2011 17:55, malcolm.wallace <malcolm.wallace@me.com> wrote:
> Does anyone else think it odd that Prelude.words will break a string at a non-breaking space?
>
> Prelude> words "abc def\xA0ghi"
> ["abc","def","ghi"]

I think it's predictable: isSpace (which words is based on) agrees with generalCategory, which returns the proper Unicode category:

λ> generalCategory '\xa0'
Space

So:

-- | Selects white-space characters in the Latin-1 range.
-- (In Unicode terms, this includes spaces and some control characters.)
isSpace                 :: Char -> Bool
-- isSpace includes non-breaking space
-- Done with explicit equalities both for efficiency, and to avoid a tiresome
-- recursion with GHC.List elem
isSpace c               =  c == ' '     ||
                           c == '\t'    ||
                           c == '\n'    ||
                           c == '\r'    ||
                           c == '\f'    ||
                           c == '\v'    ||
                           c == '\xa0'  ||
                           iswspace (fromIntegral (ord c)) /= 0
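
If the Prelude behaviour is not what you want, one option is a local variant of words with a narrower predicate. The following is only a sketch: isBreakingSpace and wordsKeepingNbsp are hypothetical names, not library functions, and it exempts only U+00A0 (other no-break characters such as U+202F are still treated as separators).

```haskell
import Data.Char (isSpace)

-- Hypothetical helper (not in the Prelude): every isSpace character
-- counts as a separator except the no-break space U+00A0.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && c /= '\xA0'

-- Same shape as the Haskell Report definition of words, with the
-- predicate swapped for the narrower one above.
wordsKeepingNbsp :: String -> [String]
wordsKeepingNbsp s =
  case dropWhile isBreakingSpace s of
    "" -> []
    s' -> w : wordsKeepingNbsp rest
      where (w, rest) = break isBreakingSpace s'

main :: IO ()
main = print (wordsKeepingNbsp "abc def\xA0ghi")
-- prints ["abc","def\160ghi"]
```

Excluding the character in the predicate, rather than post-processing the output of words, keeps it a single pass and leaves isSpace itself untouched.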