
I think it's predictable: isSpace (which words is based on) is based on generalCategory, which returns the proper Unicode category:

    λ> generalCategory '\xa0'
    Space

I agree, and I also agree that it would make sense the other way (not breaking on non-breaking spaces). Perhaps it would be a good idea to add a remark to the documentation which specifies the treatment of non-breaking spaces.

I note that Java has two distinct properties concerning whitespace:

    Character.isSpaceChar('\xA0') == True
    Character.isWhitespace('\xA0') == False

Contrast with

    -- \x20 is ASCII space
    Character.isSpaceChar('\x20') == True
    Character.isWhitespace('\x20') == True

    -- \x2060 is the word-joiner (zero-width non-breaking space)
    Character.isSpaceChar('\x2060') == False
    Character.isWhitespace('\x2060') == False

    -- \x202F is the narrow non-breaking space
    Character.isSpaceChar('\x202F') == True
    Character.isWhitespace('\x202F') == False

    -- \x2009 is the thin space
    Character.isSpaceChar('\x2009') == True
    Character.isWhitespace('\x2009') == True
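To make the behaviour under discussion concrete, here is a minimal sketch showing how words currently splits on U+00A0, next to a variant that does not. The names isBreakingSpace and wordsBreaking are hypothetical and not part of base; the variant simply excludes the three no-break spaces (U+00A0, U+2007, U+202F) that Java's Character.isWhitespace also excludes, which is an assumption about what "the other way" should mean.

```haskell
import Data.Char (generalCategory, isSpace)

-- Hypothetical helper (not in base): like isSpace, but the no-break
-- spaces U+00A0, U+2007 and U+202F do not count as whitespace.
isBreakingSpace :: Char -> Bool
isBreakingSpace c = isSpace c && c `notElem` "\x00A0\x2007\x202F"

-- Hypothetical variant of 'words' that splits only on breaking whitespace.
wordsBreaking :: String -> [String]
wordsBreaking s = case dropWhile isBreakingSpace s of
  "" -> []
  s' -> w : wordsBreaking rest
    where (w, rest) = break isBreakingSpace s'

main :: IO ()
main = do
  print (generalCategory '\xa0')          -- Space
  print (isSpace '\xa0')                  -- True
  print (words "10\xa0km away")           -- ["10","km","away"]
  print (wordsBreaking "10\xa0km away")   -- ["10\160km","away"]
```

One thing to watch when trying this in GHCi: hex escapes in string literals are greedy, so "a\xa0b" is read as the single escape \xa0b rather than U+00A0 followed by 'b'. The example sidesteps this by following the escape with a non-hex character.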