
#10412: isAlphaNum includes mark characters, but neither isAlpha nor isNumber do
-------------------------------------+-------------------------------------
        Reporter:  Artyom.Kazak      |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  libraries/base    |              Version:  7.10.1
      Resolution:                    |             Keywords:  unicode, newcomer
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by Azel):

 Looking a bit farther afield, every language I checked that has an
 `isAlphaNum` equivalent defines it as returning `True` if either its
 `isAlpha` or its `isNumber` equivalent does (e.g.
 [https://docs.oracle.com/javase/9/docs/api/java/lang/Character.html#isLetterOrDigit-int- Java's],
 [http://msdn.microsoft.com/en-gb/library/cay4xx2f(v=vs.110).aspx the .NET Framework's],
 [http://www.lispworks.com/documentation/HyperSpec/Body/13_ade.htm Common Lisp's],
 [https://docs.python.org/3/library/stdtypes.html#str.isalnum Python's] or
 [http://www.ada-auth.org/standards/12rm/html/RM-A-3-5.html Ada's]).
 One quirk of Python's documentation is that the description of `isalnum`
 lists three functions for matching numbers, but the first two are subsumed
 by the third.

 So I'm willing to have a go at solving this ticket, and I would be in
 favour of fixing `u_iswalnum` and keeping the doc mostly as it is. The doc
 states that `isAlphaNum` selects alphabetic or numeric digit Unicode
 characters; currently, even if we remove the mark characters, it does not
 match only those, because it also matches `GENCAT_NO` and `GENCAT_NL`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10412#comment:6
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
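
 To make the categories mentioned in the comment concrete, here is a small
 sketch that prints how `Data.Char` classifies one sample character per
 category. The helper `alphaOrNumber`, the `samples` list and `disagreements`
 are hypothetical names introduced only for this illustration; they are not
 part of base. What `isAlphaNum` returns for the combining accent depends on
 the version of base in use, so no expected output is given.

```haskell
import Data.Char (generalCategory, isAlpha, isAlphaNum, isMark, isNumber)

-- | Hypothetical helper (not in base): the "either isAlpha or isNumber"
-- definition that the other languages cited in the comment use.
alphaOrNumber :: Char -> Bool
alphaOrNumber c = isAlpha c || isNumber c

-- | One sample character per category discussed in the comment.
samples :: [(String, Char)]
samples =
  [ ("Latin small letter a",     'a')
  , ("ASCII digit seven",        '7')
  , ("combining acute accent",   '\x0301') -- a mark character (NonSpacingMark)
  , ("vulgar fraction one half", '\x00BD') -- OtherNumber, i.e. GENCAT_NO
  , ("Roman numeral one",        '\x2160') -- LetterNumber, i.e. GENCAT_NL
  ]

-- | Characters on which isAlphaNum disagrees with alphaOrNumber.
-- The result depends on the version of base being used.
disagreements :: [Char] -> [Char]
disagreements = filter (\c -> isAlphaNum c /= alphaOrNumber c)

main :: IO ()
main = do
  -- Columns: category, isMark, isAlpha, isNumber, isAlphaNum, alphaOrNumber
  mapM_ report samples
  putStrLn ("disagreements: " ++ show (disagreements (map snd samples)))
  where
    report (name, c) =
      putStrLn (name ++ ": " ++ show
        ( generalCategory c, isMark c, isAlpha c
        , isNumber c, isAlphaNum c, alphaOrNumber c ))
```

 The last two sample characters are numeric but not digits, which is why
 they fall outside a strict reading of "alphabetic or numeric digit" even
 though `isNumber` accepts them.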