
Hello! I saw a question on StackOverflow about the difference between isAlpha and isLetter today. One of the answers stated that the two functions are interchangeable, even though they are implemented differently. I decided to find out whether the difference in implementation influences performance, and look what I found:

    import Criterion.Main
    import Data.Char

    fTest name f list = bgroup name $ map (\(n,c) -> bench n $ whnf f c) list

    tests = [("latin", 'e'), ("digit", '8'), ("symbol", '…'), ("greek", 'λ')]

    main = defaultMain [fTest "isAlpha" isAlpha tests, fTest "isLetter" isLetter tests]

produces this table (times are in nanoseconds):

                 latin  digit  symbol  greek
                 -----  -----  ------  -----
    isAlpha   |   156    212    368     310
    isLetter  |   349    344    383     310

isAlpha is twice as fast on latin inputs! Does it mean that isAlpha should be preferred? Why isn’t isLetter defined in terms of isAlpha in Data.Char?

On 11/21/12 4:59 PM, Artyom Kazak wrote:
> I saw a question on StackOverflow about the difference between isAlpha and isLetter today. One of the answers stated that the two functions are interchangeable, even though they are implemented differently.
>
> I decided to find out whether the difference in implementation influences performance, and look what I found:
>
>     import Criterion.Main
>     import Data.Char
>
>     fTest name f list = bgroup name $ map (\(n,c) -> bench n $ whnf f c) list
>
>     tests = [("latin", 'e'), ("digit", '8'), ("symbol", '…'), ("greek", 'λ')]
>
>     main = defaultMain [fTest "isAlpha" isAlpha tests, fTest "isLetter" isLetter tests]
>
> produces this table (times are in nanoseconds):
>
>                  latin  digit  symbol  greek
>                  -----  -----  ------  -----
>     isAlpha   |   156    212    368     310
>     isLetter  |   349    344    383     310
>
> isAlpha is twice as fast on latin inputs! Does it mean that isAlpha should be preferred? Why isn’t isLetter defined in terms of isAlpha in Data.Char?

FWIW, testing on an arbitrary snippet of Japanese yields:

    benchmarking nf (map isAlpha)
    mean: 26.21897 us, lb 26.17674 us, ub 26.27707 us, ci 0.950
    std dev: 251.4027 ns, lb 200.4399 ns, ub 335.3004 ns, ci 0.950

    benchmarking nf (map isLetter)
    mean: 26.95068 us, lb 26.91681 us, ub 26.99481 us, ci 0.950
    std dev: 197.5631 ns, lb 158.9950 ns, ub 239.4986 ns, ci 0.950

I'm curious what the difference is between the functions, and whether isLetter is ever preferable...

-- 
Live well,
~wren
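
One way to probe whether the two functions ever differ in results (as opposed to speed) is to compare them over every Char. The following is only a minimal sketch that relies on nothing beyond Data.Char; the names agreeOn and disagreements are made up for illustration and are not part of any library, and the outcome may depend on the version of base in use.

```haskell
import Data.Char (isAlpha, isLetter)

-- Hypothetical helper: True when both predicates give the same answer
-- for every character in the input list.
agreeOn :: [Char] -> Bool
agreeOn = all (\c -> isAlpha c == isLetter c)

-- Hypothetical helper: every character, across the whole Char range,
-- on which the two predicates give different answers.
disagreements :: [Char]
disagreements = filter (\c -> isAlpha c /= isLetter c) [minBound .. maxBound]
```

Loading this in GHCi and evaluating `null disagreements` would show whether the predicates agree everywhere on a given installation; `agreeOn` can be applied to any sample text, such as the four characters used in the benchmark above.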
Participants (2): Artyom Kazak, wren ng thornton