
I've done some further investigating. The large differences I was seeing on OS X went away when I moved to a Linux machine, and also when I compiled the benchmark with -O3. (When I do either of those things, the puzzling difference between isSpace_DataChar and Data.Char.isSpace also goes away.) I don't know enough about Core to figure out what is going on here, but I'll assume that the benchmarks I'm getting on the Linux box are the sober ones.

They look like this (the number is the ratio new code time / old code time for isSpace):

Benchmark compiled without optimization:

  ascii text                  0.71
  ascii text (short lines)    0.74
  ascii text (long lines)     0.72
  Greek text                  1.08
  Haskell code                0.77
  chars 0..255                0.70
  all spaces                  1.02

Benchmark compiled with -O2:

  ascii text                  0.69
  ascii text (short lines)    0.72
  ascii text (long lines)     0.69
  Greek text                  1.11
  Haskell code                0.77
  chars 0..255                0.69
  all spaces                  0.94

This suggests that we can get a modest improvement for the most common cases if we adopt the new definition of isSpace. However, performance might actually decrease slightly for non-Latin text.

The changes I tried for other functions in GHC.Unicode did not result in significant improvements.

So, the question is whether it's worth submitting the patch for isSpace, given that the gains are more modest than I'd reported before. (Note that 'words' will also be affected by this, as it uses isSpace.)

I have attached the proposed patch to this email.

John

+++ John MacFarlane [Oct 29 12 16:15 ]:
+++ Simon Peyton-Jones [Oct 29 12 22:29 ]:
Sounds good to me. Thanks for doing this.
When you think you are ready, just submit a patch. (As others have noted, maybe isSpace isn't the only function that could benefit from this kind of attention.)
Yes, if the general idea is agreeable, I'll do some of the other functions in GHC.Unicode as well, and provide benchmarks for them too.
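
For readers following the thread without the attachment, here is a minimal sketch of the kind of change being discussed. It is an illustration only, not the attached patch: the names isSpaceFast, wordsBy and agreesWithBase are hypothetical, and the assumption is that the speedup comes from testing the common ASCII cases with plain comparisons before falling back to the general Unicode lookup. Under that assumption, non-ASCII input (such as the Greek text above) pays for the extra comparison first, which would be consistent with the ratios slightly above 1.

```haskell
import Data.Char (GeneralCategory (Space), generalCategory, isSpace, ord)

-- Hypothetical fast-path variant of isSpace (NOT the patch from the email).
-- ASCII characters are decided by integer comparisons alone; everything
-- else falls back to the Unicode general-category lookup.
isSpaceFast :: Char -> Bool
isSpaceFast c
  | n < 0x80  = n == 0x20 || (n >= 0x09 && n <= 0x0D)  -- ' ', \t \n \v \f \r
  | otherwise = generalCategory c == Space              -- e.g. U+00A0, U+2003
  where
    n = ord c

-- A words-like splitter parameterised over the space predicate, to show
-- why a change to isSpace also affects 'words'.
wordsBy :: (Char -> Bool) -> String -> [String]
wordsBy p s = case dropWhile p s of
  "" -> []
  s' -> let (w, rest) = break p s' in w : wordsBy p rest

-- Exhaustive check that the sketch agrees with Data.Char.isSpace.
agreesWithBase :: Bool
agreesWithBase = all (\c -> isSpaceFast c == isSpace c) [minBound .. maxBound]
```

Evaluating agreesWithBase compares the sketch against Data.Char.isSpace on every Char, which is a cheap sanity check to run before benchmarking any variant of this kind.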
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries