
Hi, prompted by a discussion about http://www.spoj.pl/problems/MUL/ on #haskell I implemented some faster routines for reading integers, and managed to produce a program that passes this problem. Currently I have a single module that provides reading operations for Integers and Ints. I'm not quite sure what to do with it. Ideally, read should use faster code for parsing numbers. This is hard to do though, because read for numbers is specified in terms of lex, which handles arbitrary tokens. One nasty side effect of this, besides bad performance, is that for example read ('"':reapeat 'a') :: Int diverges, even though there's obviously no possible parse. Incorporating faster integer reading code into read without breaking backward compatibility is not trivial. The next obvious place to put the functions is the Numeric module. This sounds like a good idea until one realizes that readSigned (which is the obvious thing to use for signed integers) again uses lex, with all the complications above. A third option is to put it all in a separate module. This would probably be hard to find, so few people would use it. I'm pondering breaking compatibility with Haskell 98 to some extent and implementing read instances for Int and Integer that don't diverge on weird inputs, but handle all valid numbers that Haskell 98 supports. This still requires some work to support octal and hexadecimal numbers, whitespace and parentheses. What do you think? The code (quite unpolished) is available with darcs get http://int-e.home.tlink.de/haskell/readinteger (browsing directly doesn't work) It also includes a simple benchmark for comparing reads, Numeric.readDec and my readInteger functions. Results for my computer are included as well. regards, Bertram

Hi Bertram,
Currently I have a single module that provides reading operations for Integers and Ints. I'm not quite sure what to do with it.
Get it into base! Where it is, or what its called is less relevant - perhaps entirely decoupled from the Read class, and I wouldn't have thought reading octal integers etc was useful - just simple bog standard integers. I'd really like just readInt/readInteger in Numeric, or Prelude, or possibly even a new Read style module.
I'm pondering breaking compatibility with Haskell 98 to some extent and implementing read instances for Int and Integer that don't diverge on weird inputs, but handle all valid numbers that Haskell 98 supports.
Thats nice, but a stand alone readInt that is not tied to reads, and the whole "leftover parse" thing can be dropped - probably for better performance. Of course, making an existing read faster is also good.
It also includes a simple benchmark for comparing reads, Numeric.readDec and my readInteger functions. Results for my computer are included as well.
Any chance of a very quick summary of the results? i.e. on average x% faster, but y% faster for integers over size z? Saves everyone darcs's in. As far as I am concerned, the code is a black box, but the statistics are something everyone will want. Thanks Neil

Neil Mitchell wrote:
Currently I have a single module that provides reading operations for Integers and Ints. I'm not quite sure what to do with it.
Get it into base! Where it is, or what its called is less relevant -
Ok, one vote for base.
I'm pondering breaking compatibility with Haskell 98 to some extent and implementing read instances for Int and Integer that don't diverge on weird inputs, but handle all valid numbers that Haskell 98 supports.
Thats nice, but a stand alone readInt that is not tied to reads, and the whole "leftover parse" thing can be dropped - probably for better performance.
Performance won't be much better I think. A first attempt at this was actually slower (I have not yet investigated why), and the functions are more useful with leftover parsing, in my opinion (in the MUL problem it saved a call to words - which bought a little bit of extra performance).
Of course, making an existing read faster is also good.
I think it's very desirable because that's what people use most often.
Any chance of a very quick summary of the results? i.e. on average x% faster, but y% faster for integers over size z? Saves everyone darcs's in. As far as I am concerned, the code is a black box, but the statistics are something everyone will want.
For small integers (absolute value less than 1000), Numeric.readDec is 5 to 10 times faster than reads and my readInteger is 5 to 8 times faster than readDec. For medium integers (about 100 digits), readDec and reads are about the same speed while readInteger is about 10 times faster. For larger integers the speedup becomes larger, too. At 100k digits, it reaches 100 (timed using ghci - but hopefully the number is correct anyway.) All measurements were done with ghc 6.5, compiling with -O. regards, Bertram

Hello Neil, Thursday, September 7, 2006, 1:45:03 AM, you wrote:
Currently I have a single module that provides reading operations for Integers and Ints. I'm not quite sure what to do with it.
Get it into base! Where it is, or what its called is less relevant - perhaps entirely decoupled from the Read class, and I wouldn't have thought reading octal integers etc was useful - just simple bog standard integers. I'd really like just readInt/readInteger in Numeric, or Prelude, or possibly even a new Read style module.
i'd just defined the following function, mainly because standard 'read' machinery occupies whole 50 kb in exe-file readI = foldl f 0 where f m c | isDigit c = fromIntegral (ord c - ord '0') + (m * 10) | otherwise = error ("Non-digit "++[c]++" in readI") -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

What about negative numbers? Also, don't use (ord c - ord '0'), it doesn't work with Unicode digits. That's why there is a digitToInt function. -- Lennart On Sep 7, 2006, at 02:12 , Bulat Ziganshin wrote:
readI = foldl f 0 where f m c | isDigit c = fromIntegral (ord c - ord '0') + (m * 10) | otherwise = error ("Non-digit "++[c]++" in readI")

Lennart Augustsson wrote:
Also, don't use (ord c - ord '0'), it doesn't work with Unicode digits. That's why there is a digitToInt function.
FWIW, in Haskell 98, isDigit and digitToInt are defined as isDigit c = c >= '0' && c <= '9' digitToInt :: Char -> Int digitToInt c | isDigit c = fromEnum c - fromEnum '0' | c >= 'a' && c <= 'f' = fromEnum c - fromEnum 'a' + 10 | c >= 'A' && c <= 'F' = fromEnum c - fromEnum 'A' + 10 | otherwise = error "Char.digitToInt: not a digit" making this a bit of a red herring. I can't comment on why Bulat doesn't like negative numbers. Bertram

Hello Bertram, Thursday, September 7, 2006, 6:01:17 PM, you wrote:
Also, don't use (ord c - ord '0'), it doesn't work with Unicode digits. That's why there is a digitToInt function.
making this a bit of a red herring. I can't comment on why Bulat doesn't like negative numbers.
because my program can't split files into chunks of -10 files or -1 mbytes ;) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bertram Felgenhauer wrote:
I can't comment on why Bulat doesn't like negative numbers.
Neither it seems, did the original Haskell committee - that's why we have to muddle along with a confusing unary minus operator instead of proper negative literals - see the thread beginning http://www.haskell.org/pipermail/haskell-cafe/2006-August/017410.html for the latest exciting attempt to get rid of unary minus... ;-) Regards, Brian. -- "Man simply could not have the right interest in beauty if in his life of soul he were not entangled in a world of hideously ugly spider-like beings." - Rudolf Steiner 16/12/1922

Hi, I've implemented replacements for readsPrec for Int and Integer in a module called Compat (in my darcs repository [1]) and measured their performance. There's some overhead when compared to the plain versions. For Integer, it's about 30% for single digit numbers, going down to about 10% at 10 digits, and about 1.3% at 100. It's still a big win over the existing instances at a small loss in compatibility to Haskell 98. As far as I can make out, the only difference is that it fails to diverge in some cases where reads would diverge (see my initial mail for an example.) I've also received a question from Ketil Malde off list, asking (in reply to Neil Mitchell) what the advantages would be of having this in base. Personally I think it should go into base if we can agree on replacing existing code (the Read instances or something in Numeric or maybe both). This would have the advantage of making quite a few programs more efficient just by recompiling them. If we can't agree on that, the code will be made available as an extra module but I wouldn't see any advantage at having it in base. Having it distributed with ghc would have some advantages (mainly convenience for users and a slightly better chance of avoiding forks). In other news, darcs send will now send the patches to me - in case anyone has any. regards, Bertram [1] I'll say again that browsing directly doesn't work, darcs is fine though, get the code with darcs get http://int-e.home.tlink.de/haskell/readinteger

Hi, I've commited some small improvements to the readInteger code, which speed up handling negative numbers and also the compatability wrappers very slightly. My last numbers comparing the reads wrappers to the basic readInteger code turned out to be unfair; apparently ghc managed to share some of the calculations of these functions somehow. I've changed the benchmark code to wrap the new functions into NOINLINE helpers, so that source of errors is eliminated. Now the reads overhead looks much less scary then before, on the order of 10% for small integers. regards, Bertram
participants (5)
-
Bertram Felgenhauer
-
Brian Hulley
-
Bulat Ziganshin
-
Lennart Augustsson
-
Neil Mitchell