
Daniel Fischer wrote:
Using -O2, GHC produces the same core with or without strictness annotations, the 'acc' parameter of f is an unboxed Int, so the optimiser indeed sees it is needed. The Integer parameter unfortunately can't be unboxed. A native 64-bit integer type might gain something.
Right. For this puzzle (n up to 1000000) it's quite safe to say an Int64 will never overflow. :-) I was wondering how the Integer type is implemented. If I were to implement such a type my first approach would be to keep a list of digits (and perform calculation like I do by hand). But operations on Integers are very fast, even on very big numbers, so I guess that's not the approach taken by the GHC developers. (For example, calculating the faculties of [1..1000] takes a little under 300ms to perform on my Intel PC; if I don't pipe to /dev/null it takes about 10s because of all the digits being printed.) Having a type like Integer on board makes many of the puzzles on Project Euler a shoo-in. One of the questions is indeed to calculate the last n digits of 1000!.
Using a Map, though easier to code, doesn't buy you much here, indeed less than a few easy considerations where the maximal length cannot occur. The problem is that you have to update the Map often, which is rather costly, also the Map takes a lot of memory. Using an array is more convoluted, but much faster (if done right).
The problem with an array is finding a good upper limit. One could of course copy the whole array if a new upper bound is found, the costs would amortize over the whole run. Having a really big and sparse array is no good either (no/bad locality of reference). Oh well, the puzzle itself definitely isn't worth the effort, but maybe some custom made heap structure would perform best.
At least, it would take far longer to get the solution.
Yes, but sometimes the difference is several days vs. a couple of minutes.
Nowadays, the native code generator is not necessarily slower than -fvia-C, often faster. So the best should be to test both, -O2 and -O2 -fvia-C -optc-On, for some medium sized inputs and see which actually is faster.
Interesting. In which release has the back-end been reworked? Does it work equally well for architectures other than Intel? Niels