RE: Test performance impact (was: The dreaded M-R)

On 02 February 2006 09:52, John Hughes wrote:
Summary: 2 programs failed to compile due to type errors (anna, gg). One program did 19% more allocation, a few other programs increased allocation very slightly (<2%).
pic +0.28% +19.27% 0.02
Thanks, that was interesting. A follow-up question: pic has a space bug. How long will it take you to find and fix it?
I just tried this, and it took me just a few minutes. Compiling both versions with profiling, for the original:

    total time  =        0.00 secs   (0 ticks @ 20 ms)
    total alloc =  11,200,656 bytes  (excludes profiling overheads)

    COST CENTRE      MODULE          %time  %alloc

    chargeDensity    ChargeDensity     0.0     2.5
    accumCharge      ChargeDensity     0.0    13.5
    relax            Potential         0.0    31.4
    correct          Potential         0.0     5.0
    genRand          Utils             0.0     1.0
    fineMesh         Utils             0.0     2.4
    applyOpToMesh    Utils             0.0    12.7
    =:               Utils             0.0     2.3
    pushParticle     PushParticle      0.0    16.1
    timeStep         Pic               0.0    11.0

and with the monomorphism restriction turned off:

    total time  =        0.02 secs   (1 ticks @ 20 ms)
    total alloc =  12,893,544 bytes  (excludes profiling overheads)

    COST CENTRE      MODULE          %time  %alloc

    pushParticle     PushParticle    100.0    20.8
    chargeDensity    ChargeDensity     0.0     2.2
    accumCharge      ChargeDensity     0.0    18.0
    relax            Potential         0.0    27.3
    correct          Potential         0.0     4.4
    fineMesh         Utils             0.0     2.1
    applyOpToMesh    Utils             0.0    11.1
    =:               Utils             0.0     2.0
    timeStep         Pic               0.0     9.5

So, ignoring the %time column (the program didn't run long enough for the profiler to get enough time samples), we can see that the following functions increased their allocation as a % of the total: pushParticle, accumCharge.

Looking at the code for accumCharge:

    accumCharge :: [Position] -> [MeshAssoc]
    accumCharge [] = []
    accumCharge ((x,y):xys) =
        [((i ,j ) , charge * (1-dx) * (1-dy))] ++
        [((i',j ) , charge * dx * (1-dy))] ++
        [((i ,j') , charge * (1-dx) * dy)] ++
        [((i',j') , charge * dx * dy)] ++
        accumCharge xys
        where
            i  = truncate x
            i' = (i+1) `rem` nCell
            j  = truncate y
            j' = (j+1) `rem` nCell
            dx = x - fromIntegral i
            dy = y - fromIntegral j

Now, because I know what I'm looking for, I can pretty quickly spot the problem. I had to look at the definition of MeshAssoc to figure out that the result type of this function forces i to have type Int, yet it is used elsewhere as the argument to fromIntegral, where i, if it is overloaded, will be defaulted to Integer. When I give type signatures to i and j (:: Int), the allocation reduces.
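The fix described above can be sketched as follows. This is only an illustration, not the benchmark source: the definitions of Position, MeshAssoc, nCell and charge are assumptions chosen so the fragment compiles on its own, and the real pic definitions may differ. The only change to accumCharge itself is the signature on i and j.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}

-- Assumed types; the real definitions in the pic benchmark may differ.
type Position  = (Double, Double)
type MeshAssoc = ((Int, Int), Double)

-- Hypothetical constants, chosen only so the sketch is self-contained.
nCell :: Int
nCell = 8

charge :: Double
charge = 1.0

accumCharge :: [Position] -> [MeshAssoc]
accumCharge [] = []
accumCharge ((x,y):xys) =
     [((i ,j ), charge * (1-dx) * (1-dy))]
  ++ [((i',j ), charge * dx * (1-dy))]
  ++ [((i ,j'), charge * (1-dx) * dy)]
  ++ [((i',j'), charge * dx * dy)]
  ++ accumCharge xys
  where
    -- Without this signature (and with the restriction off), i and j
    -- would generalise to Integral b => b: used at Int in the result
    -- tuples, and defaulted to Integer under fromIntegral.
    i, j :: Int
    i  = truncate x
    j  = truncate y
    i' = (i+1) `rem` nCell
    j' = (j+1) `rem` nCell
    dx = x - fromIntegral i
    dy = y - fromIntegral j

main :: IO ()
main = mapM_ print (accumCharge [(1.25, 2.5)])
```

With the signature in place, each of i and j is a single Int value shared by all its uses, whether or not the monomorphism restriction is on.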
The pushParticle function has an identical pattern. Fixing these two functions brought the performance back to the original. But I've also changed the semantics - the author might have *wanted* i at type Integer in the definition of dx to avoid overflow, and the monomorphism restriction had prevented it.

I suppose you could ask how you'd find the problem if you didn't know what to look for. So I added some more annotations:

    i  = {-# SCC "i" #-} truncate x
    i' = {-# SCC "i'" #-} (i+1) `rem` nCell
    j  = {-# SCC "j" #-} truncate y
    j' = {-# SCC "j'" #-} (j+1) `rem` nCell
    dx = {-# SCC "dx" #-} x - fromIntegral i
    dy = {-# SCC "dy" #-} y - fromIntegral j

and the profiling output shows:

    i                ChargeDensity   100.0     6.8
    j                ChargeDensity     0.0     6.8
    chargeDensity    ChargeDensity     0.0     2.2
    accumCharge      ChargeDensity     0.0     3.9
    relax            Potential         0.0    27.2
    ...

So this pretty clearly identifies the problem area (although the figures don't quite add up; I suspect the insertion of the annotations has affected optimisation in some way). Still, you could argue that it doesn't actually tell you the cause of the problem: namely that i&j are being evaluated twice as often as you might expect by looking at the code. This is what the compiler warning would do, and I completely agree that not having this property evident by looking at the source code is a serious shortcoming.
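To make the "evaluated twice" point concrete, here is a small sketch using Debug.Trace. The functions cell and cellMono are hypothetical helpers, not taken from pic; they only reproduce the shape of the i/dx bindings. In cell the local i has no signature, so with the restriction off it is overloaded and gets computed once per type it is used at.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}
import Debug.Trace (trace)

-- Hypothetical helper mirroring the i/dx pattern in accumCharge.
cell :: Double -> (Int, Double)
cell x = (i, x - fromIntegral i)
  where
    -- No signature: with the restriction off, i :: Integral b => b.
    -- The tuple uses it at Int; the use under fromIntegral is
    -- ambiguous and defaults to Integer, so truncate runs per use.
    i = trace "evaluating i" (truncate x)

-- Same code with the binding pinned to Int: one shared value.
cellMono :: Double -> (Int, Double)
cellMono x = (i, x - fromIntegral i)
  where
    i :: Int
    i = trace "evaluating i (Int only)" (truncate x)

main :: IO ()
main = do
  print (cell 3.75)
  print (cellMono 3.75)
```

Both calls return the same pair, but the trace message for cell is expected to show up on stderr twice and the one for cellMono once. Compiling cell with -Wall should also produce a defaulting warning for the fromIntegral use, which is one hint a reader of the source would otherwise not get.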
And how come speed improved slightly in many cases--that seems counter-intuitive.
The runtimes are unreliable, due to the short running time of most of these benchmarks. We have a "slow" mode for the benchmark suite that runs each program with larger test data, but I didn't use it this time - mostly we find that measuring allocations is useful as a first approximation, and it's certainly more reliable.

(rest of email snipped, most of which I agree with).

Cheers,
Simon

On Thu, Feb 02, 2006 at 12:34:30PM -0000, Simon Marlow wrote:
Still, you could argue that it doesn't actually tell you the cause of the problem: namely that i&j are being evaluated twice as often as you might expect by looking at the code.
Would not the "entries" count in the profile tip you off to this? The entries for i should be twice that for accumCharge, right?

Andrew
participants (2)
- Andrew Pimlott
- Simon Marlow