
Hi Rob,

On Tuesday, 30.08.2016, at 16:57 +0100, Rob Stewart wrote:
> Thanks for the reply, and really interesting work!
>
> > In the following measurements, I avoid this problem by not measuring program execution time, but simply by counting the number of instructions performed.
>
> Are these a count of all instructions performed at runtime? Did you isolate a count of just the memory access instructions that ended up fetching from main memory? Also, did you measure the clock cycle latency averages for memory access instructions for (B) (C) and (D), to get an indication of cache misses?

I did not dig deeper. I just ran the test suite under valgrind (nofib has support for that) and took the number of instructions that came out.

My working hypothesis was that the effects of high-level (i.e. Core) transformations on code layout / branch prediction / pipeline stuff / cache hits and misses are too hard to predict, and that such measurements would not give me useful information. But you seem to be after a more optimistic and principled approach here, so I’ll be keen to hear what you find out.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail@joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nomeata@joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nomeata@debian.org