
I think I might have found why (or partially why) GHC is so slow on x86-64.

Section 5.10 of the optimization manual

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/2511...

(which has a whole lot of good info for any processor, including a whole chapter on how to write C code that optimizes well independent of the CPU) says "don't place code and data on the same cache line". Doing so casts the code line out of the cache on access to the data, and vice versa.

So basically, GHC is running without an effective L1 cache on x86-64, if I understand things properly. (Maybe on other CPUs too; we might want to check the Intel optimization manuals as well.)

If it is too difficult to separate the code and data from each other (which it might be, since GHC goes through specific measures to put them next to each other), then making sure the transition from code to data occurs exactly on a 64-byte cache line boundary might solve this issue. It would mean that each function takes up a minimum of 128 bytes and we can't have more than one per cache line. Perhaps that is an acceptable tradeoff, but we might want to inline more to get bigger functions so we don't have to pad so much.

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈
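
A rough sketch of the padding arithmetic behind the 128-byte figure above, not anything GHC actually does. It assumes a 64-byte cache line, a data block placed immediately before its code, and padding added so that the data-to-code transition falls on a line boundary and no line is shared with a neighbouring function. The function names and the example sizes are hypothetical.

```python
CACHE_LINE = 64  # bytes; the cache line size mentioned in the post


def pad_to_line(size, line=CACHE_LINE):
    """Round size up to the next multiple of the cache line size."""
    return (size + line - 1) // line * line


def padded_footprint(data_bytes, code_bytes, line=CACHE_LINE):
    """Bytes occupied when data and code never share a cache line.

    The data block is padded so it ends on a line boundary, and the
    code is padded so the next function's data starts on a fresh line.
    """
    return pad_to_line(data_bytes, line) + pad_to_line(code_bytes, line)


def padding_overhead(data_bytes, code_bytes, line=CACHE_LINE):
    """Bytes of pure padding added by the alignment scheme."""
    return padded_footprint(data_bytes, code_bytes, line) - (data_bytes + code_bytes)


if __name__ == "__main__":
    # Hypothetical sizes: 16 bytes of data followed by 20 bytes of code.
    print(padded_footprint(16, 20), padding_overhead(16, 20))
```

With these made-up sizes a small function occupies two full lines, most of it padding, which is why inlining more to get bigger functions would reduce the relative cost.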