Indeed. Type annotations, inlining, and writing monomorphic functions are quite handy for making tight inner loops.
So are avoiding poorly predicted branches, keeping memory locality excellent, and, where possible, using SIMD instructions and cache-hierarchy-aware parallelism.
At the same time, these are not and should not be the concerns of normal end-user code. Tuning tight inner loops is a nontrivial engineering task in any language, and providing generic APIs by default does not prevent writing monomorphic versions as needed.
Unless you have an example where, for normal application code, the proposed changes would cause a significant constant-factor difference in performance, I don't understand your contention :)
Also, for the sorts of problems you point at, what is an example that can't be resolved with the INLINABLE + SPECIALIZE pragmas and that wouldn't be a problem anyway?
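
To make the pragma point concrete, here is a minimal sketch of what I have in mind. The function `sumSquares` is a hypothetical stand-in for a generic API, not anything from the actual proposal; the point is only that a `Num`-polymorphic definition can still get monomorphic copies for the hot paths.

```haskell
import Data.List (foldl')

-- Hypothetical generic API (illustration only): works for any Num instance.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- INLINABLE keeps the definition's unfolding in the interface file,
-- so GHC can specialize it at call sites in other modules when
-- optimizing.
{-# INLINABLE sumSquares #-}

-- SPECIALIZE asks GHC to also compile monomorphic copies here,
-- without the Num dictionary being passed at runtime.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
{-# SPECIALIZE sumSquares :: [Double] -> Double #-}
```

Callers keep using the generic name, e.g. `sumSquares [1, 2, 3 :: Int]`, and with optimization on GHC can pick the specialized copy. Whether that closes the gap in a given inner loop is something to check in the generated Core, not something the pragmas guarantee.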
Anyway, I do care about performance, but small constant factors are cheaper to fix, if need be, than it is to retrofit a better base API. Anyone who's privy to my work over the past few months knows I quite literally spend all my time thinking about memory locality :)
cheers
-Carter