Without optimization, each add1 adds about 0.37 seconds. With optimization, each add1 adds about 0.16 seconds. That's over twice as fast (roughly 2.3x)! Of course, this is very much a "lab environment", so take the exact numbers with a grain of salt.
I suspect the rewrite rules could be made better with a few tricks, but I haven't gotten around to testing any of them yet.
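For anyone who wants to check the arithmetic behind "over twice as fast", here is a small sketch in Python. The two timings are the ones quoted above. The function names are made up for illustration, and the script is not part of the original benchmark.

```python
# Per-add1 costs quoted in the text, in seconds.
UNOPTIMIZED_SECONDS = 0.37
OPTIMIZED_SECONDS = 0.16


def speedup(before: float, after: float) -> float:
    """How many times faster the `after` timing is than `before`."""
    return before / after


def time_reduction(before: float, after: float) -> float:
    """Fraction of the original time that the optimization removes."""
    return (before - after) / before


if __name__ == "__main__":
    ratio = speedup(UNOPTIMIZED_SECONDS, OPTIMIZED_SECONDS)
    saved = time_reduction(UNOPTIMIZED_SECONDS, OPTIMIZED_SECONDS)
    print(f"speedup: {ratio:.2f}x")        # speedup: 2.31x
    print(f"time removed: {saved:.0%}")    # time removed: 57%
```

So the optimized version spends a bit under half the time per add1. Whether that ratio holds outside this setup is an open question.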