
On Wed, Feb 23, 2011 at 1:55 PM, Max Bolingbroke
Thanks for bringing some data to the table. There are definitely some common patterns in what you sent me:
1) For defining Binary instances, you need to write set size before you write the elements: ~7 occurrences 2) Tests against small constants (typically <= 0 or 1, but also 2 and 3!): ~15 occurrences 3) A surprise to me: generating fresh names! People keep a set of all names generated so far, and then just take size+1 as their fresh name. Nice trick. ~17 occurrences 4) Turning sizes into strings for human consumption: ~19 occurrences 5) Just reexporting the functions somehow. Uninformative. ~8 occurrences
There were ~38 occurrences over ~13 repos where it appeared to be somehow fundamental to an algorithm (I didn't look into most of these in detail). I've put those after my message.
Frankly I am surprised how much "size" gets used. It seems that making it fast is more important than I thought.
Nice analysis. Does this apply to maps as well as sets or are sets use differently than maps somehow? IntMap (which shares data structure with HashMap) only hash O(n) size. I wonder if people avoid using IntMap because of this. I wonder if there's a way to implement size that doesn't mess up the code so badly (see the commit on GitHub to see how badly it messes up e.g. insert). Johan