After playing a bit with Ryan's benchmark, I no longer think that the
order matters much for the total number of allocations. Nor do I believe
in first-class vs non-first-class IO actions. All that should matter is
how many allocations we can move to the stack. But I haven't yet figured
out why exactly different versions differ so drastically in this regard.