Haskell's non-strict evaluation often leads to unexpected results with tail recursion if you're used to strict functional languages. To get the desired behavior you need to force the accumulator (with something like Data.List's foldl', $!, seq, or BangPatterns).
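A minimal sketch of what forcing the accumulator means (sumLazy/sumStrict are illustrative names, not anything from this thread):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: 'acc' piles up as a chain of thunks
-- ((0 + 1) + 2) + ... that is only forced at the very end,
-- which can overflow the stack on large inputs.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Strict accumulator: the bang forces 'acc' at every step,
-- so the loop runs in constant space (foldl' does the same).
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs
```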
One issue with the tail-recursive version is that you're basically sequencing it twice: forcing the result forces the whole spine of the list, and then you walk it again with deepseq. The overhead of the deepseq call on a list that size is 2.2 seconds on my machine, so the tail-recursive version pays that cost twice. allPairs2 forces only what is needed, so deepseq isn't traversing any data structures that are already forced.
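For context on that cost: seq forces only to weak head normal form (the outermost constructor), while deepseq traverses the entire structure. A small sketch of the distinction (illustrative, not code from this thread):

```haskell
-- 'seq' stops at the outermost constructor, so the 'undefined'
-- buried inside the list is never touched and this evaluates fine...
whnfOnly :: Bool
whnfOnly = ([1, undefined, 3] :: [Int]) `seq` True

-- ...whereas 'deepseq' from Control.DeepSeq would walk every
-- element and crash on the same list. That full walk is where
-- the 2.2 seconds goes.
```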
I think what's happening with allPairs2 is that GHC is throwing away the list during the traversal, since the result of the computation isn't used at all. I get very different timings (still fast, but no longer orders of magnitude apart) for allPairs2 if the result is kept around. You could probably confirm this with the heap profiler.
h> deepseq (allPairs2 [1..10000]) True
True
(2.47 secs, 4403724176 bytes)
h> let x = allPairs2 [1..10000]
(0.00 secs, 510488 bytes)
h> deepseq x True
True
(10.47 secs, 4401625192 bytes)
h> deepseq x True
True
(2.21 secs, 1031864 bytes)
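The definitions of allPairs1 and allPairs2 aren't quoted in this thread; for readers following along, something of roughly this shape would show the behavior described (an assumed reconstruction, not the original code):

```haskell
-- Assumed reconstruction: builds each row and concatenates,
-- paying a little extra for the intermediate row lists.
allPairs1 :: [a] -> [(a, a)]
allPairs1 xs = concat [ [ (x, y) | y <- xs ] | x <- xs ]

-- Assumed reconstruction: a single lazy comprehension that
-- produces pairs only as they are demanded.
allPairs2 :: [a] -> [(a, a)]
allPairs2 xs = [ (x, y) | x <- xs, y <- xs ]
```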
I'm no expert, but I think allPairs2 is likely as good as you're going to get with straightforward Haskell using lists. It doesn't build up a lot of unevaluated computation, and if you only need the first n elements it won't have to process the entire input. allPairs1 has most of allPairs2's good properties but is a bit worse because of the concatenation, though concatenating this way isn't the disaster it would be in a strict functional language.
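To see that last point concretely: a lazily built pair list only computes the prefix you demand, even from infinite input. A sketch using an assumed comprehension-style pairing, named pairsOf here to keep it distinct from the functions discussed above:

```haskell
-- Assumed comprehension-style definition (not the thread's code).
pairsOf :: [a] -> [(a, a)]
pairsOf xs = [ (x, y) | x <- xs, y <- xs ]

-- Demanding three pairs from an infinite input terminates,
-- because only the forced prefix is ever built.
firstThree :: [(Int, Int)]
firstThree = take 3 (pairsOf [1 ..])
```

firstThree evaluates to [(1,1),(1,2),(1,3)] without touching the rest of the infinite list.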
-bob