
2009/08/31 Don Stewart:
> If you can abstract out a common function for lexing ints out of bytestrings, we could add it to the bytestring-lexing package.
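
For concreteness, here is a minimal sketch of what such a common function might look like. It is only an illustration, not code from bytestring-lexing or from Data.ByteString.Nums.Careless: the names `lexInts` and `sumInts` are hypothetical, and the digit parsing is delegated to `readInt` from Data.ByteString.Char8. It assumes the input is a strict ByteString holding whitespace-separated decimal integers.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)

-- | Hypothetical helper: lex every whitespace-separated decimal integer
-- out of a single strict ByteString. Lexing stops at the first token
-- that 'B.readInt' cannot parse as an Int.
lexInts :: B.ByteString -> [Int]
lexInts = go . B.dropWhile isSpace
  where
    go s = case B.readInt s of
      Nothing        -> []
      Just (n, rest) -> n : go (B.dropWhile isSpace rest)

-- | Hypothetical helper: sum the integers in one pass with a strict
-- accumulator, so no intermediate list of Ints is built.
sumInts :: B.ByteString -> Int
sumInts = go 0 . B.dropWhile isSpace
  where
    go !acc s = case B.readInt s of
      Nothing        -> acc
      Just (n, rest) -> go (acc + n) (B.dropWhile isSpace rest)
```

With these definitions, `lexInts (B.pack "3 7\n10\n")` evaluates to `[3,7,10]`. Working on one buffer that holds many ints, rather than on one small string per int, is the shape the faster implementations seem to share.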
All the really performant implementations operate on strings with multiple ints in them; I suspect this reduces memory traffic -- and indeed, Eugene's code using my libs allocates about twice as much memory as Don's code. I've tried a few different things with strictness annotations to no avail.

I'm having some trouble understanding the meaning of "entries" in the profiler's output. I have a file with 5 million random integers in it, totalling 26210408 bytes (21210408 bytes of which are not newlines). The relevant part is here:

  COST CENTRE  MODULE                              entries
  MAIN         MAIN                                      0
  main         Main                                      1
  bint         Main                                5000001
  lazy_int     Data.ByteString.Nums.Careless.Int  41211385
  digitize     Data.ByteString.Nums.Careless.Int  21210408

The number of "entries" to `lazy_int` is puzzling. Eugene's `bint` is called for each line of the file -- once for the header and then once for each of the 5 million integers. (There are two numbers on the first line but Eugene's program only uses `k`, so `bint` is only actually entered once for that line.) However, `bint` just calls my `int`, and `int` calls `lazy_int`, so why are there 41 million plus "entries" of `lazy_int`?

-- Jason Dusek