
Am Montag, 11. September 2006 17:44 schrieben Sie:
Daniel Fischer wrote:
Try Don Stewart's ByteString library
-- and some data that made the programme segfault before now run in a couple of seconds.
But that shouldn't happen! Segfaults aren't performance problems, but errors. So your program contained a bug (lazyness where it isn't useful, I suspect), and ByString is hiding it.
Maybe I've misused the word segfault. What happened is: The programme consumed more and more memory (according to top), kswapd started to have a higher CPU-percentage than my programme, programme died, system yelling 'Speicherzugriffsfehler', top displays 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct term? That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists. So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings. However the problem might be too little lazyness, because if I explicitly read the file line by line, memory usage remains low enough -- but ByteString is *much* faster anyway.
So even if we just counted newlines, we would have to scan 1,700 million (1.7*10^9) chars per second. Could any ordinary computer of today do that, using whatever language?
That rate should be no problem for a 2GHz machine. However, a 200MHz 64 bit wide bus won't deliver the data fast enough and it is 50x as much as the best hard disks could deliver in optimal circumstances. I guess, most of the test cases are a lot smaller than your guesstimate.
I suppose so, too, but not knowing the test data, I assumed a bad case (uniform distribution of line-lengths and test-case-size in the specified range).
Udo.
Bulat:
are you mean arithmetic or geometric average? ;) I meant 'expected value'. If X_i are independent random variables uniformly distributed on [0 .. k], Y is a random variable (independent from the X_i) uniformly distributed on [1 .. n] and Z is the sum of the first Y of the X_i, then the expected value of Z is (n+1)*k/4. So we might call that a weighted arithmetic average, I suppose.
Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton