Re: [Haskell-cafe] Slow IO

12 Sep 2006

      Am Montag, 11. September 2006 17:44 schrieben Sie:
...
Daniel Fischer wrote:
...
...
Try Don Stewart's ByteString library
-- and some data that made the programme segfault before now run in a
couple of seconds.
But that shouldn't happen!  Segfaults aren't performance problems, but
errors.  So your program contained a bug (lazyness where it isn't
useful, I suspect), and ByString is hiding it.
Maybe I've misused the word segfault.
What happened is:
The programme consumed more and more memory (according to top),
kswapd started to have a higher CPU-percentage than my programme,
programme died, system yelling 'Speicherzugriffsfehler', top displays 
'kswapd<defunct>'.
I believe that means my programme demanded more memory than I have available 
(only 256MB RAM + 800MB swap). Is that a segfault or what is the correct 
term?

That is probably due to (apart from the stupidity of my IO-code) the large 
overhead of Haskell lists. So the chunk of the file which easily fits into my 
RAM in ByteString form is too large as a list of ordinary Strings.
However the problem might be too little lazyness, because if I explicitly read 
the file line by line, memory usage remains low enough -- but ByteString is 
*much* faster anyway.
...
...
So even if we just counted newlines, we would have to scan 1,700 million
(1.7*10^9) chars per second.
Could any ordinary computer of today do that, using whatever language?
That rate should be no problem for a 2GHz machine.  However, a 200MHz 64
bit wide bus won't deliver the data fast enough and it is 50x as much as
the best hard disks could deliver in optimal circumstances.  I guess,
most of the test cases are a lot smaller than your guesstimate.
I suppose so, too, but not knowing the test data, I assumed a bad case 
(uniform distribution of line-lengths and test-case-size in the specified 
range).
...
Udo.
Bulat:
...
are you mean arithmetic or geometric average? ;)
I meant 'expected value'.
If X_i are independent random variables uniformly distributed on [0 .. k],
Y is a random variable (independent from the X_i) uniformly distributed on
[1 .. n] and Z is the sum of the first Y of the X_i, then the expected value 
of Z is (n+1)*k/4.
So we might call that a weighted arithmetic average, I suppose.
Cheers,
Daniel

-- 

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
	-- Blair P. Houghton