
On Monday, 11 September 2006 at 17:44, you wrote:
Daniel Fischer wrote:
Try Don Stewart's ByteString library
-- and some data that made the programme segfault before now runs in a couple of seconds.
But that shouldn't happen! Segfaults aren't performance problems, but errors. So your program contained a bug (laziness where it isn't useful, I suspect), and ByteString is hiding it.
Maybe I've misused the word segfault. What happened is: the programme consumed more and more memory (according to top), kswapd started to have a higher CPU percentage than my programme, the programme died, the system yelled 'Speicherzugriffsfehler' (German for 'segmentation fault'), and top displayed 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault, or what is the correct term?

That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists. So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings. However, the problem might be too little laziness, because if I explicitly read the file line by line, memory usage remains low enough -- but ByteString is *much* faster anyway.
So even if we just counted newlines, we would have to scan 1,700 million (1.7*10^9) chars per second. Could any ordinary computer of today do that, using whatever language?
That rate should be no problem for a 2GHz machine. However, a 200MHz 64-bit-wide bus won't deliver the data fast enough, and it is 50x as much as the best hard disks could deliver under optimal circumstances. I guess most of the test cases are a lot smaller than your guesstimate.
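(As a quick sanity check on those figures: a 200MHz bus moving 64 bits = 8 bytes per cycle tops out around 1.6 GB/s, just under the 1.7 GB/s required; and 1.7 GB/s divided by 50 is about 34 MB/s, a plausible sustained rate for a hard disk of that era.)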
I suppose so, too, but not knowing the test data, I assumed a bad case (uniform distribution of line-lengths and test-case-size in the specified range).
Udo.
Bulat:
Do you mean arithmetic or geometric average? ;)

I meant 'expected value'. If the X_i are independent random variables uniformly distributed on [0 .. k], Y is a random variable (independent of the X_i) uniformly distributed on [1 .. n], and Z is the sum of the first Y of the X_i, then the expected value of Z is (n+1)*k/4. So we might call that a weighted arithmetic average, I suppose.
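(For the record, that figure follows from Wald's identity, which applies because Y is independent of the i.i.d. X_i:

    E[Z] = E[Y] * E[X_1] = ((n+1)/2) * (k/2) = (n+1)*k/4
)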
Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Maybe I've misused the word segfault. What happened is: the programme consumed more and more memory (according to top), kswapd started to have a higher CPU percentage than my programme, the programme died, the system yelled 'Speicherzugriffsfehler' (German for 'segmentation fault'), and top displayed 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault, or what is the correct term?
I thought a segfault was when a program reads or writes to protected memory (for example, out of bounds on an array, or dereferencing a null pointer, etc. -- things mostly avoided in Haskell). Luckily there are lots of fun ways to crash computers! I've heard your "heap overflow" called "exploding" because of how the process rapidly expands its memory usage, filling all available space, causing system collapse. Jared. -- http://www.updike.org/~jared/ reverse ")-:"

Daniel Fischer wrote:
The programme consumed more and more memory (according to top), kswapd started to have a higher CPU percentage than my programme, the programme died, the system yelled 'Speicherzugriffsfehler' (German for 'segmentation fault'), and top displayed 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault, or what is the correct term?
That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists.
Most certainly not. I'm pretty sure this is due to a bug in your code. Something retains a data structure which is actually unneeded. Probably a case of "foldl" where "foldl'" should be used, or a "try" in Parsec code where it should be left out, or a lot of "updateWiths" to a Map, etc. Or it could be a bad choice of data structure. I bet it's the map you're using to represent the graph (which you don't even need to represent at all, btw).
So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings.
The chunk of the file should never need to fit into RAM. If that's a problem, you also forgot to prime a crucial "foldl". Udo. -- "Proof by analogy is fraud." -- Bjarne Stroustrup
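To make the foldl/foldl' point concrete, here is a minimal sketch of that classic space leak, using a toy summing example (not the code under discussion):

    import Data.List (foldl')

    -- foldl builds a chain of unevaluated (+) thunks and retains the
    -- whole list until the result is finally demanded:
    leaky :: [Int] -> Int
    leaky = foldl (+) 0

    -- foldl' forces the accumulator at every step, so it runs in
    -- constant space:
    strict :: [Int] -> Int
    strict = foldl' (+) 0

    main :: IO ()
    main = print (strict [1 .. 10000000])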

On Tuesday, 12 September 2006 at 22:26, you wrote:
Daniel Fischer wrote:
The programme consumed more and more memory (according to top), kswapd started to have a higher CPU percentage than my programme, the programme died, the system yelled 'Speicherzugriffsfehler' (German for 'segmentation fault'), and top displayed 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault, or what is the correct term?
That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists.
Most certainly not. I'm pretty sure this is due to a bug in your code. Something retains a data structure which is actually unneeded. Probably
Apparently. And my money is on a load of lines from the file (of which I need only the first and last Char).
a case of "foldl" where "foldl'" should be used or a "try" in Parsec code where it should be left out or a lot of "updateWiths" to a Map, etc. Or it could be a bad choice of data structure. I bet, it's the map you're using to represent the graph (which you don't even need to represent at all, btw).
No foldl nor Parsec around. I represent the graph as a UArray (Char,Char) Int (I've switched to Int for the index type, too, when tuning the code), so that shouldn't use much memory (the array size is 676). The array is built via accumArray; I hope that's sufficiently efficient (though now I use unsafeAccumArrayUArray, which is faster). How could I solve the problem without representing the graph in some way? Possibly that could be done more efficiently than I do it, but I can't imagine how to do it without representing the graph in some data structure.
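For concreteness, a sketch of what such a table might look like (hypothetical name buildGraph, assuming lowercase letters only):

    import Data.Array.Unboxed

    -- one counter per (first letter, last letter) pair of a line,
    -- i.e. 26 * 26 = 676 cells in total
    buildGraph :: [(Char, Char)] -> UArray (Char, Char) Int
    buildGraph ends =
        accumArray (+) 0 (('a','a'), ('z','z')) [ (e, 1) | e <- ends ]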
So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings.
The chunk of the file should never need to fit into RAM. If that's a problem, you also forgot to prime a crucial "foldl".
Forgive the stupid question, but where, if not RAM, would the chunk currently processed reside?
Udo.
Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

daniel.is.fischer:
On Tuesday, 12 September 2006 at 22:26, you wrote:
Daniel Fischer wrote:
The programme consumed more and more memory (according to top), kswapd started to have a higher CPU percentage than my programme, the programme died, the system yelled 'Speicherzugriffsfehler' (German for 'segmentation fault'), and top displayed 'kswapd<defunct>'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault, or what is the correct term?
That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists.
Most certainly not. I'm pretty sure this is due to a bug in your code. Something retains a data structure which is actually unneeded. Probably
Apparently. And my money is on a load of lines from the file (of which I need only the first and last Char).
a case of "foldl" where "foldl'" should be used or a "try" in Parsec code where it should be left out or a lot of "updateWiths" to a Map, etc. Or it could be a bad choice of data structure. I bet, it's the map you're using to represent the graph (which you don't even need to represent at all, btw).
No foldl nor parsec around. I represent the graph as a
UArray (Char,Char) Int
(I've switched to Int for the index type, too, when tuning the code), so that shouldn't use much memory (the array size is 676). The array is built via accumArray; I hope that's sufficiently efficient (though now I use unsafeAccumArrayUArray, which is faster).
How could I solve the problem without representing the graph in some way? Possibly that could be done more efficiently than I do it, but I can't imagine how to do it without representing the graph in some data structure.
So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings.
The chunk of the file should never need to fit into RAM. If that's a problem, you also forgot to prime a crucial "foldl".
Forgive the stupid question, but where, if not RAM, would the chunk currently processed reside?
I agree. Some problems simply require you to hold large strings in memory. And for those, [Char] conks out around 5-10M (try reversing a 10M [Char]). -- Don
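A quick way to reproduce that limit, sketched under the assumption of a roughly 10MB file named big.txt:

    import qualified Data.ByteString.Char8 as B

    main :: IO ()
    main = do
        s <- readFile "big.txt"              -- ~10M Chars as a list
        putStrLn (take 10 (reverse s))       -- forces the whole list live
        b <- B.readFile "big.txt"            -- same data at 1 byte/char
        B.putStrLn (B.take 10 (B.reverse b)) -- stays comfortably in memory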

Daniel Fischer wrote:
Most certainly not. I'm pretty sure this is due to a bug in your code. Something retains a data structure which is actually unneeded. Probably
Apparently. And my money is on a load of lines from the file (of which I need only the first and last Char).
Then you're doing it wrong[TM]. You shouldn't need to keep any part of the input in memory. Whatever it is, nobody can tell you without seeing the code. Try heap profiling if you have no idea where to look for leaks.
How could I solve the problem without representing the graph in some way?
By using an advanced tool called "brains". Sorry for not being more specific, but that's actually the fun part of the challenge and I'm not going to spoil it for you. ;-)
Forgive the stupid question, but where if not RAM would the chunk currently processed reside?
Oh, I overlooked "chunk". Well, yes, the "chunk" currently processed needs to fit into RAM. But how much of a problem could a single Char pose? Donald Bruce Stewart wrote:
I agree. Some problems simply require you to hold large strings in memory. And for those, [Char] conks out around 5-10M (try reversing a 10M [Char]).
Sure, this one just isn't of that kind. Udo. -- "Irrationality is the square root of all evil" -- Douglas Hofstadter

On Wednesday, 13 September 2006 at 11:07, you wrote:
Daniel Fischer wrote:
Most certainly not. I'm pretty sure this is due to a bug in your code. Something retains a data structure which is actually unneeded. Probably
Apparently. And my money is on a load of lines from the file (of which I need only the first and last Char).
Then you're doing it wrong[TM]. You shouldn't need to keep any part of
Yes, I did it wrong, but I didn't keep anything (but the first and last Char of each line) in memory on purpose. I hoped for the lines to be read one after the other, head and last extracted - possibly immediately passed to accumArray, though I wouldn't necessarily expect that - and the already used lines thrown in the dustbin on the next GC. Maybe the compiler couldn't figure out that it wouldn't access these lines anymore.
the input in memory. Whatever it is, nobody can tell you without seeing the code. Try heap profiling if you have no idea where to look for leaks.
Profiling (-hy, -hc) shows that the IO part of the programme holds on to tons of lists - that couldn't be anything but parts of the file contents, I believe.
How could I solve the problem without representing the graph in some way?
By using an advanced tool called "brains". Sorry for not being more specific, but that's actually the fun part of the challenge and I'm not going to spoil it for you. ;-)
Forgive the stupid question, but where if not RAM would the chunk currently processed reside?
Oh, I overlooked "chunk". Well, yes, the "chunk" currently processed needs to fit into RAM. But how much of a problem could a single Char pose?
Well, if it's the last straw... But not much, I presume; and even though we might have to keep a few thousand Chars in memory, that shouldn't do much harm either.
Donald Bruce Stewart wrote:
I agree. Some problems simply require you to hold large strings in memory. And for those, [Char] conks out around 5-10M (try reversing a 10M [Char]).
Sure, this one just isn't of that kind.
Yes, but I didn't tell the compiler :-(
Udo.
Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton

Daniel Fischer
Yes, I did it wrong, but I didn't keep anything (but the first and last Char of each line) in memory on purpose. I hoped for the lines to be read one after the other, head and last extracted [...] Profiling (hy,hc) shows that the IO part of the programme holds on to tons of lists - that couldn't be anything but parts of the file-contents, I believe.
Chances are that you don't evaluate these characters right away, so what you are retaining is just unevaluated thunks that refer to the lines themselves. So the fix is to make the extraction of the characters stricter. -k -- If I haven't seen further, it is by standing in the footprints of giants
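A sketch of what that stricter extraction might look like (hypothetical name firstAndLast):

    -- forcing both characters with seq before returning the pair means
    -- each line can be garbage-collected as soon as it has been read,
    -- instead of living on inside unevaluated thunks
    firstAndLast :: String -> (Char, Char)
    firstAndLast l =
        let h = head l
            t = last l
        in  h `seq` t `seq` (h, t)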

Daniel Fischer
Maybe I've misused the word segfault.
I think so. A segfault is the operating system complaining about an illegal memory access. If you get one from Haskell, it is likely a bug in the compiler or run-time system (or you were using unsafeAt, or the FFI).
The programme consumed more and more memory (according to top), kswapd started to have a higher CPU-percentage than my programme, programme died, system yelling 'Speicherzugriffsfehler', top displays 'kswapd<defunct>'.
I find that swapping the GHC heap is not productive, so I tend to limit memory to the available RAM. You can do that either by limiting the memory available to the process in the shell (ulimit or limit), or by specifying RTS options to the Haskell program (typically +RTS -Mxxx, where xxx is 80-90% of physical RAM).
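For example (assuming a hypothetical binary ./prog on a 256MB machine; adjust the numbers to your hardware):

    ulimit -v 200000          # shell: cap the process at ~200MB of virtual memory
    ./prog +RTS -M200m -RTS   # GHC RTS: cap the heap at 200MB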
From the GHC docs (6.4.2):
-Msize [Default: unlimited] Set the maximum heap size to size bytes. The heap normally grows and shrinks according to the memory requirements of the program. The only reason for having this option is to stop the heap growing without bound and filling up all the available swap space, which at the least will result in the program being summarily killed by the operating system.

(I thought the default was set according to limits?)

In my experience, a program which thrashes severely without heap limits can run fine if you just limit the heap. So it's not as much an issue of 'filling up swap' vs. 'killed by the OS', but 'takes forever, making the system unresponsive in the process' vs. 'tries hard to complete in a timely fashion, or halts with an error'. I much prefer the latter.
However the problem might be too little lazyness, because if I explicitly read the file line by line, memory usage remains low enough -- but ByteString is *much* faster anyway.
Right. You should still try to consume the file lazily, and make sure the data can get discarded and GC'ed when you have no further use for it. But a String is something like 8 or 12 bytes per character; a ByteString gets you down to 1. -k -- If I haven't seen further, it is by standing in the footprints of giants
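A minimal sketch of that lazy, single-pass style (counting lines, so each line becomes garbage before the next is demanded):

    main :: IO ()
    main = do
        s <- readFile "input.txt"  -- lazy IO: nothing is read yet
        -- lines are produced, consumed and collected one at a time,
        -- so memory use stays flat regardless of file size
        print (length (lines s))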

Ketil Malde wrote:
Daniel Fischer writes:
Maybe I've misused the word segfault.
I think so. A segfault is the operating-system complaining about an illegal memory access. If you get them from Haskell, it is likely a bug in the compiler or run-time system (or you were using unsafeAt, or FFI).
Far simpler: This is really a segfault, and it's because of a misfeature of Linux called "memory overcommitment". When physical memory runs out, Linux happily hands out more to applications requesting it, in the vain hope that at least some of it is never accessed. Therefore, malloc() is always successful, but when the memory is finally accessed, it suddenly turns out that there isn't anything to access, which results in a segfault. No amount of error checking can prevent that and it could have hit any process allocating memory when it ran out. Sane people turn overcommitment off. Sane people wouldn't have implemented it in the first place, either. Udo. -- The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. -- George Bernard Shaw
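(For reference, strict accounting can be enabled on Linux via the vm.overcommit_memory sysctl:

    sysctl -w vm.overcommit_memory=2   # allocations fail with ENOMEM up
                                       # front instead of overcommitting
)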

Hello Ketil, Wednesday, September 13, 2006, 10:41:13 AM, you wrote:
But a String is something like 8 or 12 bytes per character, a ByteString gets you down to 1.
12-16, actually. The Char itself, the pointer to the next list element, and the two boxes around them account for 16 bytes on a 32-bit CPU; but cells for small Chars are preallocated at program startup, so it comes to 12 bytes per character for ASCII-only strings.

And that is not the whole story :) A copying GC makes a program's memory usage 3 times larger, on average, than what it really allocates, while a compacting GC has only a 2x overhead. ByteStrings are allocated in the _unmovable_ part of the GHC heap, so they don't suffer from this problem. Of course, that is not free: a program that creates and destroys a large number of ByteStrings will suffer from memory holes, which is exactly the problem solved by GHC's GC.

So, for a program that only allocates, the difference may be 24x/36x on average (12 bytes per character times the 2x or 3x GC overhead, versus 1 byte per character in a ByteString), while for create-use-destroy-loop scenarios I can't make any detailed prognosis. Also, traversing those lengthy lists on every GC is very time-consuming. -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (6):
- Bulat Ziganshin
- Daniel Fischer
- dons@cse.unsw.edu.au
- Jared Updike
- Ketil Malde
- Udo Stenzel