
Hello Andrew, Saturday, April 19, 2008, 7:50:30 PM, you wrote:
this looks like disk caching effects. if the data are read from disk on the first run and from the disk cache on the next runs, this only means that your algorithm runs faster than reading its data from disk
Negative. No data is ever *read* from disk, only *written* to disk. (And each test writes to a different file.)
The data to be sorted is generated using a trivial LCG PRNG.
if you don't generate new data for each sorting run, then the first run also pays the cost of generating the data, and that can easily cost more than the sorting itself. don't forget about ghc laziness :)
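i haven't seen your benchmark code, so this is only a sketch of what i mean (the lcg constants and the sizes below are made up): force the generated list before you start the clock, so the first sort run doesn't also pay for generating the data:

import Control.DeepSeq   (force)
import Control.Exception (evaluate)
import Data.List         (sort)
import Data.Time.Clock   (getCurrentTime, diffUTCTime)

-- stand-in for whatever generator you actually use
lcg :: Int -> [Int]
lcg = tail . iterate (\x -> (1103515245 * x + 12345) `mod` 2147483648)

main :: IO ()
main = do
  xs <- evaluate (force (take 1000000 (lcg 42)))   -- generation is paid for here
  t0 <- getCurrentTime
  _  <- evaluate (force (sort xs))                 -- only the sort is timed
  t1 <- getCurrentTime
  print (diffUTCTime t1 t0)

if you leave the list unevaluated, laziness moves the whole generation cost into the first timed run.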
there are plenty of reasons: first, -threaded makes i/o overlap with calculations.
Not with -N1.
are you sure? :) afaik, the -threaded RTS uses a dedicated i/o thread regardless of the -N setting (which only controls the number of threads running *user* code)
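a toy way to see the overlap (the file name and sizes here are arbitrary): compile with -threaded, run with +RTS -N1 -RTS, and the blocking file write below can still proceed while the main thread does pure work:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
         -- the write blocks in the OS, but with -threaded it doesn't stop
         -- the other Haskell thread, even on a single capability (-N1)
         B.writeFile "out.dat" (B.replicate 50000000 'x')
         putMVar done ()
  print (sum [1 .. 10000000 :: Int])               -- pure work running meanwhile
  takeMVar done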
second, the parallel version may exhibit better cpu cache behavior - such as processing all the data while it is still in cache before sending it back to memory
Again, with -N1, it is *still* only using 1 CPU core.
the parallel version is different from the sequential one and processes data in a different order. for example, imagine a tar+gzip program that runs sequentially and writes its intermediate results to disk. the same program, written multithreaded, compresses the data on the fly and never stores the intermediate results on the HDD, even though it uses only one core. the same effect applies to cpu cache usage.
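in haskell terms it is roughly the difference between these two (a made-up sketch: a trivial pack function stands in for tar, and compress from the zlib package stands in for gzip):

import qualified Data.ByteString.Lazy as L
import Codec.Compression.GZip (compress)

-- toy stand-in for tar: just concatenate the inputs
pack :: [FilePath] -> IO L.ByteString
pack = fmap L.concat . mapM L.readFile

-- staged: materialise the archive on disk, then compress it in a second pass
staged :: [FilePath] -> IO ()
staged files = do
  pack files >>= L.writeFile "archive.tmp"
  L.readFile "archive.tmp" >>= L.writeFile "archive.gz" . compress

-- pipelined: compress each chunk as it is produced; no temporary file, and
-- every chunk is still warm in cache when the second stage touches it
pipelined :: [FilePath] -> IO ()
pipelined files = pack files >>= L.writeFile "archive.gz" . compress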
Weird thing #4: Adding "-N2" makes *everything* slow down a few percent. In particular, Task Manager shows only one CPU core in use.
it's easy - your algorithm isn't really parallel.
Fails to explain why the parallel version is faster than the sequential one (even with no parallelism), or why the sequential algorithm should slow down with more threads. (Surely the extra threads just sit idle?)
there are management overheads. with multiple worker threads you have several OS threads fighting for the right to execute a single Haskell thread :))
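for comparison, the usual shape of a merge sort that really does run in parallel looks something like this (only a sketch - i haven't seen your code; par/pseq come from the parallel package and force from deepseq). the important part is forcing each half, because a spark only evaluates its thunk to weak head normal form - for a list, just the first cons cell:

import Control.Parallel (par, pseq)
import Control.DeepSeq  (NFData, force)

parSort :: (Ord a, NFData a) => [a] -> [a]
parSort []  = []
parSort [x] = [x]
parSort xs  = left `par` (right `pseq` merge left right)
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = force (parSort as)   -- force, so the spark does the whole job
    right    = force (parSort bs)

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

in a real program you would also stop sparking below some size threshold; otherwise the spark and scheduling overhead eats the gain, which is one way -N2 can come out slower than -N1.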
there are many subtle effects that make optimization much more interesting than applying simple recipes ;) that's why i like it so much :))
Well, based on the results I've seen so far, it seems that parallelism is a complete waste of time because it doesn't gain you anything. And that doesn't make a lot of sense...
i made the world's fastest compression program using multithreading, so the problem is definitely on the other side ;)

--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@gmail.com