
Hi Aaron,
On Wed, Sep 17, 2008 at 23:20, Aaron Denney wrote:
I entered the discussion as to which model is a workaround for the other -- someone said processes were a workaround for the lack of good threading in e.g. standard CPython. I replied that most languages' thread support can be seen as a workaround for the poor performance of communicating processes. (Creation cost in particular is usually cited, but that can often be reduced by process pools; context-switching cost, alas, is harder.)
That someone was probably me, but this is not what I meant. I meant that the "processing" [1] Python module is a workaround for CPython's performance problems with threads. For those who don't know it, the processing module exposes a nearly identical interface to the standard threading module in Python, but runs each "thread" in a separate OS process. The processing module emulates shared memory between these "threads", as well as locking primitives and blocking. That is what I meant when I said "processing" (the module) was a workaround for CPython's threading issues.

[1] http://www.python.org/dev/peps/pep-0371/

The processes vs. threads question depends on definitions, and there seem to be two sets floating around here. One is that processes and threads are essentially the same thing, the only difference being that processes don't share memory while threads do. On this view it doesn't really matter whether "processes" are implemented as proper OS processes or as OS threads. Discussion based on this definition can be interesting: one model fits some problems better than the other, and vice versa. The other is the systems view of OS processes vs. OS threads. Discussion about the difference between those two is only mildly interesting imo, as I think most people agree on things here and they are well covered in textbooks that are old as dirt.
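To make the "nearly identical interface" point concrete, here is a minimal sketch (my own example, written with the names the module ended up with in the standard library, i.e. multiprocessing per PEP 371) showing that the same worker code runs as an OS thread or as a separate OS process simply by swapping which class you construct:

    import queue
    import threading
    import multiprocessing

    def worker(n, results):
        # Same worker code either way: compute something and report it back.
        results.put(n * n)

    if __name__ == "__main__":
        # Thread version: one address space, so a plain in-process queue is enough.
        t_out = queue.Queue()
        t = threading.Thread(target=worker, args=(3, t_out))
        t.start()
        print(t_out.get())   # 9
        t.join()

        # Process version: the call shape is identical, but the worker runs in a
        # separate OS process and the Queue pickles values across the process
        # boundary to emulate the shared structure.
        p_out = multiprocessing.Queue()
        p = multiprocessing.Process(target=worker, args=(4, p_out))
        p.start()
        print(p_out.get())   # 16
        p.join()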
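On the process-pool remark quoted above: the module also ships a Pool type that keeps its worker processes alive between tasks, which is the usual way to amortize the creation cost. A tiny sketch (again mine, not from the original mail):

    import multiprocessing

    def square(n):
        return n * n

    if __name__ == "__main__":
        # The pool creates its workers once; subsequent map calls reuse them,
        # so the per-task cost is a message round-trip, not a process creation.
        with multiprocessing.Pool(processes=4) as pool:
            print(pool.map(square, range(10)))       # [0, 1, 4, ..., 81]
            print(pool.map(square, range(10, 20)))   # same workers, no new processes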
The central aspect in my mind is a default of share-everything versus a default of share-nothing.
[..snip...] These are, in fact, process models. They are implemented on top of thread models, but that's a performance hack. And while putting this model on top restores much of the programming sanity, in languages with mutable variables and references that can be passed you still need a fair bit of discipline to keep that sanity. There, the implementation detail of threads rather than processes allows, and even encourages, shortcuts that violate the process model.
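To make that shortcut concrete, here is a small contrived example (my own, with made-up names): when the "processes" are really threads, a channel can hand over a reference to a mutable object rather than a copy, and the isolation the process model promises quietly disappears.

    import queue
    import threading

    def producer(chan):
        msg = {"items": [1, 2, 3]}
        chan.put(msg)            # with threads, this hands over a reference...
        msg["items"].append(4)   # ...so mutating after the "send" still reaches the receiver

    def consumer(chan, out):
        out.append(chan.get()["items"])

    if __name__ == "__main__":
        chan, out = queue.Queue(), []
        workers = [threading.Thread(target=producer, args=(chan,)),
                   threading.Thread(target=consumer, args=(chan, out))]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # The "message" kept changing after it was sent; across a real OS-process
        # boundary it would have been serialized into an independent copy.
        print(out)   # typically [[1, 2, 3, 4]]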
Well, this is a viewpoint I don't totally agree with. Correct me if I'm not understanding you, but you seem to be making the point that OS processes are often preferred because with threads you *can* get yourself into trouble by using shared memory.

The thing I don't agree with is "let's use A because B has dangerous features". This is sort of like the design mantra of languages like Java. Now, you may say that Java has indeed been wildly successful, but I think (or hope) that is because we don't give people (programmers) enough credit. Literature, culture and training in the current practice of programming could IMO do well with making fewer, _good_ programmers rather than a lot of mediocre ones. And _good_ programmers don't need to be handcuffed just because otherwise they *could* poke themselves in the eye. I.e. if you need to sacrifice the efficiency of threads for full-blown OS processes because people can't stay away from shared memory, then something is fundamentally wrong.

I'll stop here, this is starting to sound like a very OT rant.

cheers,
Arnar