[OT] A nice organized collection of threads in Haskell-Cafe

Hi,
I am working on a product to analyze posts made in forums, Usenet, and discussion mailing lists like Haskell-Cafe. For this, I require the messages to be accessible in this format, as XML:
<forum> (* example: Haskell-cafe *) [ list of - <thread> [ list of - <post> </post> ] </thread> ] </forum>
However, I find that the messages (in haskell-cafe/usenet) themselves aren't organized in this fashion. I would like to know if there is any way in which I can get the archives in this fashion.
Thanks,
--
~Vimal
RLE :)
encode = map (length &&& head) . group
decode = concatMap (uncurry replicate)
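For illustration, the requested forum/thread/post layout could be produced roughly as follows once the posts have been grouped into threads. This is only a sketch: the attribute names `name`, `author`, and `subject`, the function `forum_to_xml`, and the sample posts are all assumptions, not anything the archives provide.

```python
import xml.etree.ElementTree as ET

def forum_to_xml(forum_name, threads):
    """Serialize threads (each a list of post dicts) as <forum>/<thread>/<post>."""
    forum = ET.Element("forum", name=forum_name)
    for posts in threads:
        thread = ET.SubElement(forum, "thread")
        for p in posts:
            # Hypothetical attributes; a real schema may want more fields.
            post = ET.SubElement(thread, "post", author=p["author"], subject=p["subject"])
            post.text = p["body"]
    return ET.tostring(forum, encoding="unicode")

# Made-up sample data, one thread with two posts.
example_xml = forum_to_xml(
    "Haskell-cafe",
    [[
        {"author": "Vimal", "subject": "Archives as XML", "body": "Question text"},
        {"author": "Albert Y. C. Lai", "subject": "Re: Archives as XML", "body": "Reply text"},
    ]],
)
print(example_xml)
```

ElementTree escapes special characters in post bodies, which matters for mail containing `<` or `&`.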

Vimal wrote:
I am working on a product to analyze posts made in Forums, Usenet and discussion mailing lists like Haskell-Cafe. For this, I require the messages to be accessible in this format:
<forum> (* example: Haskell-cafe *) [ list of - <thread> [ list of - <post> </post> ] </thread> ] </forum>
Research into the "Message-ID:" "In-Reply-To:" "References:" headers. They give complete information. In short, they give a pointer tree, child pointing to parent or ancestors.
(Corollary: A thread is a tree of posts, not a flat list of posts. The most brain-damaging effect of using a web forum is assuming a thread is a flat list of posts.)
Some reply posts lack "In-Reply-To:" "References:" headers because their authors fail to choose compliant software or know the issue. Some non-reply posts (genuinely new topic, not even digression from existing ones) contain "In-Reply-To:" "References:" headers because their authors fail to know the issue and just hit "reply" to write new posts. All these are because the "everyone can haz PC" movement failed to educate everyone. You can cope by looking at "Subject:".
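A minimal sketch of the pointer-tree idea, using Python's standard `email` module. The three messages and their `example.org` IDs are made up, and only "In-Reply-To:" is consulted here; a real archive would also need "References:" and handling for posts that lack a "Message-ID:".

```python
from collections import defaultdict
from email import message_from_string

# Three made-up messages; only the headers matter here.
RAW_MESSAGES = [
    "Message-ID: <a@example.org>\nSubject: Archives as XML\n\nOriginal question.",
    "Message-ID: <b@example.org>\nIn-Reply-To: <a@example.org>\n"
    "Subject: Re: Archives as XML\n\nFirst reply.",
    "Message-ID: <c@example.org>\nIn-Reply-To: <b@example.org>\n"
    "Subject: Re: Archives as XML\n\nReply to the reply.",
]

def build_children(raw_messages):
    """Return (roots, children), where children maps a parent ID to its replies."""
    children = defaultdict(list)
    roots = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        if parent:
            children[parent].append(msg_id)   # child points at its parent
        else:
            roots.append(msg_id)              # no parent header: a thread root
    return roots, dict(children)

def render(msg_id, children, depth=0):
    """Flatten the tree under msg_id into indented lines, depth first."""
    lines = ["  " * depth + msg_id]
    for child in children.get(msg_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines

roots, children = build_children(RAW_MESSAGES)
tree_lines = [line for root in roots for line in render(root, children)]
print("\n".join(tree_lines))
```

The indentation in the output reflects the corollary above: each thread comes out as a tree, not a flat list.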

Albert Y. C. Lai wrote:
Some reply posts lack "In-Reply-To:" "References:" headers because their authors fail to choose compliant software or know the issue. Some non-reply posts (genuinely new topic, not even digression from existing ones) contain "In-Reply-To:" "References:" headers because their authors fail to know the issue and just hit "reply" to write new posts. All these are because the "everyone can haz PC" movement failed to educate everyone. You can cope by looking at "Subject:".
Thunderbird has a long-standing bug in that new posts having the same subject line as some other post that happened many years ago get added to that thread. It's really most irritating. :-S

Andrew Coppin wrote:
Thunderbird has a long-standing bug in that new posts having the same subject line as some other post that happened many years ago get added to that thread. It's really most irritating. :-S
I have investigated. A bit of skepticism goes a long way. Never be taken in. So, for the record:
In Thunderbird if you click "Write" (not "Reply" or "Reply All"), the headers are according to the semantics of "Write", i.e., no "References:" or "In-Reply-To:". Insofar as headers, this is correct behaviour.
When Thunderbird gets a post (from your "Write" or from outside) with no "References:" and "In-Reply-To:" header, but with "Subject:" same as existing posts, it still displays them together as a thread. But this is just a display trick - "References:" and "In-Reply-To:" are not fudged. Evidently, this is a measure against non-compliant posts. Furthermore, this is configurable. In the config editor, look for "mail.strict_threading".
The presence of the setting implies that the programmers know what they are getting into. There is a tension between following the rules and inter-operating with those who don't follow the rules. This is not a bug; this is a conscious compromise. And you can change it.
Changing the setting doesn't change the threading structure of existing posts - the decisions made back then were recorded. (There is also a way to delete that, along with lots of other meta-data: delete the appropriate .msf file.) The setting is effective for posts seen henceforth.
I can't blame you for being not observant. After all, this is precisely what I'm alluding to with "everyone can haz PC", or rather, the way Bill Gates executes it. Everyone becomes superficial, everyone just looks at what's displayed on the screen - or rather, fictionized on the screen - and jumps to conclusions.
Never be taken in.
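The compromise described here, falling back to "Subject:" when the threading headers are absent, can be sketched as below. This is not Thunderbird's code; the function names, the `strict` flag, and the tuple layout are assumptions chosen for illustration, and mailing-list tags such as "[Haskell-cafe]" are not stripped.

```python
import re

# Leading "Re:" / "Fw:" / "Fwd:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    """Drop reply/forward prefixes, collapse whitespace, lowercase."""
    return " ".join(_PREFIX.sub("", subject).split()).lower()

def attach_by_subject(posts, strict=False):
    """posts: (msg_id, parent_id_or_None, subject) tuples in arrival order.

    Returns a dict msg_id -> parent id (None for thread roots). With
    strict=True only the header-derived parent is used.
    """
    first_with_subject = {}
    parents = {}
    for msg_id, parent, subject in posts:
        key = normalize_subject(subject)
        if parent is None and not strict:
            # No usable header: guess the earliest post with the same subject.
            parent = first_with_subject.get(key)
        parents[msg_id] = parent
        first_with_subject.setdefault(key, msg_id)
    return parents

posts = [
    ("<1@example.org>", None, "Threading"),
    ("<2@example.org>", "<1@example.org>", "Re: Threading"),
    ("<3@example.org>", None, "RE: threading"),     # reply that lost its headers
    ("<4@example.org>", None, "Something else"),
]
loose = attach_by_subject(posts)
strict = attach_by_subject(posts, strict=True)
```

The same fallback that rescues `<3@example.org>` would also attach a genuinely new post that happens to reuse an old subject line, which is the tension the message describes.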

Albert Y. C. Lai wrote:
Andrew Coppin wrote:
Thunderbird has a long-standing bug in that new posts having the same subject line as some other post that happened many years ago get added to that thread. It's really most irritating. :-S
I have investigated. A bit of skepticism goes a long way. Never be taken in. So, for the record:
In Thunderbird if you click "Write" (not "Reply" or "Reply All"), the headers are according to the semantics of "Write", i.e., no "References:" or "In-Reply-To:". Insofar as headers, this is correct behaviour.
When Thunderbird gets a post (from your "Write" or from outside) with no "References:" and "In-Reply-To:" header, but with "Subject:" same as existing posts, it still displays them together as a thread. But this is just a display trick - "References:" and "In-Reply-To:" are not fudged. Evidently, this is a measure against non-compliant posts. Furthermore, this is configurable. In the config editor, look for "mail.strict_threading".
The presence of the setting implies that the programmers know what they are getting into. There is a tension between following the rules and inter-operating with those who don't follow the rules. This is not a bug; this is a conscious compromise. And you can change it.
Changing the setting doesn't change the threading structure of existing posts - the decisions made back then were recorded. (There is also a way to delete that, along with lots of other meta-data: delete the appropriate .msf file.) The setting is effective for posts seen henceforth.
I can't blame you for being not observant. After all, this is precisely what I'm alluding to with "everyone can haz PC", or rather, the way Bill Gates executes it. Everyone becomes superficial, everyone just looks at what's displayed on the screen - or rather, fictionized on the screen - and jumps to conclusions.
Never be taken in.
I have heard - multiple times - that this erroneous behaviour can be turned off. I have tried endlessly to follow such instructions to the letter. And yet, I can never get Thunderbird to not misthread things. So kindly don't tell me I'm jumping to conclusions. I've read the bug reports (there have been many!) and followed the instructions for changing the settings, and it never ever works. I still get broken threading.

Hi,
Yes, I looked into it as per the Mailman documentation. I was wondering if there was a module already that could do it, to avoid some work :)
What is the difference between In-Reply-To and References?
And the list of posts was just the beginning. Each post would have sufficient information to reconstruct the tree...
And looks like this post has gone on a tangent :D
Vimal

On Dec 10, 2007, at 0:16 , Vimal wrote:
What is the difference between In-Reply-To and References?
In-Reply-To: specifies the immediate parent message in the tree; References: specifies a (possibly truncated) path back to the tree's root.
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
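To make the distinction concrete, here is a small sketch that turns the two headers into one list of ancestors, nearest first: the immediate parent from "In-Reply-To:", then the "References:" path read from its end back towards the root. The helper name `ancestors_nearest_first` and the sample message are hypothetical.

```python
import re
from email import message_from_string

# A message ID token such as <abc@example.org>; other text in the header is ignored.
MSG_ID = re.compile(r"<[^<>\s]+>")

def ancestors_nearest_first(msg):
    """List ancestor Message-IDs: immediate parent first, thread root last."""
    in_reply_to = MSG_ID.findall(msg.get("In-Reply-To") or "")
    references = MSG_ID.findall(msg.get("References") or "")
    ordered = []
    # References runs root -> parent, so walk it backwards after In-Reply-To.
    for msg_id in in_reply_to[:1] + references[::-1]:
        if msg_id not in ordered:
            ordered.append(msg_id)
    return ordered

reply = message_from_string(
    "Message-ID: <c@example.org>\n"
    "In-Reply-To: <b@example.org>\n"
    "References: <a@example.org> <b@example.org>\n"
    "Subject: Re: example\n"
    "\n"
    "body\n"
)
chain = ancestors_nearest_first(reply)

# A post carrying only References still yields a parent.
refs_only = message_from_string(
    "Message-ID: <d@example.org>\nReferences: <a@example.org> <b@example.org>\n\nbody\n"
)
refs_chain = ancestors_nearest_first(refs_only)
```

Keeping the whole chain, rather than only the first entry, leaves room to fall back to a grandparent when the immediate parent is missing from the archive.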

Hi, Thanks for the info.
Vimal wrote:
What is the difference between In-Reply-To and References?
There was a time In-Reply-To was for emails and References was for Usenet.
My friend wrote a parser for Haskell-cafe messages from the mailman archives as suggested.
He told me that he had to reject a lot of messages because they didn't have a valid In-Reply-To header, i.e., the In-Reply-To header referred to some message that wasn't in the list of messages!
Perhaps it was a message from another month!
Thanks,
Vimal
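If the missing parents really do live in another month's archive, one way to avoid rejecting those posts is to merge the months before resolving parents, and to keep any post whose ancestors are all unknown as an orphan root. The sketch below assumes each month has already been reduced to a dict from Message-ID to a list of ancestor IDs, nearest first; the function name and all IDs are made up.

```python
def resolve_parents(months):
    """months: list of dicts mapping Message-ID -> ancestor IDs (nearest first).

    Returns (parents, orphans): parents maps each ID to its nearest known
    ancestor or None; orphans lists posts that named ancestors but whose
    ancestors are all absent from the merged archives.
    """
    merged = {}
    for month in months:
        merged.update(month)
    parents, orphans = {}, []
    for msg_id, ancestors in merged.items():
        known = next((a for a in ancestors if a in merged), None)
        parents[msg_id] = known
        if ancestors and known is None:
            orphans.append(msg_id)   # keep as a root instead of rejecting
    return parents, orphans

november = {"<q@example.org>": []}
december = {
    "<r@example.org>": ["<q@example.org>"],                        # parent is in November
    "<s@example.org>": ["<gone@example.org>", "<q@example.org>"],  # parent lost, root known
    "<t@example.org>": ["<missing@example.org>"],                  # nothing known
}

parents, orphans = resolve_parents([november, december])
_, december_only_orphans = resolve_parents([december])
```

Processing December on its own leaves all three replies without a parent, while merging both months leaves only one.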
On 11/12/2007, Albert Y. C. Lai

Vimal wrote:
What is the difference between In-Reply-To and References?
There was a time In-Reply-To was for emails and References was for Usenet.
My friend wrote a parser for Haskell-cafe messages from the mailman archives as suggested.
One place to look for example threading code is in the Gnus news/mail client for Emacs. Works fairly well, and is (was, when I looked at it briefly ages ago) not too complicated, and in elisp, which is not quite entirely an unfunctional language.
-k
--
If I haven't seen further, it is by standing in the footprints of giants

participants (6)
- Albert Y. C. Lai
- Andrew Coppin
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Ketil Malde
- Vimal