
From: Max Ischenko [mailto:max@ucmg.com.ua]
Well, yes. In Markdown, as in most other "rich-text" formats, symbols are heavily overloaded. After all, it has to constrain itself to "plain text".
I'm going to try a "two-stage tokenization" (not sure that's the right name for it). Basically, I'd first split the raw text into "symbols" (like space, char, digit, left-bracket) and then turn those symbols into tokens (like paragraph, reference, start bold text, end bold text, etc.).
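Roughly along these lines, in Python (just a sketch; the symbol classes and token names are placeholders, not a final design):

    # Stage 1: classify raw characters into coarse "symbols".
    def symbolize(text):
        for ch in text:
            if ch.isspace():
                yield ('space', ch)
            elif ch.isdigit():
                yield ('digit', ch)
            elif ch in '*_[]()#':
                yield (ch, ch)      # markup-significant punctuation
            else:
                yield ('char', ch)

    # Stage 2: group runs of symbols into higher-level tokens.
    def tokenize(symbols):
        symbols = list(symbols)
        i = 0
        while i < len(symbols):
            cls, ch = symbols[i]
            if cls == '*' and i + 1 < len(symbols) and symbols[i + 1][0] == '*':
                yield ('bold-toggle', '**')    # '**' flips bold on/off
                i += 2
            elif cls in ('char', 'digit', 'space'):
                j = i                          # slurp a run of plain text
                while j < len(symbols) and symbols[j][0] in ('char', 'digit', 'space'):
                    j += 1
                yield ('text', ''.join(s[1] for s in symbols[i:j]))
                i = j
            else:
                yield (cls, ch)                # pass other markup symbols through
                i += 1

    print(list(tokenize(symbolize("some **bold** text"))))
    # [('text', 'some '), ('bold-toggle', '**'), ('text', 'bold'),
    #  ('bold-toggle', '**'), ('text', ' text')]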
Markdown looks a lot like Wiki source to me, i.e. it looks like the text source for a Wiki page. It seems to serve the same purpose: well-formatted plain text intended for conversion to HTML.

Many (most?) Wiki engines use straightforward regex substitution to convert the text source into HTML, rather than implementing a lexer/parser/pretty-printer combination. Obviously this makes for a fairly simple implementation, though mind you, some of the regexes are quite complex... (there's a toy sketch of the approach after the links below).

See, for example:

Source for MoinMoin, which runs the Haskell wiki:
http://cvs.sf.net/viewcvs.py/moin/MoinMoin/parser/wiki.py?view=markup

Original c2.com wiki (actual source a bit hard to find):
http://www.c2.com/cgi/wiki?TextFormattingRegularExpressions

... which leads to:
http://www.c2.com/cgi/wiki?TextFormattingRegularExpressionsDiscussion
http://www.c2.com/cgi/wiki?AlternativesToRegularExpressions
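To give a flavour of the regex-substitution style, here's a toy converter in Python. The rules are invented for illustration (real engines such as MoinMoin have far more of them, and the order of application matters -- bold must run before italic here):

    import re

    # Toy substitution rules; each (pattern, replacement) pair is applied
    # over the whole line in order. These are made-up examples, not taken
    # from any of the engines linked above.
    RULES = [
        (re.compile(r"'''(.+?)'''"), r'<b>\1</b>'),         # '''bold'''
        (re.compile(r"''(.+?)''"),   r'<i>\1</i>'),         # ''italic''
        (re.compile(r'\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b'),  # WikiWord links
         r'<a href="\1">\1</a>'),
    ]

    def to_html(line):
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        return line

    print(to_html("See the '''FrontPage''' for ''details''."))
    # See the <b><a href="FrontPage">FrontPage</a></b> for <i>details</i>.

Simple, but you can see how the interactions pile up: every new rule has to be checked against the output of all the rules that ran before it, which is exactly where the really hairy regexes come from.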