
On Thu, Jul 15, 2010 at 3:19 AM, prad
which works fine for single lines, but produces nothing for multiple lines. the same happens with some of the other ways i tried it: single lines are fine, multiple lines give nothing. python requires setting the re.S flag for this, which i always found strange, since i thought \n is a char as well.
The problem is a classic in the regex world: by default "." matches any character except \n. I would suggest "<title>\n([^<]*)\n</title>", which is probably a bit more robust anyway.

Be aware, though, that parsing HTML (or any markup language) properly with regexps is impossible in general; you can only get crude and fragile approximations. There are proper HTML parsing libraries on Hackage if your needs become too complex for a simple regexp to handle.

-- Jedaï
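
To make the "." versus \n behaviour concrete, here is a minimal sketch in Python, since its re.S flag came up above. The sample page and the names PAGE, TitleParser and parse_title are made up for illustration, and Python's standard html.parser stands in for the Hackage libraries mentioned above. Haskell regex backends may treat newlines differently, so take this as an illustration of the general idea rather than of any particular Haskell package.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: the title text sits on its own line.
PAGE = "<html><head><title>\nMy Page\n</title></head></html>"

# By default "." does not match "\n", so this search finds nothing.
dot_default = re.search(r"<title>(.*)</title>", PAGE)

# With re.S (re.DOTALL), "." also matches "\n".
dot_all = re.search(r"<title>(.*)</title>", PAGE, re.S)

# A negated class such as [^<] matches newlines without any flag.
negated = re.search(r"<title>\n([^<]*)\n</title>", PAGE)


class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def parse_title(html):
    """Return the stripped title text using a real HTML parser."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

On this input the first search returns None, the second captures the title together with its surrounding newlines, and the third captures just the text between them. The parser version does not care how the title is laid out across lines, which is the main reason to prefer it once the input gets messier.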