
Hi, I had a bad time trying to parse the words of a text. I suspect I miss some parsec knowledge. In the end it seems working, though I haven't tested much and this example contains the main features I was looking. *Main> parseTest (parseLine eof) "paolo@gmail sara,mimmo! 9ab a9b ab9 cd\n" ["paolo@gmail","sara","mimmo","cd"] --------------------- manyTillT body terminator joiner = liftM2 joiner (manyTill body (lookAhead terminator)) terminator wordChar = letter <|> oneOf "_@" <?> "a valid word character" nonSeparator = wordChar <|> digit wordEnd = do x <- wordChar notFollowedBy nonSeparator return x word = manyTillT wordChar (try wordEnd) (\b t -> b ++ [t]) <?> "a word" wordStart = do (try nonSeparator >> unexpected "non separator") <|> anyChar lookAhead wordChar nextWord = manyTill anyChar (try wordStart) >> (try word <|> nextWord) parseLine end = do f <- option [] $ return `fmap` try word r <- many $ try nextWord manyTill anyChar end return (f ++ r) ----------- Any comment to simplify this code is welcome. Paolino.

On Fri, Mar 30, 2007 at 05:43:34AM +0200, paolino wrote:
Hi, I had a bad time trying to parse the words of a text. I suspect I miss some parsec knowledge.
I'd start by not sextuple-posting, it just sextuples the ugliness ;-) Anyway, I think parsec is *far* too big a hammer for the nail you're trying to hit. import Char( isAlpha ) import List( groupBy ) equating f x y = f x == f y -- in Data.Eq, iff you have GHC 6.7 isLetter x = isAlpha x || x == '_' || x == '@' myWords = filter (isLetter . head) . groupBy (equating isLetter) Stefan

On 3/30/07, Stefan O'Rear
On Fri, Mar 30, 2007 at 05:43:34AM +0200, paolino wrote:
Hi, I had a bad time trying to parse the words of a text. I suspect I miss some parsec knowledge.
I'd start by not sextuple-posting, it just sextuples the ugliness ;-) Mhh, still I don't see any them in my inbox mails , probably something buggy in gmail configuration, sorry :/.
import Char( isAlpha ) import List( groupBy )
equating f x y = f x == f y -- in Data.Eq, iff you have GHC 6.7
isLetter x = isAlpha x || x == '_' || x == '@'
myWords = filter (isLetter . head) . groupBy (equating isLetter)
Testing your code, it misses the words with numbers inside exclusion and uses the number as separators. !runhaskell prova.hs ["paolo@gmail","sara","mimmo","ab","a","b","ab","cd"] Thanks

From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Paolino
I'd start by not sextuple-posting, it just sextuples the
ugliness ;-) Mhh, still I don't see any them in my inbox mails , probably something buggy in gmail configuration, sorry :/.
Are you expecting to see your sent message eventually arrive in your inbox? gmail doesn't do that by default (and I don't see an obvious setting to change it). gmail seems to be fairly keen on removing duplicate messages. Alistair ***************************************************************** Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. *****************************************************************

On Friday 30 March 2007 11:44, Bayley, Alistair wrote:
From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Paolino
I'd start by not sextuple-posting, it just sextuples the
ugliness ;-) Mhh, still I don't see any them in my inbox mails , probably something buggy in gmail configuration, sorry :/.
Are you expecting to see your sent message eventually arrive in your inbox? gmail doesn't do that by default (and I don't see an obvious setting to change it). gmail seems to be fairly keen on removing duplicate messages.
Then probably I was expecting haskell-cafe to send messages sent by me to me also so I can have my messages in the thread automatically. Or , I have to copy them from the sent box to the haskell-cafe filter box ? I'm confused. Paolino

On 30/03/07, Bayley, Alistair
Mhh, still I don't see any them in my inbox mails , probably something buggy in gmail configuration, sorry :/.
Are you expecting to see your sent message eventually arrive in your inbox? gmail doesn't do that by default (and I don't see an obvious setting to change it). gmail seems to be fairly keen on removing duplicate messages.
The ML software has the option of not sending you your own messages, but I can't remember if it is on by default or not. Cheers, Dougal.

On Friday 30 March 2007 11:54, Dougal Stanton wrote:
On 30/03/07, Bayley, Alistair
wrote: Mhh, still I don't see any them in my inbox mails , probably something buggy in gmail configuration, sorry :/.
Are you expecting to see your sent message eventually arrive in your inbox? gmail doesn't do that by default (and I don't see an obvious setting to change it). gmail seems to be fairly keen on removing duplicate messages.
The ML software has the option of not sending you your own messages, but I can't remember if it is on by default or not.
Just controlled , it's set to Yes on receiving copy of my posts. Thanks anyway. Paolino

On Friday 30 March 2007 06:59, Stefan O'Rear wrote:
Anyway, I think parsec is *far* too big a hammer for the nail you're trying to hit.
In the end , the big hammer solution has become parseLine = fmap (map fst. filter snd) $ many parser where parser = do w <- option ("",False) parseAWord anyChar -- skip the separator return w parseAWord = try positive <|> (many1 nonSeparator >> return ("",False)) positive = do c <- wordChar (cs,tn) <- option ("",True) parseAWord return (c:cs,tn) wordChar = letter <|> oneOf "_@" <?> "a word-character" nonSeparator = wordChar <|> digit <?> "a non-separator" while your, corrected not parsec solution is wordsOfLine isNonSeparator isWordChar = (filter (all isWordChar)). groupBy (\x y -> (isNonSeparator x) == (isNonSeparator y)) Still ,I wonder if the parsec solution can be simplified. Thanks. (PS. I put an option on the ML software which sends me an ack on posting , so at least I know I sent the mail :) )
participants (6)
-
Bayley, Alistair
-
Dougal Stanton
-
paolino
-
Paolino
-
Paolo Veronelli
-
Stefan O'Rear