stripping non-alphanumericals

I have the following code snippet:

    import System.IO
    import Data.String.Utils

    main = withFile "test.txt" ReadMode $ \handle -> do
        xs <- getwords handle
        sequence_ $ map putStrLn (escapeRe xs)

    getwords :: Handle -> IO [String]
    getwords h = hGetContents h >>= return . words

What I want to do there is to take e.g. "word," or "word!" etc. and arrive at "word". I understand that escapeRe may do this. However, I always get some sort of mismatch errors like this:

    test.hs:6:38:
        Couldn't match type `Char' with `[Char]'
        Expected type: [String]
          Actual type: String
        In the return type of a call of `escapeRe'
        In the second argument of `map', namely `(escapeRe xs)'
        In the second argument of `($)', namely
          `map putStrLn (escapeRe xs)'

    test.hs:6:47:
        Couldn't match type `[Char]' with `Char'
        Expected type: String
          Actual type: [String]
        In the first argument of `escapeRe', namely `xs'
        In the second argument of `map', namely `(escapeRe xs)'
        In the second argument of `($)', namely
          `map putStrLn (escapeRe xs)'

Now I have three questions:

1. Is escapeRe the right function to use here?
2. What do I do wrong?
3. I read in the Real World Haskell book that actually all these file/string operations are very very slow. The recommendation is to work with bytestrings instead. Is there any (fast) way to strip non-alphanumericals from bytestrings?

Thanks,
--Joerg

On Wed, Dec 11, 2013 at 1:37 PM, Joerg Fritsch wrote:
I have the following code snippet:
    import System.IO
    import Data.String.Utils

    main = withFile "test.txt" ReadMode $ \handle -> do
        xs <- getwords handle
        sequence_ $ map putStrLn (escapeRe xs)

    getwords :: Handle -> IO [String]
    getwords h = hGetContents h >>= return . words
What I want to do there is to take e.g. “word,” or “word!” etc. and arrive at “word”. I understand that escapeRe may do this. However, I always get some sort of mismatch errors like this:
    test.hs:6:38:
        Couldn't match type `Char' with `[Char]'
        Expected type: [String]
          Actual type: String
        In the return type of a call of `escapeRe'
        In the second argument of `map', namely `(escapeRe xs)'
        In the second argument of `($)', namely
          `map putStrLn (escapeRe xs)'

    test.hs:6:47:
        Couldn't match type `[Char]' with `Char'
        Expected type: String
          Actual type: [String]
        In the first argument of `escapeRe', namely `xs'
        In the second argument of `map', namely `(escapeRe xs)'
        In the second argument of `($)', namely
          `map putStrLn (escapeRe xs)'
Now I have three questions:
1. Is escapeRe the right function to use here?
`escapeRe` is not the correct function to use. That is the function you would use if you were trying to create a regular expression to match the given input, but this is not at all what you are doing.
2. What do I do wrong?
Well, the type is wrong because you wrote `sequence_ $ map putStrLn (escapeRe xs)` instead of `sequence_ $ map (putStrLn . escapeRe) xs`. As the error message shows, `escapeRe` expects a single String, so it has to be applied to each word rather than to the whole list. Note that `sequence_ $ map f xs` can be written as `mapM_ f xs`, which is much shorter and clearer. This is what I would write:

    mapM_ (putStrLn . escapeRe) xs

That said, `escapeRe` is not at all useful for what you are trying to do. You should probably use `filter` and `isAlphaNum` from Data.Char.
3. I read in the Real World Haskell book that actually all these file/string operations are very very slow. The recommendation is to work with bytestrings instead. Is there any (fast) way to strip non-alphanumericals from bytestrings?
This is true. You should use Text or ByteString for performance. Text is probably more appropriate for your use case. You can efficiently solve this exercise with functionality from Data.Char, Data.Text, and Data.Text.IO.

Note that this sort of question might be more appropriate for haskell-beginners: http://www.haskell.org/mailman/listinfo/beginners

-bob
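For illustration, the suggestion above (`isAlphaNum` from Data.Char combined with Data.Text and Data.Text.IO) might look roughly like the following. This is only a sketch, not code from the thread: the helper name `cleanWords` and the sample input are made up, and dropping words that end up empty is an assumption about the desired behaviour.

```haskell
import Data.Char (isAlphaNum)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper (not from the thread): split the input on
-- whitespace, strip every non-alphanumeric character from each word,
-- and drop words that become empty (e.g. a lone "...").
cleanWords :: T.Text -> [T.Text]
cleanWords = filter (not . T.null) . map (T.filter isAlphaNum) . T.words

main :: IO ()
main = do
  -- Sample input for demonstration; the contents of a file could be
  -- obtained with TIO.readFile "test.txt" instead.
  let contents = T.pack "word, word! ... (w)ord"
  mapM_ TIO.putStrLn (cleanWords contents)
```

Here `mapM_` replaces the `sequence_ $ map ...` pattern from the question, and `T.filter isAlphaNum` takes the place of `escapeRe`.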

What I want to do there is to take e.g. "word," or "word!" etc. and arrive at "word". I understand that escapeRe may do this.
Why not do it the easy way?

    import Data.Char (isAlpha)
    keep_letters = filter isAlpha

Then keep_letters "word," = keep_letters "(w)ord" = "word".

If you are content to treat each byte as one character (Latin-1 rather than full Unicode), you can use

    getLine :: IO ByteString

and

    splitWith :: (Word8 -> Bool) -> ByteString -> [ByteString]

with the predicate

    not . isAlpha . chr . fromIntegral

and then filter out the empty ByteStrings.
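A minimal sketch of both ideas, assuming the input can be treated as one byte per character (the names `keepLetters`, `notLetter` and `letterRuns` are invented for this example). Note that the two approaches are not equivalent: `filter` deletes the unwanted characters, while `splitWith` cuts the input at them, so "(w)ord" becomes the single word "word" in the first case but the two pieces "w" and "ord" in the second.

```haskell
import Data.Char (chr, isAlpha)
import Data.Word (Word8)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- String version: simply delete everything that is not a letter.
keepLetters :: String -> String
keepLetters = filter isAlpha

-- Hypothetical predicate: True for bytes that are not letters when
-- each byte is read as a single (Latin-1) character.
notLetter :: Word8 -> Bool
notLetter = not . isAlpha . chr . fromIntegral

-- ByteString version: cut the input at every non-letter byte and
-- discard the empty pieces left between adjacent separators.
letterRuns :: B.ByteString -> [B.ByteString]
letterRuns = filter (not . B.null) . B.splitWith notLetter

main :: IO ()
main = do
  putStrLn (keepLetters "(w)ord")
  mapM_ BC.putStrLn (letterRuns (BC.pack "word, word! (w)ord"))
```

For UTF-8 encoded input the byte-wise predicate would misclassify multi-byte characters, which is one reason Text may be the more comfortable choice.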
participants (3)
- Bob Ippolito
- Joerg Fritsch
- ok@cs.otago.ac.nz