How to remove leading and trailing non-alpha characters, and multiple consecutive spaces?

Hi Folks,

I have a string that contains a person's name. Prior to the person's name there may be some non-alpha characters. After the person's name there may be some non-alpha characters. Between the person's first name and last name there should be only one space.

I want to remove the leading and trailing non-alpha characters and remove the extra spaces.

Here is an example string:

s = " \" John Doe \" "

After processing, I should have:

John Doe

Below is my implementation. Is there a shorter and more efficient implementation?

----------------------------------
import Data.Char
import Data.List

s = " \" John Doe \" "

-- remove leading non-alpha characters
t1 = dropWhile (not . isAlpha) s -- returns "John Doe \" "

-- break the string up into a list of words,
-- delimited by white space
t2 = words t1 -- returns ["John","Doe","\""]

-- create a string consisting of the first
-- name, space, last name
t3 = t2!!0 ++ " " ++ t2!!1 -- returns "John Doe"

-- Put it all together:
t4 = ((words . dropWhile (not . isAlpha)) s)!!0 ++ " " ++ ((words . dropWhile (not . isAlpha)) s)!!1

On 7 June 2013 00:36, Costello, Roger L.
Hi Folks,
I have a string that contains a person's name. [snip] Here is an example string:
s = " \" John Doe \" "
After processing, I should have:
John Doe
Below is my implementation. Is there a shorter and more efficient implementation? [snip code]
How about this?

import Data.Char
import Control.Monad.Reader

isAlphaOrSpace = liftM2 (||) isAlpha isSpace
-- alternatively,
-- isAlphaOrSpace x = isAlpha x || isSpace x
-- if you find this more readable

s = " \" John Doe \" "
fs = unwords . words . takeWhile isAlphaOrSpace . dropWhile (not . isAlpha) $ s

-- Denis Kasak
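For readers puzzled by `liftM2 (||) isAlpha isSpace`: functions from a fixed argument type form a monad, so `liftM2` hands the same character to both predicates and combines the two results with `(||)`. The sketch below shows three spellings that should be equivalent. The names `viaLiftM2`, `viaLiftA2`, `viaLambda` and `agreeOn` are hypothetical and not from the thread, and I am assuming a reasonably recent GHC, where importing `Control.Monad` is enough to get the function instance.

```haskell
import Control.Applicative (liftA2)
import Control.Monad (liftM2)
import Data.Char (isAlpha, isSpace)

-- The spelling from the message above: liftM2 in the function monad.
viaLiftM2 :: Char -> Bool
viaLiftM2 = liftM2 (||) isAlpha isSpace

-- The same idea through the Applicative interface.
viaLiftA2 :: Char -> Bool
viaLiftA2 = liftA2 (||) isAlpha isSpace

-- The explicit spelling, with the argument written out.
viaLambda :: Char -> Bool
viaLambda c = isAlpha c || isSpace c

-- True when all three spellings agree on every character of the input.
agreeOn :: String -> Bool
agreeOn = all (\c -> viaLiftM2 c == viaLambda c && viaLiftA2 c == viaLambda c)
```

Which spelling to use is mostly a matter of taste; the explicit one is probably the easiest to read on a beginners list.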

Denis' version doesn't work for names containing hyphens or apostrophes.
The original works for both.... However, the original explicitly assumes
there are always at least two names and gives erroneous data if there are
more than two. Output below shows the failure on hyphenated names.
Prelude Data.Char Control.Monad.Reader> unwords $ words $ takeWhile isAlphaOrSpace $ dropWhile (not . isAlpha) $ " \" John Doe \" "
"John Doe"
Prelude Data.Char Control.Monad.Reader> unwords $ words $ takeWhile isAlphaOrSpace $ dropWhile (not . isAlpha) $ " \" John Doe-Smith \" "
"John Doe"
Tim Perry
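To make the two failure modes of the original concrete: with `t2!!0` and `t2!!1`, a single name raises an index error, and a third name is silently dropped. One possible way to keep the word-based approach without fixed indices is sketched below. This is my own variation, not something proposed in the thread, and `cleanWords` is a hypothetical name.

```haskell
import Data.Char (isAlpha)

-- Drop leading junk, split on whitespace, and keep words up to the first
-- word that contains no letters at all (such as a lone closing quote).
cleanWords :: String -> String
cleanWords =
  unwords . takeWhile (any isAlpha) . words . dropWhile (not . isAlpha)
```

This handles one, two, or more names and keeps hyphens and apostrophes inside a word. It does assume that trailing junk is separated from the last name by white space: for an input like "\"John Doe\"" the closing quote stays attached to "Doe".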
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

On 7 June 2013 01:14, Tim Perry
Denis' version doesn't work for names containing hyphens or apostrophes. The original works for both.... However, the original explicitly assumes there are always at least two names and gives erroneous data if there are more than two. Output below shows the failure on hyphenated names.
Well, yes, I explicitly did not want to dwell on my (potential) implicit assumptions and just handled the cases that were visible in the problem example or were made explicit by the original poster. Adding special behaviour for hyphens and apostrophes would be trivial, though, by further modifying the isAlphaOrSpace predicate to include the new special characters.
-- Denis Kasak

On 7 June 2013 01:21, Denis Kasak
For instance, from a ghci session:
let s = " \" John Doe-Smith \" "
let (|||) = liftM2 (||)
let predicate = isAlpha ||| isSpace ||| (== '-') ||| (== '\'')
let fs = unwords . words . takeWhile predicate . dropWhile (not . isAlpha) $ s
fs
"John Doe-Smith"
-- Denis Kasak
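As a compiled-module counterpart to the ghci session, here is a minimal sketch of the extended predicate written without `liftM2`. The names `isNameChar` and `cleanName` are hypothetical, and the choice of "-'" as the extra allowed characters is only the assumption discussed in this thread.

```haskell
import Data.Char (isAlpha, isSpace)

-- Characters allowed inside a name. The extra punctuation is an assumption.
isNameChar :: Char -> Bool
isNameChar c = isAlpha c || isSpace c || c `elem` "-'"

-- Drop leading junk, keep the longest prefix of allowed characters,
-- then collapse runs of white space.
cleanName :: String -> String
cleanName =
  unwords . words . takeWhile isNameChar . dropWhile (not . isAlpha)
```

Two consequences of using `takeWhile` are worth keeping in mind. The result is cut at the first character outside the allowed set, so "O'Brien, Pat" becomes "O'Brien". And once the apostrophe is allowed, a closing single quote is kept, so " 'John Doe' " becomes "John Doe'".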

That would certainly work. I'd be tempted to just use:
unwords . words . dropWhile (not . isAlpha) $ "unclean-first unclean-last"
Tim

Hi,
how about
import Data.Char
let input = " 'John Doe-Smith' $"
unwords.words $ reverse . dropWhile (not.isAlpha) . reverse $ dropWhile (not.isAlpha) input
Result: "John Doe-Smith"
-Michael
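The reverse / dropWhile / reverse step can also be written with `dropWhileEnd` from `Data.List`, which trims from the end of the list directly. A small sketch of that variant, with hypothetical names `trimNonAlpha` and `cleanName`:

```haskell
import Data.Char (isAlpha)
import Data.List (dropWhileEnd)

-- Remove non-alphabetic characters from both ends of the string.
trimNonAlpha :: String -> String
trimNonAlpha = dropWhileEnd (not . isAlpha) . dropWhile (not . isAlpha)

-- Trim both ends, then collapse runs of white space between the words.
cleanName :: String -> String
cleanName = unwords . words . trimNonAlpha
```

Because only the two ends are trimmed, hyphens and apostrophes inside the name are left alone, and a string with no letters at all comes back empty.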

How about
import qualified Data.Text as T
import Data.Char
let s=" \" John Doe-Smith \" "
T.unpack . T.unwords . T.words . T.dropWhileEnd (not. isAlpha) .
T.dropWhile (not . isAlpha) . T.pack $ s
"John Doe-Smith"
Cheers,
David.
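If the names are already held as `Text`, the `pack` and `unpack` steps can be skipped, and the two trimming calls can be merged with `T.dropAround`, which drops matching characters from both ends. A minimal sketch, assuming the `text` package is available; `cleanNameT` is a hypothetical name.

```haskell
import Data.Char (isAlpha)
import qualified Data.Text as T

-- Trim non-alphabetic characters from both ends, then collapse
-- runs of white space between the remaining words.
cleanNameT :: T.Text -> T.Text
cleanNameT = T.unwords . T.words . T.dropAround (not . isAlpha)
```

For a `String` input, wrapping this as `T.unpack . cleanNameT . T.pack` gives the same shape as the expression above.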
participants (5): Costello, Roger L.; David Virebayre; Denis Kasak; Michael Peternell; Tim Perry