Practise fingerspelling with Haskell! (Code cleanup request)

The following is a slap-dash program for generating a list of pairs of words which differ by, at most, one letter. It's quite verbose at the moment, because (a) that was the way I wrote it, a snippet at a time, and (b) I lack the wit to make it shorter. Can anyone recommend ways to make this program more efficient/neat/elegant? It runs in decent time on my machine, but it's not exceedingly pretty and I'm sure it can be made shorter too. (The full thing can be found at http://193.219.108.225/code/fingerspell/ if you want to pull in a working version to play with. The purpose of the program is for a game to practise fingerspelling when learning sign language. More on that here: http://brokenhut.livejournal.com/265471.html.) Cheers, D. ---- edited highlights below ---- -- Number of letters difference between two words. difference :: Pair -> Int difference = length . filter (==False) . uncurry (zipWith (==)) -- Keep only pairs that differ by at most -- one letter difference. keepOneDiff :: PairSet -> PairSet keepOneDiff = map snd . filter (\x -> (fst x) < 2) . map (difference &&& id) -- Pairs of words of equal length, sorted to reduce -- duplicates of (a,b), (b,a) type. They shouldn't -- be completely eradicated because part of the game -- is to spot when they;re the same word. listPairs :: WordSet -> PairSet listPairs ws = [ (w, w') | w <- ws, w' <- ws, length w == length w', w <= w' ] -- Take N pairs of words which are the same -- length and differ by at most one letter. wordpairs :: Int -> WordSet -> PairSet wordpairs n = take n . keepOneDiff . listPairs fingerspell wl p = do wordfile <- words `liftM` readFile "/usr/share/dict/words" mapM_ pretty $ wordpairs p $ filter (requirements) wordfile -- Make sure all the words are of the required length and are -- just made up of letters, not punctuation. where requirements w = length w == wl && all (isAlpha) w pretty (x,y) = putStrLn $ x ++ ", " ++ y

Dougal Stanton wrote:
The following is a slap-dash program for generating a list of pairs of words which differ by, at most, one letter. It's quite verbose at the moment, because (a) that was the way I wrote it, a snippet at a time, and (b) I lack the wit to make it shorter.
Can anyone recommend ways to make this program more efficient/neat/elegant? It runs in decent time on my machine, but it's not exceedingly pretty and I'm sure it can be made shorter too.
I like it for its elegant point-free style :)
-- Number of letters difference between two words. difference :: Pair -> Int difference = length . filter (==False) . uncurry (zipWith (==))
Apparently, difference can only detect character replacements but not character insertion or deletion, but that's probably not your use case.
-- Pairs of words of equal length, sorted to reduce -- duplicates of (a,b), (b,a) type. They shouldn't -- be completely eradicated because part of the game -- is to spot when they;re the same word. listPairs :: WordSet -> PairSet listPairs ws = [ (w, w') | w <- ws, w' <- ws, length w == length w', w <= w' ]
You can avoid generating the superfluous half of the pairs by using tails listPairs ws = [ (head ws', w') | ws' <- tails ws, w' <- ws' , let w = head ws', length w == length w'] Of course, grouping words by length first and pairing the resulting groups is more efficient than filtering out all the pairs where length w /= length w'. But you restrict fingerspell to a fixed word length anyway, so it doesn't matter. Regards, apfelmus

On 18/07/07, apfelmus
I like it for its elegant point-free style :)
Yes, well, I am rather enamoured of them! :-)
Apparently, difference can only detect character replacements but not character insertion or deletion, but that's probably not your use case.
Yes, that is the case. If I allowed words differing in length they would necessarily look different, so it would be less of a challenge. I could still challenge people to identify the two words of course. Any practice is good.
You can avoid generating the superfluous half of the pairs by using tails
listPairs ws = [ (head ws', w') | ws' <- tails ws, w' <- ws' , let w = head ws', length w == length w']
Of course, grouping words by length first and pairing the resulting groups is more efficient than filtering out all the pairs where length w /= length w'. But you restrict fingerspell to a fixed word length anyway, so it doesn't matter.
I realised after I sent that post that I had *aready* filtered the words so they were all the same length. So the length condition in that list comprehension was completely superfluous. Meh. I will look at using tails to clean things up a bit. I tried to see if there were redundant parts I could remove today, but I was stymied by my lack of understanding of the list comprehensions. I worked out that [ (a,b) | a <- as, b <- bs ] must be equivalent to
comp = concatMap (\x -> map ((,) x) ys) xs
but I can't really say how conditions like "a /= b" get slotted in to that style. Is there a reference for that? Cheers, Dougal.

On 7/18/07, Dougal Stanton
I worked out that [ (a,b) | a <- as, b <- bs ] must be equivalent to
comp = concatMap (\x -> map ((,) x) ys) xs
but I can't really say how conditions like "a /= b" get slotted in to that style. Is there a reference for that?
As I understand it, list comprehensions are equivalent to monadic expressions in the [] monad. The only trick is that conditions in the list comprehension have to be translated into guard expressions. For instance,
[(x,y) | x <- xs, y <- ys, x /= y]
translates into:
do x <- xs y <- ys guard (x /= y) return (x,y)
You're partway there - concatMap is flip (>>=), so you have the xs >>= (\x -> <stuff>) part. /g -- The man who'd introduced them didn't much like either of them, though he acted as if he did, anxious as he was to preserve good relations at all times. One never knew, after all, now did one now did one now did one.

On 18/07/07, J. Garrett Morris
You're partway there - concatMap is flip (>>=), so you have the xs >>= (\x -> <stuff>) part.
Ah, yes! I read about this equivalence in one of the other threads today but it didn't make any connection. Doh! I think I will have to, sooner or later, become more versed in the subtle ways of non-IO monads. They seem to be capable of some seriously tricksy shenanigans. In other news, I worked out a few minutes ago while walking home that the whole keepOneDiff function (used in my program above) is incredibly convoluted, to the point of Heath Robinson contortions. To be clear, I had:
keepOneDiff = map snd . filter (\x -> (fst x) < 2) . map (difference &&& id)
Where I could have just done this...
keepOneDiff = filter ((< 2) . difference)
Sometimes I am astounded at my own lack of vision. :-O Clearly, today has not been a good day. Cheers for all your help folks, Dougal.

Dougal Stanton wrote:
I think I will have to, sooner or later, become more versed in the subtle ways of non-IO monads. They seem to be capable of some seriously tricksy shenanigans. Keep trying. At some point you will achieve enlightenment in a blinding flash of light. Then you will write a monad tutorial. Everybody learning Haskell goes through this.
In the hope of helping, here is *my* brief tutorial. Monads capture the pattern of "do X then Y in context C". All other programming languages have a single fixed context built in, and almost all of them use the same one. This context is the reason that the order of X and Y matters: side effects are recorded in C, so stuff that happens in X is going to affect Y. Pure functions have no context, which means that X cannot affect Y (no context to transmit information). Which is great unless you actually want to say "do X and then do Y". In order to do that you need to define a context for the side effects. Like I said earlier, most languages have exactly one context built in with no way to change it. Haskell has that context too: its called IO (as you may have guessed, these contexts are called "monads" in Haskell). The thing about Haskell is that you can substitute other contexts instead. Thats whats so great about monads. But since you have only ever programmed in languages that have the IO monad built in, the idea of replacing it with something else seems very strange. Ever programmed in Prolog? The backtracking that Prolog does is a different way of propogating side effects from one step to the next. Prolog provides a different context than most languages, and in Haskell you would describe it using a different monad. However because Prolog doesn't have monads it wasn't able to handle IO properly. Hence a backtracking computation with side effects in Prolog will repeat each side effect every time it gets reached. You could actually describe this in Haskell by defining a new monad with a backtracking rule but based on IO. Or you could have a different monad that only executes the side effects once the computation has succeeded. Each monad has two functions; "return" and ">>=" (known as "bind"). "return" says what it means to introduce some value into the context. Its really just there for the types, and in most cases its pretty boring. ">>=" describes how side effects propagate from one step to the next, so its the interesting one. The final (and really cool) thing you can do is glue a new monad together using "monad transformers" as a kit of parts. A monad transformer takes an "inner" monad as a type argument and the bind operation describes how effects in the inner monad are propagated in the outer monad. Thats a bit more advanced, but when you come to create your own monads its generally easier than building from scratch. Now I suggest going to http://en.wikibooks.org/wiki/Haskell/Understanding_monads and see if it makes any more sense. Ignore the nuclear waste metaphor if it doesn't work for you. The State monad is described about half way down, so you might think about that. State takes a type argument for the state variable, so side effects are restricted to changes in that variable. Hence "State Integer" is a monad with a state consisting of one integer. You might like to consider what could be done with random values in "State StdGen". Hope this helps. Paul.

Dougal Stanton wrote:
I worked out that [ (a,b) | a <- as, b <- bs ] must be equivalent to
comp = concatMap (\x -> map ((,) x) ys) xs
but I can't really say how conditions like "a /= b" get slotted in to that style. Is there a reference for that?
Here's an example translation [ (a,b) | a <- as, b <- bs, a /= b ] = concatMap (\a -> [ (a,b) | b <- bs, a /= b ]) as = concatMap (\a -> concatMap (\b -> [ (a,b) | a /= b ]) bs) as = concatMap (\a -> concatMap (\b -> if a /= b then [(a,b)] else []) bs) as The exact specification can be found at http://www.haskell.org/onlinereport/exps.html#list-comprehensions Of course, this is not very different from monadic expressions in the []-monad. Regards, apfelmus

Just some remarks:
difference = length . filter (==False) . uncurry (zipWith (==))
Maybe difference = length . filter id . uncurry (zipWith (/=)) or even difference w = length . filter id . zipWith (/=) w and defer the call of uncurry. We then have keepOneDiff = filter ((< 2) . uncurry difference) [Clearly the last version has a point...] But once having done this, we see, that we construct the pair in listPairs and deconstruct it in keepOneDiff. While this is perfectly valid (and good practice to separate the concerns), in the concrete setting one could write wanted ws = [ (w,v) | (w:vs) <- tails ws, v <- vs, difference w v < 2 ] and thus only create the pairs when necessary. Moreover, we only calculate the difference to check whether it is smaller than 2. We can do this directly (capturing the common pattern): diff_lt_2 = diff (diff (const . const $ False)) diff _ [] [] = True diff f (x:xs) (y:ys) = if x == y then diff f xs ys else f xs ys diff _ _ _ = error "length xs /= length ys!?" We now have wanted ws = [ (w,v) | (w:vs) <- tails ws, v <- vs, diff_lt_2 w v ] This modification boosts performance a lot, for example fingerspell 15 100 runs in 5.5 instead of 53 secs on my machine. A last remark:
requirements w = length w == wl && all (isAlpha) w
traverses w twice (although length does'nt touch the elements). One could do this in one pass along the lines req [] [] = True req (_:ys) (x:xs) | isAlpha x = req ys xs req _ _ = False requirements = req (replicate wl undefined) Unfortunately this is slower, I guess it has something to do with the use of the strict foldl' for length and with the fact that in /usr/shar/dict/words there seems to be only words for which all isAlpha returns True. A very last sentence: The req given above is just an instance of an even more general diff, so I finally reached the following: import Data.List (tails) import Data.Char (isAlpha) import Control.Monad (liftM) fingerspell wl p = let okay = diff_eq_0 (replicate wl True) . map isAlpha -- | faster in the concrete setting but not in general!? -- let okay w = length w == wl && all isAlpha w in liftM words (readFile "/usr/share/dict/words") >>= mapM_ putStrLn . take p . wanted . filter okay -- difference w = length . filter id . zipWith (/=) w -- diff_le_1 w v == difference w v < 2 wanted ws = [ w++", "++v | (w:vs) <- tails ws, v <- vs, diff_le_1 w v ] where diff_le_1 = diff (error "length w /= length v") diff_eq_0 diff g f (x:xs) (y:ys) | x == y = diff g f xs ys | otherwise = f xs ys diff g _ xs ys = g xs ys diff_eq_0 :: Eq a => [a] -> [a] -> Bool diff_eq_0 = diff (\ u v -> null u && null v) (const . const $ False) /BR -- -- Mirko Rahn -- Tel +49-721 608 7504 -- --- http://liinwww.ira.uka.de/~rahn/ ---

Mirko Rahn wrote:
wanted ws = [ (w,v) | (w:vs) <- tails ws, v <- vs, difference w v < 2 ]
Moreover, we only calculate the difference to check whether it is smaller than 2. We can do this directly (capturing the common pattern):
diff_lt_2 = diff (diff (const . const $ False))
diff _ [] [] = True diff f (x:xs) (y:ys) = if x == y then diff f xs ys else f xs ys diff _ _ _ = error "length xs /= length ys!?"
We now have
wanted ws = [ (w,v) | (w:vs) <- tails ws, v <- vs, diff_lt_2 w v ]
This modification boosts performance a lot, for example fingerspell 15 100 runs in 5.5 instead of 53 secs on my machine.
Note that using Peano-numbers can achieve the same effect of stopping the length calculation as soon as more than one character is different. data Nat = Zero | Succ Nat deriving (Eq, Ord) instance Num Nat where (Succ x) + y = Succ (x+y) Zero + y = y fromInteger 0 = Zero fromInteger n = Succ $ fromInteger (n-1) difference :: Pair -> Nat difference = Data.List.genericLength . filter not . uncurry (zipWith (==)) wanted ws = [ (w,v) | (w:vs) <- tails ws, v <- vs, difference w v <= 1 ] Regards, apfelmus

apfelmus wrote:
Note that using Peano-numbers can achieve the same effect of stopping the length calculation as soon as more than one character is different.
data Nat = Zero | Succ Nat deriving (Eq, Ord)
instance Num Nat where (Succ x) + y = Succ (x+y) Zero + y = y
Very nice (and quite fast), thanks for sharing this. One point: Writing down the equations for (+) by looking at the left argument first, you introduce an additional constraint to the user program, since if we have lenL [] = 0 lenL (x:xs) = 1 + lenL xs lenR [] = 0 lenR (x:xs) = lenR xs + 1 then *FingerSpell> (lenL (repeat ()) :: Nat) < 10 False *FingerSpell> (lenR (repeat ()) :: Nat) < 10 *** Exception: stack overflow So you can change a program that returns a proper value to one that loops by replacing lenL with lenR. Such problems are very difficult to track down, even if the library documentation states it very clearly. /BR -- -- Mirko Rahn -- Tel +49-721 608 7504 -- --- http://liinwww.ira.uka.de/~rahn/ ---

Mirko Rahn wrote:
apfelmus wrote:
Note that using Peano-numbers can achieve the same effect of stopping the length calculation as soon as more than one character is different.
data Nat = Zero | Succ Nat deriving (Eq, Ord)
instance Num Nat where (Succ x) + y = Succ (x+y) Zero + y = y
Very nice (and quite fast), thanks for sharing this.
One point: Writing down the equations for (+) by looking at the left argument first, you introduce an additional constraint to the user program, since if we have
lenL [] = 0 lenL (x:xs) = 1 + lenL xs
lenR [] = 0 lenR (x:xs) = lenR xs + 1
then
*FingerSpell> (lenL (repeat ()) :: Nat) < 10 False *FingerSpell> (lenR (repeat ()) :: Nat) < 10 *** Exception: stack overflow
So you can change a program that returns a proper value to one that loops by replacing lenL with lenR. Such problems are very difficult to track down, even if the library documentation states it very clearly.
It's the same with (||) or (&&): any p = foldr (||) False . map p any' p = foldr (flip (||)) False . map p Here, any' id will choke on x = True : repeat False but any id works just fine. Since there's no way to have a function be lazy in both arguments, the implicit convention is to make functions strict in the first arguments and, if applicable, lazy in the last arguments. In other words, the convention is True || _|_ = True but not _|_ || True = True 1 + _|_ = Succ _|_ but not _|_ + 1 = Succ _|_ Regards, apfelmus
participants (5)
-
apfelmus
-
Dougal Stanton
-
J. Garrett Morris
-
Mirko Rahn
-
Paul Johnson