Software Tools in Haskell

On 2007.12.10 13:52:41 -0600, Tommy McGuire
In the "if anyone is interested,..." department....
For reasons that remain unclear, early this fall I started translating Brian W. Kernighan and P.J. Plaugher's classic _Software Tools in Pascal_ into Haskell. I have completed most of it, up to the second part of chapter 8 which presents a proto-m4 preprocessor. I have the code online including notes, comments, descriptions, and a few alternate approaches.
Attractions include:
* A fair gamut of the usual Unix suspects: proto-cat, proto-wc, proto-tr, proto-compress, proto-ar, proto-grep, etc.
* A usable editor, if you consider a de-featured ed-alike to be usable.
* A simple monadic regular expression engine.
* Zippers, Parsec, the State monad, the StateT monad transformer, and other attempts to sully Computing Science's brightest jewels.
* Lots and lots of really bad Haskell, including a fair bit that is a direct translation of 30-year old Pascal (see xindex in translit, Ch. 2, if you need to skip lunch). Programming really has advanced, you know.
Anyway, the URL is: http://www.crsr.net/Programming_Languages/SoftwareTools
Questions and comments would be appreciated, especially suggestions for how to make the code cleaner and more understandable. Flames and mockery are welcome, too, but only if they're funny---remember, I've been staring at Haskell, Pascal (plus my job-related Perl, CORBA, and C++) for a while; there's no telling what my mental state is like.
[I had intended to wait until I had the whole thing done to make this announcement, but I recently moved and have not made much forward progress since, other than putting what I had done online.]
-- Tommy M. McGuire
Some of those really look like they could be simpler, like 'copy' - couldn't that simply be 'main = interact (id)'? Have you seen http://haskell.org/haskellwiki/Simple_Unix_tools? For example, 'charcount' could be a lot simpler - 'charcount = showln . length' would work, wouldn't it, for the core logic, and the whole thing might look like:
main = do (print . showln . length) =<< getContents
Similarly wordcount could be a lot shorter, like 'wc_l = showln . length . lines' (showln is a convenience function: showln a = show a ++ "\n") I... I want to provide a one-liner for 'detab', but it looks impressively monstrous and I'm not sure I understand it. One final comment: as regards run-length encoding, there's a really neat way to do it. I wrote a little article on how to do it a while ago, so I guess I'll just paste it in here. :) --- Recently I was playing with and working on a clone of the old Gradius arcade games which was written in Haskell, Monadius http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Monadius-0.9. Most of my changes were not particularly interesting (cleaning up, Cabalizing, fixing warnings, switching all Integers to Ints and so on), but in its Demo.hs, I found an interesting solution to an interesting problem which seems to be a good example of how Haskell's abstractions can really shine. So, suppose we have these data items, which are levels which are specified by a pair of numbers and then a long list of numbers, often very repetitious. Perhaps a particular level might be represented this way:
level1 = ((Int,Int),[Int]) level1 = ((2,1),[0,0,0,0,0,0,0,0,0,0,0,0,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,73,73,73,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,69,69,69,69,69,69,65,17,17,17,17,17,17,17,17,17,17,17,17,17,25,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,1,1,1,1,1,1,1,1,1,1,9,9,9,1,1,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,1,1,1,1,1,1,1,1,33,33,1,1,1,1,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,49,49,49,49,33,33,33,1,1,1,1,1,1,1,9,9,9,1,1,1,1,1,1,1,1,1,33,33,33,33,1,1,1,1,1,1,1,1,1,9,9,1,1,1,1,1,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,1,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,81,81,81,81,81,81,81,81,81,81,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,1,1,1,1,1,1,1,17,17,17,17,17,17,17,17,17,17,17,1,1,1,1,1,1,1,1,1,1,1,1,17,17,17,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,97,33,33,33,33,37,5,69,69,65,65,65,65,65,65,67,67,67,67,67,67,67,67,67,75,75,75,75,75,75,75,75,75,75,75,75,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11,3,3,3,3,3,3,3,3,3,3,11,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,67,67,67,67,67,67,3,3,3,67,67,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,3,3,3,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,43,43,43,43,43,43,43,43,43,43,43,43,43,43,43,11,11,11,11,11,43,43,43,43,43,43,43,43,35,35,35,35,35,35,35,35,3,3,3,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,35,35,35,35,35,35,35,35,35,3,3,3,3,35,35,35,35,35,35,3,3,3,3,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,19,19,19,19,19,19,3,3,3,3,19,19,19,19,3,3,3,3,3,3,19,19,19,19,3,3,3,3,3,3,3,19,19,19,19,19,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,83,83,83,83,83,67,67,67,3,3,19,19,19,19,19,19,19,19,19,19,19,19,83,83,83,83,83,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,83,83,83,83,83,83,83,83,83,19,19,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,75,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,43,35,35,35,35,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,19,19,19,19,19,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,3,3,3,3,3,3,3,3,67,67,67,67,67,67,83,83,83,83,83,83,83,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,19,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,35,35,43,43,43,43,43,43,43,43,43,43,43,11,11,11,11,11,11,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,43,43,43,43,43,11,11,11,11,11,11,11,3,3,67,67,67,67,67,83,19,19,3,3,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
This is clearly a bad way of representing things. We could just scrap this representation as Ints completely and perhaps define it as
-- Assume that these datatypes are already defined and everything level1 :: ((Geometry,Geometry),[Enemies]) level1 = ((Tall,Narrow), > > [FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,Shooter,PowerUp,Boss..]) > -- and so on
But this representation, while certainly more symbolic, is still very repetitious. We need some way of expressing this more concisely, of, in short, *compressing* it. In this vein of thought, our first observation should be that we do not need to resort to Gzipping it or adding dependencies on fancy compression libraries or anything; there is a very obvious way to compress it already - the entire thing is practically just a series of repeated numbers. We should be able to replace the length enumeration of [0,0,0,0,0,0,0,0,0,0,0,0] with something simpler, say the number of repetition and what is to be repeated so that our entry would look like (12,0). This representation is definitely shorter and more importantly, easier to modify and not a constant. It is possible that there may be a performance benefit here, as we've gotten rid of a large constant that would have to be defined in the program itself and instead replaced it with a shorter function which evaluates to the same thing. So, what is the type of our decompressing function? Well, we need to turn a (Int,Int) into a [Int]; even better, we want to turn a whole list of (Int,Int)s into a single list of [Int]. Thus, our end goal is going to be a function of this type:
rleDecode :: [(Int,Int)] -> [Int]
Let us tackle the single tuple example first. The second entry defines what we need, and the first entry defines how many we need. We could write a recursive function that takes the parameter, decreases the first entry by one, and cons on one example of the second entry. It could look like this:
rleRecursiveDecode :: (Int,Int) -> [Int] rleRecursiveDecode (0,_) = [] rleRecursiveDecode (n,b) = b : (rleRecursiveDecode (n-1,b))
But this really is not the best way to go. It is not necessarily easy to follow, and if there is one thing I have learned about Haskell programming, it is that the most obvious approach (in this case, primitive recursion) may not be the best way to go. This is code that could have as well been written in Scheme or something. It is complicated because we are trying to ensure that we generate an item for the list only at the exact moment we need to add it into the list; we are programming as if our language is strict, in other words. So, what is the lazy way of doing things? Infinite lists, of course. We create an infinite list containing only the second entry, and we merely take as many as the first entry says we need. Simple enough!
-- Infinite list of our item of interest. reduplicate a = a:a
rleLazyDecode :: (Int,Int) -> [Int] rleLazyDecode (n,b) = take n (reduplicate b)
Now, reduplicate is a simple enough function to define, but it already has a definition in the standard libraries - 'cycle'. (I assume you know what 'take' is, but that is also fairly easy to define once you seen the need for it.) So:
rleLazyDecode (n,b) = take n (cycle b)
Might as well remove the parentheses:
rleLazyDecode (n,b) = take n $ cycle b
A satisfying, short, functional, and lazy one-liner. From here the definition of rleDecode is almost trivial: we extend it to a list of tuples by throwing in a map, and we turn the resulting list of lists into a single list by way of 'concat':
rleDecode ns = concat $ map rleLazyDecode ns
We can tweak this further, as 'concat . map' is a common enough idiom that there is a shortcut:
rleDecode ns = concatMap rleLazyDecode ns
Aw heck - make it points-free:
rleDecode = concatMap rleLazyDecode
And substitute in the rleLazyDecode definition:
rleDecode = concatMap (\(n,b) -> take n $ cycle b)
We could also write a version that omits the explicit lambda and naming of parameters by use of the helpful but somewhat esoteric 'uncurry' function; Uncurry takes apart the tuple. Tts type is:
uncurry :: (a -> b -> c) -> (a, b) -> c
We can actually go even further into the realms of incomprehensibility. It turns out that lists are counter-intuitively a monad! This means we can use bind and all the rest of the operations defined by the Monad typeclass to operate on lists and other things. So we can write the bizarre (but short and effective) version of rleDecode that follows:
rleDecode = (uncurry replicate =<<)
And we are done! We can now represent the first list like this:
d1 = ((2,1),d) where d = rleDecode [(5, 3), (10, 67), (6, 3), (5, 67), (29, 3), (2, 11), (29, 3), (5, 35), (3, 3), (3, 35), (24, 3), (2, 35), (19, 3), (7, 19), (21, 51), (63, 35), (15, 43), (5, 11), (8, 43), (8, 35), (3, 3), (9, 35), (20, 3), (52, 67), (32, 3), (13, 35), (3, 3), (11, 35), (5, 3), (9, 35), (4, 3), (6, 35), (4, 3), (5, 35), (13, 3), (14, 19), (4, 3), (4, 19), (6, 3), (4, 19), (7, 3), (5, 19), (8, 3), (8, 19), (5, 83), (3, 67), (2, 3), (12, 19), (5, 83), (17, 19), (9, 83), (2, 19), (5, 3), (20, 67), (1, 75), (38, 11), (1, 43), (4, 35), (7, 3), (57, 67), (2, 3), (5, 19), (17, 3), (6, 19), (8, 3), (6, 67), (7, 83), (59, 3), (9, 19), (1, 51), (17, 35), (20, 3), (14, 35), (7, 3), (2, 35), (11, 43), (6, 11), (7, 3), (26, 67), (7, 3), (13, 35), (5, 43), (7, 11), (2, 3), (5, 67), (1, 83), (2, 19), (2, 3), (10, 67), (21, 3), (147, 0)]
Much nicer, don't you think? And the best part is, we should be able to reuse this run-length decoding even if we replace the arbitrary numbers by more descriptive type constructors! (Many thanks to the good denizens of #haskell, and Don Stewart's blog entry on run-length encoding/decoding using Arrows http://cgi.cse.unsw.edu.au/~dons/blog/2007/07.) -- gwern

Gwern Branwen wrote:
Some of those really look like they could be simpler, like 'copy' - couldn't that simply be 'main = interact (id)'?
Have you seen http://haskell.org/haskellwiki/Simple_Unix_tools?
For example, 'charcount' could be a lot simpler - 'charcount = showln . length' would work, wouldn't it, for the core logic, and the whole thing might look like:
main = do (print . showln . length) =<< getContents
Similarly wordcount could be a lot shorter, like 'wc_l = showln . length . lines'
(showln is a convenience function: showln a = show a ++ "\n")
Yes, that's absolutely true, and I am adding a section showing implementations based on interact as soon as I send this message. The reason I didn't do so before is that I was trying to (to an extent) preserve the structure of the original implementations, which means using an imperative style. Strangely, I have considerably more confidence in the imperative-ish Haskell code than I do in the imperative Pascal code, in spite of the fact that they are essentially the same. Probably this is due to the referential transparency that monadic IO preserves and that does not even enter into the picture in traditional Pascal. For example, the pseudo-nroff implementation has a giant, horrible block of a record (containing the state taken directly from K&P) that is threaded through the program, but I am tolerably happy with it because I know that is the *only* state going through the program. Further, while interact could probably handle all of the filter-style programs (and if I understand correctly, could also work for the main loop of the interactive editor) and a similar function could handle the later file-reading programs, I do not see how to generalize that to the out-of-core sort program. (Plus, interact is scary. :-D )
I... I want to provide a one-liner for 'detab', but it looks impressively monstrous and I'm not sure I understand it.
If you think that's bad.... :-) detab is one of the programs I do not like. I kept the "direct translation" approach up through that, but I think it really hides the simplicity there; detab copies its input to its output replacing tabs with 1-8 spaces, based on where the tab occurs in a line. The only interesting state dealt with is the count of characters on each line, but that gets hidden, not emphasized. On the other hand, I'm not looking for one-liners; I really want clarity as opposed to cleverness.
One final comment: as regards run-length encoding, there's a really neat way to do it. I wrote a little article on how to do it a while ago, so I guess I'll just paste it in here. :)
That *is* neat. -- Tommy M. McGuire mcguire@crsr.net

Hi
main = do (print . showln . length) =<< getContents where showln a = show a ++ "\n"
This can be written better. print puts a newline at the end and does a show, so lets remove that bit: main = do (print . length) =<< getContents Now we aren't using do notation, despite having a do block, and the brackets are redundant: main = print . length =<< getContents Much nicer :-) Thanks Neil

Hi Having got to the word counting example on the website: wordcount :: IO () wordcount = do wc <- wordcount' False 0 putStrLn (show wc) where wordcount' inword wc = do ch <- getc case ch of Nothing -> return wc Just c -> handlechar c wc inword handlechar c wc _ | (c == ' ' || c == '\n' || c == '\t') = wordcount' False wc handlechar _ wc False = wordcount' True $! wc + 1 handlechar _ wc True = wordcount' True wc Eeek. That's uglier than the C version, and has no abstract components. A much simpler version: main = print . length . words =<< getContents Beautiful, specification orientated, composed of abstract components. Code doesn't get much more elegant than that. Plus it also can be made to outperform C (http://www-users.cs.york.ac.uk/~ndm/supero/) Thanks Neil

ndmitchell:
Hi
Having got to the word counting example on the website:
wordcount :: IO () wordcount = do wc <- wordcount' False 0 putStrLn (show wc) where wordcount' inword wc = do ch <- getc case ch of Nothing -> return wc Just c -> handlechar c wc inword handlechar c wc _ | (c == ' ' || c == '\n' || c == '\t') = wordcount' False wc handlechar _ wc False = wordcount' True $! wc + 1 handlechar _ wc True = wordcount' True wc
Eeek. That's uglier than the C version, and has no abstract components.
A much simpler version:
main = print . length . words =<< getContents
Beautiful, specification orientated, composed of abstract components.
My thoughts too when reading the initial post was that it was all very low level imperative programming. Not of the Haskell flavour. -- Don

Don Stewart wrote:
My thoughts too when reading the initial post was that it was all very low level imperative programming. Not of the Haskell flavour.
-- Don
Oh, heck yeah. As I was thinking when I was translating it, "I can't even say I'm writing Pascal code using Haskell; I wouldn't write Pascal code this way." (IIRC, the xindex in translit that I mentioned uses several flag values in-band and I couldn't detangle the mess to figure them out, so I copied it as-is. Ick.) -- Tommy M. McGuire mcguire@crsr.net

On Wed, 12 Dec 2007, Don Stewart wrote:
ndmitchell:
A much simpler version:
main = print . length . words =<< getContents
Beautiful, specification orientated, composed of abstract components.
My thoughts too when reading the initial post was that it was all very low level imperative programming. Not of the Haskell flavour.
I remember there was a discussion about how to implement full 'wc' in an elegant but maximally lazy form, that is counting bytes, words and lines in one go. Did someone have a nice idea of how to compose the three counters from implementations of each counter? I'm afraid one cannot simply use the "split and count fragments" trick then.

On Dec 14, 2007 9:29 AM, Henning Thielemann
I remember there was a discussion about how to implement full 'wc' in an elegant but maximally lazy form, that is counting bytes, words and lines in one go. Did someone have a nice idea of how to compose the three counters from implementations of each counter? I'm afraid one cannot simply use the "split and count fragments" trick then.
Could you turn the folds into scans and use zip3 and last? I.e., something like this: data Triple a b c = Triple !a !b !c deriving Show countChars :: String -> [Int] countChars = scanl (\n _ -> n+1) 0 countChar :: Char -> String -> [Int] countChar c = scanl (\n c' -> if c == c' then n+1 else n) 0 countLines = countChar '\n' countWords = countChar ' ' last' [x] = x last' (x:xs) = x `seq` last' xs zip3' (x:xs) (y:ys) (z:zs) = Triple x y z : zip3' xs ys zs zip3' _ _ _ = [] wc :: String -> Triple Int Int Int wc xs = last' $ zip3' (countChars xs) (countWords xs) (countLines xs) main = print . wc =<< getContents (or use Data.Strict.Tuple -- but that only has pairs and no zip...) - Benja

Benja Fallenstein wrote:
Henning Thielemann wrote:
I remember there was a discussion about how to implement full 'wc' in an elegant but maximally lazy form, that is counting bytes, words and lines in one go. Did someone have a nice idea of how to compose the three counters from implementations of each counter? I'm afraid one cannot simply use the "split and count fragments" trick then.
Well, you could rely on catamorphism fusion (foldr f1 x1, foldr f2 x2) = foldr (f1 *** f2) (x1,x2) but that's not so compositional.
Could you turn the folds into scans and use zip3 and last? I.e., something like this:
This approach is really clever!
data Triple a b c = Triple !a !b !c deriving Show
countChars :: String -> [Int] countChars = scanl (\n _ -> n+1) 0
countChar :: Char -> String -> [Int] countChar c = scanl (\n c' -> if c == c' then n+1 else n) 0
countLines = countChar '\n' countWords = countChar ' '
last' [x] = x last' (x:xs) = x `seq` last' xs
zip3' (x:xs) (y:ys) (z:zs) = Triple x y z : zip3' xs ys zs zip3' _ _ _ = []
zipWith3 Triple
wc :: String -> Triple Int Int Int wc xs = last' $ zip3' (countChars xs) (countWords xs) (countLines xs)
main = print . wc =<< getContents
(or use Data.Strict.Tuple -- but that only has pairs and no zip...)
Slightly simplified (uses BangPatterns): import Data.List scanl' :: (b -> a -> b) -> b -> [a] -> [a] scanl' f !b [] = [b] scanl' f !b (x:xs) = b:scanl' (f b x) xs counts :: (a -> Bool) -> [a] -> [Int] counts p = scanl' (\n c -> if p c then n+1 else n) 0 wc :: String -> (Int,Int,Int) wc = last $ zip3 (charc xs) (wordc xs) (linec xs) where charc = counts (const True) wordc = counts (== ' ') linec = counts (== '\n') The scanl' basically ensures that the forcing the resulting list spine automatically forces the elements. This makes sense to do early and we can use normal list functions after that. Regards, apfelmus

Here's a version with cleaner separation between pure & IO:
main = interact $ show . length . words
- Conal
On Dec 12, 2007 11:12 AM, Neil Mitchell
Hi
Having got to the word counting example on the website:
wordcount :: IO () wordcount = do wc <- wordcount' False 0 putStrLn (show wc) where wordcount' inword wc = do ch <- getc case ch of Nothing -> return wc Just c -> handlechar c wc inword handlechar c wc _ | (c == ' ' || c == '\n' || c == '\t') = wordcount' False wc handlechar _ wc False = wordcount' True $! wc + 1 handlechar _ wc True = wordcount' True wc
Eeek. That's uglier than the C version, and has no abstract components.
A much simpler version:
main = print . length . words =<< getContents
Beautiful, specification orientated, composed of abstract components. Code doesn't get much more elegant than that. Plus it also can be made to outperform C (http://www-users.cs.york.ac.uk/~ndm/supero/http://www-users.cs.york.ac.uk/%7Endm/supero/ )
Thanks
Neil _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Tommy M McGuire wrote:
(Plus, interact is scary. :-D )
You have a scary feeling for a moment, then it passes. ;)
Gwern Branwen wrote:
I... I want to provide a one-liner for 'detab', but it looks impressively monstrous and I'm not sure I understand it.
On the other hand, I'm not looking for one-liners; I really want clarity as opposed to cleverness.
tabwidth = 4 -- tabstop !! (col-1) == there is a tabstop at column col -- This is an infinite list, so no need to limit the line width tabstops = map (\col -> col `mod` tabwidth == 1) [1..] -- calculate spaces needed to fill to the next tabstop in advance tabspaces = snd $ mapAccumR addspace [] tabstops addspace cs isstop = let cs'=' ':cs in (if isstop then [] else cs',cs') main = interact $ unlines . map detabLine . lines where detabLine = concat $ zipWith replace tabspaces replace cs '\t' = cs -- replace with adequate number of spaces replace _ char = [char] -- pass through How about that? Regards, apfelmus

apfelmus wrote:
Tommy M McGuire wrote:
(Plus, interact is scary. :-D )
You have a scary feeling for a moment, then it passes. ;)
tabwidth = 4
-- tabstop !! (col-1) == there is a tabstop at column col -- This is an infinite list, so no need to limit the line width tabstops = map (\col -> col `mod` tabwidth == 1) [1..]
-- calculate spaces needed to fill to the next tabstop in advance tabspaces = snd $ mapAccumR addspace [] tabstops addspace cs isstop = let cs'=' ':cs in (if isstop then [] else cs',cs')
Are you using mapAccumR (mapAccumR? (!)) to share space among the space strings? If so, wouldn't this be better: tabstops = map (\col -> col `mod` tabwidth == 1) [1..tabwidth] tabspaces = cycle $ snd $ mapAccumR addspace [] tabstops On the other hand, wouldn't this make for less head scratching: tabspaces = map (\col -> replicate (spacesFor col) ' ') [1..] where spacesFor col = tabwidth - ((col - 1) `mod` tabwidth)
main = interact $ unlines . map detabLine . lines where detabLine = concat $ zipWith replace tabspaces
I think you mean "concat . zipWith...". (You're doing this from memory, aren't you?)
replace cs '\t' = cs -- replace with adequate number of spaces replace _ char = [char] -- pass through
How about that?
It doesn't produce the same output, although I almost like it enough not
to care:
$ od -a test
0000000 ht c o l sp 1 ht 2 ht 3 4 ht r e s t
0000020 nl
0000021
$ runhaskell detab.hs

Tommy M McGuire wrote:
apfelmus wrote:
tabwidth = 4
-- tabstop !! (col-1) == there is a tabstop at column col -- This is an infinite list, so no need to limit the line width tabstops = map (\col -> col `mod` tabwidth == 1) [1..]
-- calculate spaces needed to fill to the next tabstop in advance tabspaces = snd $ mapAccumR addspace [] tabstops addspace cs isstop = let cs'=' ':cs in (if isstop then [] else cs',cs')
Are you using mapAccumR (mapAccumR? (!)) to share space among the space strings?
Sharing is a good idea! But mapAccumR has nothing to do with it, I just used it to encode the recursion, as replacement for a fold so to speak.
If so, wouldn't this be better:
tabstops = map (\col -> col `mod` tabwidth == 1) [1..tabwidth] tabspaces = cycle $ snd $ mapAccumR addspace [] tabstops
Yes. We can make the code even simpler :) tabspaces = cycle . init . tails . replicate tabwidth $ ' ' and the tabstops list is gone.
On the other hand, wouldn't this make for less head scratching:
tabspaces = map (\col -> replicate (spacesFor col) ' ') [1..] where spacesFor col = tabwidth - ((col - 1) `mod` tabwidth)
Yes and no. The very idea of introducing the tabspaces list in the first place is to avoid explicit indices altogether, a single zipWith is responsible for aligning columns. So, it's only natural to avoid indices for the definition of tabspaces , too. A side effect of separating tabspaces from the main loop is that we can do all kind of irregular tabstop spacing or different fill characters and the like solely by changing this list.
main = interact $ unlines . map detabLine . lines where detabLine = concat $ zipWith replace tabspaces
I think you mean "concat . zipWith...". (You're doing this from memory, aren't you?)
Yes and yes :)
replace cs '\t' = cs -- replace with adequate number of spaces replace _ char = [char] -- pass through
How about that?
It doesn't produce the same output, [...] It's counting tabs before expanding rather than after?
Yes, I noticed it too late, it's so wrong (>_<) :) Here's a correct version: perLine f = interact $ unlines . map f . lines main = perLine (detabLine tabspaces) where detabLine _ [] = [] detabLine (w:ws) ('\t':cs) = detabLine (w:ws) (w ++ cs) detabLine (w:ws) (c :cs) = c:detabLine ws cs Or even main = interact $ detab tabspaces where detab _ [] = [] detab _ ('\n':cs) = '\n':detab tabspaces cs detab (w:ws) ('\t':cs) = detab (w:ws) (w ++ cs) detab (_:ws) (c :cs) = c:detab ws cs This can't be expressed with zip anymore since the alignment of the list of spaces and the text changes when encountering a tab. @dons: I guess that detab would probably be a very interesting (and even useful) study example for generalizing stream fusion, since it's more like concatMap than map . Regards, apfelmus

Since there are many useful per-line functions, do a little refactoring,
placing the following into a library:
perLine :: (String -> String) -> (String -> String)
perLine f = unlines . map f . lines
On Dec 12, 2007 12:43 PM, apfelmus
Tommy M McGuire wrote:
(Plus, interact is scary. :-D )
You have a scary feeling for a moment, then it passes. ;)
Gwern Branwen wrote:
I... I want to provide a one-liner for 'detab', but it looks impressively monstrous and I'm not sure I understand it.
On the other hand, I'm not looking for one-liners; I really want clarity as opposed to cleverness.
tabwidth = 4
-- tabstop !! (col-1) == there is a tabstop at column col -- This is an infinite list, so no need to limit the line width tabstops = map (\col -> col `mod` tabwidth == 1) [1..]
-- calculate spaces needed to fill to the next tabstop in advance tabspaces = snd $ mapAccumR addspace [] tabstops addspace cs isstop = let cs'=' ':cs in (if isstop then [] else cs',cs')
main = interact $ unlines . map detabLine . lines where detabLine = concat $ zipWith replace tabspaces replace cs '\t' = cs -- replace with adequate number of spaces replace _ char = [char] -- pass through
How about that?
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi Tommy,
detab is one of the programs I do not like. I kept the "direct translation" approach up through that, but I think it really hides the simplicity there; detab copies its input to its output replacing tabs with 1-8 spaces, based on where the tab occurs in a line. The only interesting state dealt with is the count of characters on each line, but that gets hidden, not emphasized.
On the other hand, I'm not looking for one-liners; I really want clarity as opposed to cleverness.
I would do a simple, imperative feeling detab using a recursive [Char] processing function: detab :: Int -> String -> String detab width text = detab' width text where detab' tab [] = [] detab' tab ('\n' : text) = '\n' : detab' width text detab' tab ('\t' : text) = replicate tab ' ' ++ detab' width text detab' 1 (char : text) = char : detab' width text detab' tab (char : text) = char : detab' (tab-1) text main = interact (detab 4) In Haskell, using IO all over the place is the opposite of clarity, even in imperative feeling code wich basically encodes a main loop. Tillmann

Another version of detab: main = interact $ perLine $ concat . snd. mapAccumL f 0 where f tab '\t' = (0, replicate (4-tab) ' ') f tab char = ((tab+1) `mod` 4, [char]) - Benja

On Dec 13, 2007 2:20 AM, Benja Fallenstein
Another version of detab:
main = interact $ perLine $ concat . snd. mapAccumL f 0 where f tab '\t' = (0, replicate (4-tab) ' ') f tab char = ((tab+1) `mod` 4, [char])
Although on reflection, I think I might like the following compromise with Tillmann's version best: main = interact $ perLine $ detab 0 where detab tab ('\t':cs) = replicate (4-tab) ' ' ++ detab 0 cs detab tab (char:cs) = char : detab ((tab+1) `mod` 4) cs detab tab "" = "" - Benja

On Dec 13, 2007 2:28 AM, Benja Fallenstein
Although on reflection, I think I might like the following compromise with Tillmann's version best:
main = interact $ perLine $ detab 0 where detab tab ('\t':cs) = replicate (4-tab) ' ' ++ detab 0 cs detab tab (char:cs) = char : detab ((tab+1) `mod` 4) cs detab tab "" = ""
On more reflection, I wonder whether it would be worthwhile to have a library function for folds that work from both left *and* right: foldlr :: (a -> b -> c -> (a,c)) -> a -> c -> [b] -> (a,c) foldlr f l r [] = (l,r) foldlr f l r (x:xs) = let (l',r') = f l x r''; (l'',r'') = foldlr f l' r xs in (l'',r') main = interact $ perLine $ snd . foldlr detab 0 "" where detab tab '\t' cs = (0, replicate (4-tab) ' ' ++ cs) detab tab char cs = ((tab+1) `mod` 4, char : cs) It's a fun function, anyway :-) - Benja

On 2007.12.12 12:51:58 -0600, Tommy M McGuire
Gwern Branwen wrote:
Some of those really look like they could be simpler, like 'copy' - couldn't that simply be 'main = interact (id)'? Have you seen http://haskell.org/haskellwiki/Simple_Unix_tools? For example, 'charcount' could be a lot simpler - 'charcount = showln . length' would work, wouldn't it, for the core logic, and the whole thing might look like:
main = do (print . showln . length) =<< getContents Similarly wordcount could be a lot shorter, like 'wc_l = showln . length . lines' (showln is a convenience function: showln a = show a ++ "\n")
Yes, that's absolutely true, and I am adding a section showing implementations based on interact as soon as I send this message. The reason I didn't do so before is that I was trying to (to an extent) preserve the structure of the original implementations, which means using an imperative style.
Yes, I'm looking at it now. Pretty nice.
Strangely, I have considerably more confidence in the imperative-ish Haskell code than I do in the imperative Pascal code, in spite of the fact that they are essentially the same. Probably this is due to the referential transparency that monadic IO preserves and that does not even enter into the picture in traditional Pascal. For example, the pseudo-nroff implementation has a giant, horrible block of a record (containing the state taken directly from K&P) that is threaded through the program, but I am tolerably happy with it because I know that is the *only* state going through the program.
Further, while interact could probably handle all of the filter-style programs (and if I understand correctly, could also work for the main loop of the interactive editor)
If your editor is referentially transparent, I think. Something like ed or sed could be done, as long as you didn't implement any of the IO stuff (like ! for ed).
and a similar function could handle the later file-reading programs, I do not see how to generalize that to the out-of-core sort program.
Well, for out-of-core sort, someone several many months back posted a very neat solution using ByteStrings which iirc had performance as competitive as GNU sort's out-of-core sort.... [much searching later] Ah! Here we go: "[Haskell-cafe] External Sort and unsafeInterleaveIO" http://www.haskell.org/pipermail/haskell-cafe/2007-July/029156.html. I at least found it interesting.
(Plus, interact is scary. :-D )
It's not scary! It's neat!
I... I want to provide a one-liner for 'detab', but it looks impressively monstrous and I'm not sure I understand it.
If you think that's bad.... :-)
detab is one of the programs I do not like. I kept the "direct translation" approach up through that, but I think it really hides the simplicity there; detab copies its input to its output replacing tabs with 1-8 spaces, based on where the tab occurs in a line. The only interesting state dealt with is the count of characters on each line, but that gets hidden, not emphasized.
On the other hand, I'm not looking for one-liners; I really want clarity as opposed to cleverness.
Well, one-liners generally can be expanded to 2 or 3 lines if you want to add descriptive variable names. Better to start with a short version and expand it where unclear than have a long unclear one in the first place, right?
One final comment: as regards run-length encoding, there's a really neat way to do it. I wrote a little article on how to do it a while ago, so I guess I'll just paste it in here. :)
That *is* neat.
-- Tommy M. McGuire
Thanks. It took a while to write, but I never really found any place to put it up for other people to read. -- gwern GSM Submarine E. 510 ddnp building y friends RDI JCET
participants (11)
-
apfelmus
-
Benja Fallenstein
-
Bulat Ziganshin
-
Conal Elliott
-
Don Stewart
-
Gwern Branwen
-
gwern0@gmail.com
-
Henning Thielemann
-
Neil Mitchell
-
Tillmann Rendel
-
Tommy M McGuire