Software Tools in Haskell

11 Dec 2007

      On 2007.12.10 13:52:41 -0600, Tommy McGuire 
scribbled 1.7K characters:
...
In the "if anyone is interested,..." department....
For reasons that remain unclear, early this fall I started translating
Brian W. Kernighan and P.J. Plaugher's classic _Software Tools in Pascal_
into Haskell.  I have completed most of it, up to the second part of
chapter 8 which presents a proto-m4 preprocessor.  I have the code online
including notes, comments, descriptions, and a few alternate approaches.
Attractions include:
* A fair gamut of the usual Unix suspects: proto-cat, proto-wc, proto-tr,
proto-compress, proto-ar, proto-grep, etc.
* A usable editor, if you consider a de-featured ed-alike to be usable.
* A simple monadic regular expression engine.
* Zippers, Parsec, the State monad, the StateT monad transformer, and other
attempts to sully Computing Science's brightest jewels.
* Lots and lots of really bad Haskell, including a fair bit that is a
direct translation of 30-year old Pascal (see xindex in translit, Ch. 2, if
you need to skip lunch).  Programming really has advanced, you know.
Anyway, the URL is:
  http://www.crsr.net/Programming_Languages/SoftwareTools
Questions and comments would be appreciated, especially suggestions for how
to make the code cleaner and more understandable.  Flames and mockery are
welcome, too, but only if they're funny---remember, I've been staring at
Haskell, Pascal (plus my job-related Perl, CORBA, and C++) for a while;
there's no telling what my mental state is like.
[I had intended to wait until I had the whole thing done to make this
announcement, but I recently moved and have not made much forward progress
since, other than putting what I had done online.]
--
Tommy M. McGuire
Some of those really look like they could be simpler, like 'copy' -
couldn't that simply be 'main = interact (id)'?

Have you seen http://haskell.org/haskellwiki/Simple_Unix_tools?

For example, 'charcount' could be a lot simpler - 'charcount = showln
. length' would work, wouldn't it, for the core logic, and the whole
thing might look like:
...
main = do (print . showln . length) =<< getContents
Similarly wordcount could be a lot shorter, like 'wc_l = showln .
length . lines'

(showln is a convenience function: showln a = show a ++ "\n")

I... I want to provide a one-liner for 'detab', but it looks
impressively monstrous and I'm not sure I understand it.

One final comment: as regards run-length encoding, there's a really
neat way to do it. I wrote a little article on how to do it a while
ago, so I guess I'll just paste it in here. :)

---

Recently I was playing with and working on a clone of the old Gradius
arcade games which was written in Haskell, Monadius
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Monadius-0.9.
Most of my changes were not particularly interesting (cleaning up,
Cabalizing, fixing warnings, switching all Integers to Ints and so
on), but in its Demo.hs, I found an interesting solution to an
interesting problem which seems to be a good example of how Haskell's
abstractions can really shine.

So, suppose we have these data items, which are levels which are
specified by a pair of numbers and then a long list of numbers, often
very repetitious. Perhaps a particular level might be represented this
way:
...
level1 = ((Int,Int),[Int])
level1 = ((2,1),[0,0,0,0,0,0,0,0,0,0,0,0,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,73,73,73,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,69,69,69,69,69,69,65,17,17,17,17,17,17,17,17,17,17,17,17,17,25,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,1,1,1,1,1,1,1,1,1,1,9,9,9,1,1,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,1,1,1,1,1,1,1,1,33,33,1,1,1,1,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,49,49,49,49,33,33,33,1,1,1,1,1,1,1,9,9,9,1,1,1,1,1,1,1,1,1,33,33,33,33,1,1,1,1,1,1,1,1,1,9,9,1,1,1,1,1,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,33,1,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,81,81,81,81,81,81,81,81,81,81,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,1,1,1,1,1,1,1,17,17,17,17,17,17,17,17,17,17,17,1,1,1,1,1,1,1,1,1,1,1,1,17,17,17,1,1,1,1,1,1,1,1,1,1,1,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,97,33,33,33,33,37,5,69,69,65,65,65,65,65,65,67,67,67,67,67,67,67,67,67,75,75,75,75,75,75,75,75,75,75,75,75,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11,3,3,3,3,3,3,3,3,3,3,11,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,67,67,67,67,67,67,3,3,3,67,67,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,11,11,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,3,3,3,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,43,43,43,43,43,43,43,43,43,43,43,43,43,43,43,11,11,11,11,11,43,43,43,43,43,43,43,43,35,35,35,35,35,35,35,35,3,3,3,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,35,35,35,35,35,35,35,35,35,3,3,3,3,35,35,35,35,35,35,3,3,3,3,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,19,19,19,19,19,19,3,3,3,3,19,19,19,19,3,3,3,3,3,3,19,19,19,19,3,3,3,3,3,3,3,19,19,19,19,19,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,83,83,83,83,83,67,67,67,3,3,19,19,19,19,19,19,19,19,19,19,19,19,83,83,83,83,83,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,19,83,83,83,83,83,83,83,83,83,19,19,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,75,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,43,35,35,35,35,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,19,19,19,19,19,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,3,3,3,3,3,3,3,3,67,67,67,67,67,67,83,83,83,83,83,83,83,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,19,19,19,19,19,19,19,19,19,51,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,35,3,3,3,3,3,3,3,35,35,43,43,43,43,43,43,43,43,43,43,43,11,11,11,11,11,11,3,3,3,3,3,3,3,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,35,35,35,35,35,35,35,35,35,35,35,35,35,43,43,43,43,43,11,11,11,11,11,11,11,3,3,67,67,67,67,67,83,19,19,3,3,67,67,67,67,67,67,67,67,67,67,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
This is clearly a bad way of representing things. We could just scrap
this representation as Ints completely and perhaps define it as
...
-- Assume that these datatypes are already defined and everything
level1 :: ((Geometry,Geometry),[Enemies])
level1 = ((Tall,Narrow), > > [FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,Shooter,PowerUp,Boss..]) > -- and so on
But this representation, while certainly more symbolic, is still very
repetitious. We need some way of expressing this more concisely, of,
in short, *compressing* it. In this vein of thought, our first
observation should be that we do not need to resort to Gzipping it or
adding dependencies on fancy compression libraries or anything; there
is a very obvious way to compress it already - the entire thing is
practically just a series of repeated numbers. We should be able to
replace the length enumeration of [0,0,0,0,0,0,0,0,0,0,0,0] with
something simpler, say the number of repetition and what is to be
repeated so that our entry would look like (12,0). This representation
is definitely shorter and more importantly, easier to modify and not a
constant. It is possible that there may be a performance benefit here,
as we've gotten rid of a large constant that would have to be defined
in the program itself and instead replaced it with a shorter function
which evaluates to the same thing.

So, what is the type of our decompressing function? Well, we need to
turn a (Int,Int) into a [Int]; even better, we want to turn a whole
list of (Int,Int)s into a single list of [Int]. Thus, our end goal is
going to be a function of this type:
...
rleDecode :: [(Int,Int)] -> [Int]
Let us tackle the single tuple example first. The second entry defines
what we need, and the first entry defines how many we need. We could
write a recursive function that takes the parameter, decreases the
first entry by one, and cons on one example of the second entry. It
could look like this:
...
rleRecursiveDecode :: (Int,Int) -> [Int]
rleRecursiveDecode (0,_) = []
rleRecursiveDecode (n,b) = b : (rleRecursiveDecode (n-1,b))
But this really is not the best way to go. It is not necessarily easy
to follow, and if there is one thing I have learned about Haskell
programming, it is that the most obvious approach (in this case,
primitive recursion) may not be the best way to go. This is code that
could have as well been written in Scheme or something. It is
complicated because we are trying to ensure that we generate an item
for the list only at the exact moment we need to add it into the list;
we are programming as if our language is strict, in other words.

So, what is the lazy way of doing things? Infinite lists, of course.
We create an infinite list containing only the second entry, and we
merely take as many as the first entry says we need. Simple enough!
...
-- Infinite list of our item of interest.
reduplicate a = a:a
...
rleLazyDecode :: (Int,Int) -> [Int]
rleLazyDecode (n,b) = take n (reduplicate b)
Now, reduplicate is a simple enough function to define, but it already
has a definition in the standard libraries - 'cycle'. (I assume you
know what 'take' is, but that is also fairly easy to define once you
seen the need for it.) So:
...
rleLazyDecode (n,b) = take n (cycle b)
Might as well remove the parentheses:
...
rleLazyDecode (n,b) = take n $ cycle b
A satisfying, short, functional, and lazy one-liner. From here the
definition of rleDecode is almost trivial: we extend it to a list of
tuples by throwing in a map, and we turn the resulting list of lists
into a single list by way of 'concat':
...
rleDecode ns = concat $ map rleLazyDecode ns
We can tweak this further, as 'concat . map' is a common enough idiom
that there is a shortcut:
...
rleDecode ns = concatMap rleLazyDecode ns
Aw heck - make it points-free:
...
rleDecode = concatMap rleLazyDecode
And substitute in the rleLazyDecode definition:
...
rleDecode = concatMap (\(n,b) -> take n $ cycle b)
We could also write a version that omits the explicit lambda and
naming of parameters by use of the helpful but somewhat esoteric
'uncurry' function; Uncurry takes apart the tuple. Tts type is:
...
uncurry :: (a -> b -> c) -> (a, b) -> c
We can actually go even further into the realms of
incomprehensibility. It turns out that lists are counter-intuitively a
monad! This means we can use bind and all the rest of the operations
defined by the Monad typeclass to operate on lists and other things.
So we can write the bizarre (but short and effective) version of
rleDecode that follows:
...
rleDecode = (uncurry replicate =<<)
And we are done! We can now represent the first list like this:
...
d1 = ((2,1),d)
    where d = rleDecode [(5, 3), (10, 67), (6, 3), (5, 67), (29, 3), (2, 11), (29, 3), (5, 35), (3, 3), (3, 35), (24, 3), (2, 35), (19, 3), (7, 19), (21, 51), (63, 35), (15, 43), (5, 11), (8, 43), (8, 35), (3, 3), (9, 35), (20, 3), (52, 67), (32, 3), (13, 35), (3, 3), (11, 35), (5, 3), (9, 35), (4, 3), (6, 35), (4, 3), (5, 35), (13, 3), (14, 19), (4, 3), (4, 19), (6, 3), (4, 19), (7, 3), (5, 19), (8, 3), (8, 19), (5, 83), (3, 67), (2, 3), (12, 19), (5, 83), (17, 19), (9, 83), (2, 19), (5, 3), (20, 67), (1, 75), (38, 11), (1, 43), (4, 35), (7, 3), (57, 67), (2, 3), (5, 19), (17, 3), (6, 19), (8, 3), (6, 67), (7, 83), (59, 3), (9, 19), (1, 51), (17, 35), (20, 3), (14, 35), (7, 3), (2, 35), (11, 43), (6, 11), (7, 3), (26, 67), (7, 3), (13, 35), (5, 43), (7, 11), (2, 3), (5, 67), (1, 83), (2, 19), (2, 3), (10, 67), (21, 3), (147, 0)]
Much nicer, don't you think? And the best part is, we should be able
to reuse this run-length decoding even if we replace the arbitrary
numbers by more descriptive type constructors!

(Many thanks to the good denizens of #haskell, and Don Stewart's blog
entry on run-length encoding/decoding using Arrows
http://cgi.cse.unsw.edu.au/~dons/blog/2007/07.)

--
gwern

Gwern Branwen

Tommy M McGuire

Neil Mitchell

Neil Mitchell

Don Stewart

Tommy M McGuire

Henning Thielemann

Benja Fallenstein

apfelmus

Bulat Ziganshin

Conal Elliott

apfelmus

Tommy M McGuire

apfelmus

Conal Elliott

Tillmann Rendel

Benja Fallenstein

Benja Fallenstein

Benja Fallenstein

gwern0＠gmail.com

tags

participants (11)