On Wed, Oct 23, 2013 at 11:43 AM, Tyson Whitehead <twhitehead@gmail.com> wrote:

Hey All,

I wrote some quick code to change mediawiki preformatted text (adjacent lines begining with a ' ') to nowiki blocks (first line with a ' ' and rest enclosed in <nowiki>...</nowiki> quotes).

http://lpaste.net/94688

I frequently find myself pretty unsatisfied with this type of code though. Anyone have any rewrites that would help teach me how to do this more elegantly in the future.

My first reaction was: use function composition to get rid of all those variables. But this code is interesting because some intermediate results are used multiple times. Here is the dependency graph: http://i.imgur.com/Smid5VZ.png (source file: http://lpaste.net/94718).

Before tackling the organization, I did some trivial refactoring: http://lpaste.net/94717. Namely:

* import Data.Function (on) instead of redefining 'on'.

* import Data.Text (Text) to avoid qualified import for T.Text everywhere.

* Use the OverloadedStrings extension to say "hello" instead of T.pack "hello".

* Back off the indentation in 'update' by putting the first guard on a new line.

* Move 'update' to a separate definition so we know it doesn't need any extra variables from 'replace'.

* Use T.concat instead of multiple T.append calls, to make update much more readable. There's also the <> operator in Data.Monoid (added in base 4.6).

* Use map instead of fmap so it's clear we're working with a list. Not everyone will agree with me on this.

* Use newlines to group the steps of the process.

Also, be wary of partial functions (functions that fail for some inputs) like 'head' and 'fromJust'. They're fine for quick scripts, but not for code other people will be using. When a program crashes with "Prelude.head: empty list", it makes all of us look bad. Instead, favor pattern matching or functions like 'fromMaybe' and 'maybe'. On the other hand, 'head' and 'groupBy' are okay to use together because 'groupBy' returns lists that are all non-empty.

I'll say it again: partial functions are *fine* for quick scripts. When you are the only one running the program, you can save a lot of time by making assumptions about the input.

All that's left is the fun part: making 'replace' easier to understand. I think the key is to move the "Group lines according to classification" logic to a separate function. The other steps of 'replace' have simple code, so this should help a lot.

Hope this helps,

-Joey