Re: [Haskell-beginners] Organizing cmd line flow

30 Jan 2013

      On Wednesday, 30. January 2013 17:16:37 Bryan Vicknair wrote:
...
I have an executable, which gets a file path from the command line, and
passes it to this function::
createDb :: FilePath -> IO ()
createDb fpath = do
    fileExists <- doesFileExist fpath
    if fileExists
       then putStrLn "File already exists"
       else do parDirExists <- parentDirExists fpath
               if parDirExists
                  then do con <- openCon (Config fpath)
                          create con
                          closeCon con
                          putStrLn $ "created db @ " ++ fpath
                       else putStrLn "parent dir doesn't exist"
2 checks: File exists? Parent dir exist?  But already, the code is quite
nested.
How can I refactor this so that a few if expressions that check an IO
action doesn't result in very deep nesting of the code?
Here is my take on that, but note that I am beginner myself and there is 
certainly a better solution.

What you want to achieve is to check filenames for certain conditions. With 
each check you want to report something if the check fails. You want to run 
the next check only when the previous check succeeded. If the filename is 
aleady out of the question, then there is no point in running other checks. In 
the end you want to get an "okay" or "not okay".

One way or another you will need a running Boolean-like state. Otherwise you 
could not tell if it is still sensible to run yet another check, and it would 
be difficult to get a final result. This state tells you whether there is 
still hope for the filename to be okay or not.

However such a state is not suitable for reporting. You cannot check the 
running state and report something if it is "not okay" (False). After the 
first failure, it will remain False and a reporting function may be tempted to 
report a wrong reason, assuming that a check has failed, where really one of 
the previous checks had failed. You need to distinguish between "False because 
a check failed" and "False because a previous check failed and this check 
wasn't run at all". You need the decision wether or not to report in the same 
place where you decde whether or not to run yet another check. You can do this 
e.g. in a spiced up check function.

Your checks all return IO Bool. So in order to chain things it is fortunate to 
let everything return an IO Bool. 

The reporting function could look like this:

report :: Bool -> String-> IO Bool
report b msg = do
        if b then return True
             else putStrLn msg >> return False

When it is invoked with False it prints the message, which gives you an IO(). 
This is "piped" into "return False" which makes the whole function return "IO 
False". So the state is basically just preserved (just changed from Bool to IO 
Bool)

The checks shall also return an IO Bool. Here is one of them:

doesFileNotExist :: Bool -> FilePath -> String -> IO Bool
doesFileNotExist b fpath msg=
        if b then do
                x <- doesFileExist fpath
                report (not x) msg
             else return False

When it is called with a False as first parameter it assumes a previous check 
had failed and runs no check itself. Otherwise it runs "doesFileExist" and if 
that fails it reports the failure. This function knows whether it returned 
False (actually IO False) because it didn't run any check or whether it did 
run a check and failed.

The other check is similar:

doesParentDirectoryExist :: Bool -> FilePath -> String -> IO Bool
doesParentDirectoryExist b fpath  msg = 
        if b then do
                x <- doesDirectoryExist $ fst $ splitFileName fpath
                report x msg
             else return False

If you glue it all together you get:
createDb :: FilePath -> IO Bool
createDb fpath = do
        a <- doesFileNotExist True fpath "File already exists"
        b <- doesParentDirectoryExist a fpath "Parent directory does not 
exist"
        return b

Not how the result from the first check "a" is passed to the second check.

It does what you want:

*Main> createDb "/tmp/foomatic-rip.log"
File already exists
False
*Main> createDb "/tmp/foomatic-rip.logx"
True
*Main> createDb "/tmps/foomatic-rip.logx"
Parent directory does not exist
False

-- 
Martin

Re: [Haskell-beginners] Organizing cmd line flow

Martin Drautzburg