
On Mon, 2008-06-30 at 12:04 +0200, Joachim Breitner wrote:
Hi,
for an application such as an image gallery generator, which works on a bunch of input files (assumed to be constant during one run of the program) and generates or updates a bunch of output files, I often had the problem of manually tracking which input files a given output file depends on, so that I can check the timestamps to decide whether the file needs to be re-created.
I thought for a while about how to do this with a monad that does the bookkeeping for me. Assuming it’s called ODIO (On-Demand IO), I’d like a piece of code like this:
  do file1 <- readFileOD "someInput"
     file2 <- readFileOD "someOtherInput"
     writeFileOD "someOutput" (someComplexFunction file1 file2)
to actually read "someInput" and "someOtherInput", do the calculation, and write the output only if the inputs have newer timestamps than the output.
The problem I stumbled over is that the type of >>=,
  (>>=) :: Monad m => m a -> (a -> m b) -> m b
means that I cannot "look ahead" to see which files would be written without actually reading the requested file. Of course, such look-ahead is not always possible, although I expect code like this to be the exception:
  do file1 <- readFileOD "someInput"
     file2 <- readFileOD "someOtherInput"
     let filename = decideFileNamenameBasedOn file2
     writeFileOD filename (someComplexFunction file1 file2)
But assuming that the input does not change during one run of the program, it should be safe to use "unsafeInterleaveIO" to open and read the input only when it is actually used. Then readFileOD could record the timestamp of the file it reads in a monad-local state, and writeFileOD could skip the writing if the output is newer than all inputs listed in that state. The unsafeInterleaveIO’ed file reads would then be skipped as well, provided they were not needed to decide the flow of the program.
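To make the idea above concrete, here is a minimal sketch of how such a monad might look. It is not a worked-out design: the representation (a function threading a list of input timestamps), the names unODIO and runODIO, and the rule "skip the write if the output exists and every recorded input is strictly older" are all assumptions made for illustration. It threads the timestamps like a plain state monad and needs the directory and time packages.

```haskell
import Control.Monad (unless)
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical representation: an IO action that threads the
-- modification times of all inputs requested so far.
newtype ODIO a = ODIO { unODIO :: [UTCTime] -> IO (a, [UTCTime]) }

instance Functor ODIO where
  fmap f (ODIO m) = ODIO $ \deps -> do
    (a, deps') <- m deps
    return (f a, deps')

instance Applicative ODIO where
  pure a = ODIO $ \deps -> return (a, deps)
  ODIO mf <*> ODIO ma = ODIO $ \deps -> do
    (f, deps')  <- mf deps
    (a, deps'') <- ma deps'
    return (f a, deps'')

instance Monad ODIO where
  ODIO m >>= k = ODIO $ \deps -> do
    (a, deps') <- m deps
    unODIO (k a) deps'

-- Run an ODIO action, starting with no recorded inputs.
runODIO :: ODIO a -> IO a
runODIO m = fst <$> unODIO m []

-- Record the input's timestamp now, but defer opening and reading the
-- file until the contents are actually demanded.
readFileOD :: FilePath -> ODIO String
readFileOD path = ODIO $ \deps -> do
  t        <- getModificationTime path
  contents <- unsafeInterleaveIO (readFile path)
  return (contents, t : deps)

-- Write only if the output is missing or not strictly newer than every
-- recorded input. If the write is skipped, 'contents' is never forced,
-- so the deferred reads behind it never happen.
writeFileOD :: FilePath -> String -> ODIO ()
writeFileOD path contents = ODIO $ \deps -> do
  exists   <- doesFileExist path
  upToDate <-
    if exists
      then do
        outTime <- getModificationTime path
        return (all (< outTime) deps)
      else return False
  unless upToDate (writeFile path contents)
  return ((), deps)
```

With these definitions, the first example would be run as runODIO (do ...). Note that in this sketch an existing output with no recorded inputs counts as up to date, and that the safety of the deferred reads rests entirely on the assumption that inputs do not change during the run.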
One nice thing is that the implementation of (>>) knows that files read in the first action cannot affect files written in the second, so, in contrast to MonadState, we can forget about them. I hope this leads to quite good guesses as to which files are relevant for a given writeFileOD operation. Also, a function
  cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a
can be used to write an (expensive) intermediate result, such as the EXIF information extracted from a file, to disk, so that it can be reused without re-reading the large image file.
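The caching function could be sketched roughly as follows. To keep the example self-contained it is written in plain IO with the list of input timestamps passed explicitly, so the name cacheResultIO and its extra first argument are assumptions rather than part of the proposed interface; an ODIO version would take the timestamps from the monad's state instead.

```haskell
import Data.Time.Clock (UTCTime)
import System.Directory (doesFileExist, getModificationTime)
import Text.Read (readMaybe)

-- Hypothetical plain-IO variant of cacheResultOD. 'deps' holds the
-- modification times of the inputs that 'value' was computed from.
cacheResultIO :: (Read a, Show a) => [UTCTime] -> FilePath -> a -> IO a
cacheResultIO deps path value = do
  exists <- doesFileExist path
  fresh  <-
    if exists
      then do
        t <- getModificationTime path
        return (all (< t) deps)
      else return False
  cached <-
    if fresh
      then do
        s <- readFile path
        -- Force the whole file so the handle is closed before any rewrite.
        length s `seq` return (readMaybe s)
      else return Nothing
  case cached of
    -- Cache hit: 'value' is never evaluated, so the expensive
    -- computation (and any deferred reads behind it) is skipped.
    Just v  -> return v
    -- Stale, missing, or unparsable cache: compute and store the value.
    Nothing -> do
      writeFile path (show value)
      return value
```

Because Haskell is lazy, passing the expensive result as an ordinary argument costs nothing when the cache is fresh. A cache file that fails to parse is treated like a stale one and overwritten.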
Is that a sane idea?
I’m also considering using this example for a talk about monads at the GPN¹ next weekend.
You may want to look at Magnus Carlsson's "Monads for Incremental Computing" http://citeseer.comp.nus.edu.sg/619122.html