
Hi,

for an application such as an image gallery generator, which works on a bunch of input files (assumed to be constant during one run of the program) and generates or updates a bunch of output files, I have often had the problem of manually tracking which input files a certain output file depends on, in order to check the timestamps and decide whether it is necessary to re-create the file. I thought a while about how to do this with a monad that does the bookkeeping for me. Assuming it's called ODIO (On-demand IO), I'd like a piece of code like this:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       writeFileOD "someOutput" (someComplexFunction file1 file2)

to only actually read "someInput" and "someOtherInput", do the calculation and write the output if these have newer timestamps than the output.

The problem I stumbled over is that, considering the type of (>>=),

    (>>=) :: Monad m => m a -> (a -> m b) -> m b

I cannot "look ahead" to see what files would be written without actually reading the requested file. Of course this is not always possible anyway, although I expect code like this to be the exception:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       let filename = decideFileNameBasedOn file2
       writeFileOD filename (someComplexFunction file1 file2)

But assuming that the input does not change during one run of the program, it should be safe to use unsafeInterleaveIO to only open and read the input when it is used. Then readFileOD could put the timestamp of the read file in a monad-local state, and writeFileOD could, if the output is newer than all inputs listed in the state, skip the writing; the unsafeInterleaveIO'ed file reads are then skipped as well, provided they were not required for deciding the flow of the program.

One nice thing is that the implementation of (>>) knows that files read in the first action will not affect files written in the second, so in contrast to MonadState, we can forget about them, which I hope leads to quite good guesses as to what files are relevant for a certain writeFileOD operation. Also, a function

    cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a

can be used to write an (expensive) intermediate result, such as the extracted EXIF information from a file, to disk, so that it can be reused without actually re-reading the large image file.

Is that a sane idea? I'm also considering using this example for a talk about monads at the GPN¹ next weekend.

Greetings,
Joachim

¹ http://entropia.de/wiki/GPN7
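A minimal sketch of what such an ODIO monad could look like: a plain StateT over IO that records (FilePath, modification time) pairs. The names readFileOD and writeFileOD are taken from the post above; everything else (including using String rather than ByteString, and the modern directory API where getModificationTime returns a UTCTime) is one possible guess, not the implementation discussed later in the thread, and the (>>) trick is not shown here.

    import Control.Monad.State
    import Data.Time.Clock (UTCTime)
    import System.Directory (doesFileExist, getModificationTime)
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- The monad carries the timestamps of all inputs read so far.
    type ODIO = StateT [(FilePath, UTCTime)] IO

    runODIO :: ODIO a -> IO a
    runODIO act = evalStateT act []

    -- Record the timestamp now, but defer the actual read until the
    -- contents are demanded.
    readFileOD :: FilePath -> ODIO String
    readFileOD fp = do
        t <- lift (getModificationTime fp)
        modify ((fp, t) :)
        lift (unsafeInterleaveIO (readFile fp))

    -- Skip the write (and with it the deferred reads) if the output is
    -- already newer than every recorded input.
    writeFileOD :: FilePath -> String -> ODIO ()
    writeFileOD fp contents = do
        deps   <- get
        exists <- lift (doesFileExist fp)
        upToDate <- if not exists
            then return False
            else do outT <- lift (getModificationTime fp)
                    return (all (\(_, inT) -> inT < outT) deps)
        unless upToDate (lift (writeFile fp contents))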

2008/6/30 Joachim Breitner wrote:
The problem I stumbled over is that, considering the type of (>>=),

    (>>=) :: Monad m => m a -> (a -> m b) -> m b

I cannot "look ahead" to see what files would be written without actually reading the requested file. Of course this is not always possible, although I expect this code to be the exception:
I am somewhat unclear about what you are asking. My first impression, though, is that if you're running into trouble with "looking ahead", then this algebra is probably not a Monad. In fact, these use cases indicate an applicative functor to me (Control.Applicative). Of course, the problem with applicative functors is that the syntax that goes with them is not as imperative as monad syntax, and syntax is a big motivator for finding monads.

Luke
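To make Luke's point concrete: an applicative computation's shape is fixed before it runs, so the set of input files can be read off without performing any IO, which is exactly the "looking ahead" that (>>=) cannot offer. A toy illustration of that property (invented here, not code from the thread):

    import Control.Applicative

    -- A computation together with the list of files it will read.
    data OD a = OD { inputs :: [FilePath], runOD :: IO a }

    instance Functor OD where
        fmap f (OD ins act) = OD ins (fmap f act)

    instance Applicative OD where
        pure x = OD [] (pure x)
        OD ins1 mf <*> OD ins2 mx = OD (ins1 ++ ins2) (mf <*> mx)

    readFileOD :: FilePath -> OD String
    readFileOD fp = OD [fp] (readFile fp)

    -- 'inputs' of a whole computation can be inspected (and its
    -- timestamps compared) without ever running 'runOD'.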

On Mon, 2008-06-30 at 12:04 +0200, Joachim Breitner wrote:
Hi,
for an application such as an image gallery generator, which works on a bunch of input files (assumed to be constant during one run of the program) and generates or updates a bunch of output files, I have often had the problem of manually tracking which input files a certain output file depends on, in order to check the timestamps and decide whether it is necessary to re-create the file.
I thought a while about how to do this with a monad that does the bookkeeping for me. Assuming it's called ODIO (On-demand IO), I'd like a piece of code like this:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       writeFileOD "someOutput" (someComplexFunction file1 file2)

to only actually read "someInput" and "someOtherInput", do the calculation and write the output if these have newer timestamps than the output.
The problem I stumbled over is that, considering the type of (>>=),

    (>>=) :: Monad m => m a -> (a -> m b) -> m b

I cannot "look ahead" to see what files would be written without actually reading the requested file. Of course this is not always possible anyway, although I expect code like this to be the exception:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       let filename = decideFileNameBasedOn file2
       writeFileOD filename (someComplexFunction file1 file2)

But assuming that the input does not change during one run of the program, it should be safe to use unsafeInterleaveIO to only open and read the input when it is used. Then readFileOD could put the timestamp of the read file in a monad-local state, and writeFileOD could, if the output is newer than all inputs listed in the state, skip the writing; the unsafeInterleaveIO'ed file reads are then skipped as well, provided they were not required for deciding the flow of the program.
One nice thing is that the implementation of (>>) knows that files read in the first action will not affect files written in the second, so in contrast to MonadState, we can forget about them, which I hope leads to quite good guesses as to what files are relevant for a certain writeFileOD operation. Also, a function

    cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a

can be used to write an (expensive) intermediate result, such as the extracted EXIF information from a file, to disk, so that it can be reused without actually re-reading the large image file.
Is that a sane idea?
I'm also considering using this example for a talk about monads at the GPN¹ next weekend.
You may want to look at Magnus Carlsson's "Monads for Incremental Computing" http://citeseer.comp.nus.edu.sg/619122.html

Hi,

On Monday, 2008-06-30 at 07:08 -0500, Derek Elkins wrote:
You may want to look at Magnus Carlsson's "Monads for Incremental Computing" http://citeseer.comp.nus.edu.sg/619122.html
Not exactly what I need, but a very interesting read. Maybe I can use some of the ideas.

Thanks,
Joachim

Joachim Breitner wrote:
On Monday, 2008-06-30 at 07:08 -0500, Derek Elkins wrote:
You may want to look at Magnus Carlsson's "Monads for Incremental Computing" http://citeseer.comp.nus.edu.sg/619122.html

Not exactly what I need, but a very interesting read. Maybe I can use some of the ideas.
You might also find relevant the work on "adaptive computation" by Umut Acar and collaborators.

Some comments:
1) unsafeInterleaveIO seems like a big hammer to use for this problem,
and there are a lot of gotchas involved that you may not have fully
thought out. But you do meet the main criteria (file being read is
assumed to be constant for a single run of the program).
If you have the ability to store metadata about the computation along
with the computation results, maybe that would be a better solution?
2) I agree with Luke that this "smells" more like an applicative
functor. But getting to monad syntax is quite nice if you can do so.
As an applicative functor you would have "writeFileOD :: Filename ->
ODIO ByteString -> ODIO ()"; then writeFile can handle all the
necessary figuring out of timestamps itself, and you get the bonus
guarantee that the contents of the files read by the "ODIO ByteString"
argument won't affect the filename you are going to output to.
3) Instead of (Read, Show), look into Data.Binary if you actually care
about efficiency. Parsing text at read time will almost never be faster
than just performing the computation on the source data again.
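A rough sketch of what a Data.Binary-backed cache in the spirit of cacheResultOD could look like. cacheResultOD is the name from the original post; this plain-IO version and its details are guesses, assuming the binary package's encodeFile/decodeFile:

    import Data.Binary (Binary, decodeFile, encodeFile)
    import System.Directory (doesFileExist)

    -- Cache an expensive result on disk; decode it if the cache file is
    -- already there, otherwise compute, store and return it.  A full
    -- ODIO version would also check the cache against the input
    -- timestamps.
    cacheResult :: Binary a => FilePath -> a -> IO a
    cacheResult fp val = do
        cached <- doesFileExist fp
        if cached
            then decodeFile fp
            else do encodeFile fp val
                    return val

Because val is only forced in the else branch, the expensive computation is skipped entirely when the cache file already exists.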
-- ryan
On 6/30/08, Joachim Breitner wrote:
Hi,
for an application such as an image gallery generator, which works on a bunch of input files (assumed to be constant during one run of the program) and generates or updates a bunch of output files, I have often had the problem of manually tracking which input files a certain output file depends on, in order to check the timestamps and decide whether it is necessary to re-create the file.
I thought a while about how to do this with a monad that does the bookkeeping for me. Assuming it's called ODIO (On-demand IO), I'd like a piece of code like this:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       writeFileOD "someOutput" (someComplexFunction file1 file2)

to only actually read "someInput" and "someOtherInput", do the calculation and write the output if these have newer timestamps than the output.
The problem I stumbled over is that, considering the type of (>>=),

    (>>=) :: Monad m => m a -> (a -> m b) -> m b

I cannot "look ahead" to see what files would be written without actually reading the requested file. Of course this is not always possible anyway, although I expect code like this to be the exception:

    do file1 <- readFileOD "someInput"
       file2 <- readFileOD "someOtherInput"
       let filename = decideFileNameBasedOn file2
       writeFileOD filename (someComplexFunction file1 file2)

But assuming that the input does not change during one run of the program, it should be safe to use unsafeInterleaveIO to only open and read the input when it is used. Then readFileOD could put the timestamp of the read file in a monad-local state, and writeFileOD could, if the output is newer than all inputs listed in the state, skip the writing; the unsafeInterleaveIO'ed file reads are then skipped as well, provided they were not required for deciding the flow of the program.
One nice thing is that the implementation of (>>) knows that files read in the first action will not affect files written in the second, so in contrast to MonadState, we can forget about them, which I hope leads to quite good guesses as to what files are relevant for a certain writeFileOD operation. Also, a function

    cacheResultOD :: (Read a, Show a) => FilePath -> a -> ODIO a

can be used to write an (expensive) intermediate result, such as the extracted EXIF information from a file, to disk, so that it can be reused without actually re-reading the large image file.
Is that a sane idea?
I'm also considering using this example for a talk about monads at the GPN¹ next weekend.

Hi,

thanks for your comments.

On Monday, 2008-06-30 at 16:54 -0700, Ryan Ingram wrote:
1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha? Anyways, is this really worse than the similarly lazy readFile? Using that would not save the call to open, but it would at least save the reading and processing, in the same situations.
If you have the ability to store metadata about the computation along with the computation results, maybe that would be a better solution?
Not sure what you mean here, sorry. Can you elaborate?
2) I agree with Luke that this "smells" more like an applicative functor. But getting to monad syntax is quite nice if you can do so. As an applicative functor you would have "writeFileOD :: Filename -> ODIO ByteString -> ODIO ()"; then writeFile can handle all the necessary figuring out of timestamps itself, and you get the bonus guarantee that the contents of the files read by the "ODIO ByteString" argument won't affect the filename you are going to output to.
I thought about this (without having the applicative abstraction in mind). This would then look like:

    main = do f1 <- readFileOD "infile1"
              f2 <- readFileOD "infile2"
              writeFileOD "outfile1" $ someFunc <$> f1 <*> f2
              writeFileOD "outfile2" $ someOtherFunc <$> f1

right? Will it still work so that, if both outfiles need to be generated, f1 is read only once?
3) Instead of (Read,Show), look into Data.Binary instead, if you actually care about efficiency. Parsing text at read time will almost never be faster than just performing the computation on the source data again.
I assume it's still faster than, e.g., running an external program to read the EXIF tags, but you are right, Data.Binary is nicer for this.

Thanks,
Joachim

Joachim Breitner writes:
1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha?
The one that springs to mind is that you might run out of file handles. At least on Linux, that's a precious resource.

-k

Hi,

On Tuesday, 2008-07-01 at 11:53 +0200, Ketil Malde wrote:
Joachim Breitner writes:

1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha?
The one that springs to mind is that you might run out of file handles. At least on Linux, that's a precious resource.
But at least then, (unsafeInterleaveIO readFile) is actually better than (readFile): if I consume the files in sequence and completely, they will be opened and closed in sequence with the first one, but all opened at once with the second. At least it won't be worse, because the file will not be closed any later, and will possibly be opened later.

Greetings,
Joachim

On Tue, Jul 01, 2008 at 10:22:35AM +0000, Joachim Breitner wrote:
Hi,
On Tuesday, 2008-07-01 at 11:53 +0200, Ketil Malde wrote:
Joachim Breitner writes:

1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha?
The one that springs to mind is that you might run out of file handles. At least on Linux, that's a precious resource.
But at least then, (unsafeInterleaveIO readFile) is actually better than (readFile): if I consume the files in sequence and completely, they will be opened and closed in sequence with the first one, but all opened at once with the second. At least it won't be worse, because the file will not be closed any later, and will possibly be opened later.
Indeed, the best option (in my opinion) would be unsafeInterleaveIO readFileStrict (where you might need to write readFileStrict). In darcs, we use lazy IO a lot, but never lazily read a file, precisely due to the open file handle issue. This works pretty well, and your scenario is precisely the one in which unsafeInterleaveIO shines.

David
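A plausible rendering of that combination (the names readFileStrict and readFileDeferred are made up here; the darcs code David refers to is not shown):

    import System.IO (IOMode (ReadMode), hGetContents, withFile)
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- Read the whole file and close the handle before returning.
    readFileStrict :: FilePath -> IO String
    readFileStrict fp = withFile fp ReadMode $ \h -> do
        s <- hGetContents h
        length s `seq` return s

    -- Defer the strict read until the contents are demanded: the handle
    -- is opened, drained and closed in one go, but only at that point.
    readFileDeferred :: FilePath -> IO String
    readFileDeferred = unsafeInterleaveIO . readFileStrict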

On 7/1/08, Joachim Breitner wrote:
Hi,
thanks for your comments.
On Monday, 2008-06-30 at 16:54 -0700, Ryan Ingram wrote:
1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha? Anyways, is this really worse than the similarly lazy readFile? Using that would not save the call to open, but it would at least save the reading and processing, in the same situations.
Well, you're also (from your description) probably writing some tracking information to an IORef of some sort. That can happen in the middle of an otherwise pure computation, and it's difficult to know exactly when it'll get triggered, due to laziness. You can probably make it work :)
If you have the ability to store metadata about the computation along with the computation results, maybe that would be a better solution?
Not sure what you mean here, sorry. Can you elaborate?
Well, while doing the computation the first time, you can track what depends on what. Then you save *that* information out. Here's an example:

    main = runODIO $ do
        do bar <- readFileOD "bar.txt"
           baz <- readFileOD "baz.txt"
           let result = expensiveComputation bar baz
           writeFileOD "foo.bin" result
        do hat <- readFileOD "hat.txt"
           let result = otherComputation hat
           writeFileOD "foo2.bin" result

Now, as you mentioned before, you know that the RHS of (>>) doesn't depend on the files read on the LHS. So the two "do" blocks here are independent. Now, if you run with no information, you run the whole computation, and you write out in your metadata "first we are going to build foo.bin from bar.txt and baz.txt, and then we build foo2.bin from hat.txt". Now when you get to the first "do" block, you know what computation is about to happen (since you've recorded it before), and can check the timestamps of foo.bin, bar.txt, and baz.txt, and potentially skip the whole thing. Of course, now the metadata depends on the script itself, but you already had to deal with that problem :)
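One way to put "save *that* information out" into code: let the monad log, Writer-style, which inputs each output consumed, and persist that log at the end of the run (this also anticipates Brandon's Writer remark further down). All names here are invented; this is not the ODIO code:

    import Control.Monad.Writer

    -- output file -> the inputs that were used to produce it
    type DepLog = [(FilePath, [FilePath])]

    type TrackedIO = WriterT DepLog IO

    runTracked :: TrackedIO a -> IO a
    runTracked act = do
        (x, deps) <- runWriterT act
        writeFile ".deps" (show deps)   -- the metadata for the next run
        return x

    -- In the real monad the input list would come from its own
    -- bookkeeping; here the caller passes it explicitly.
    writeFileLogged :: FilePath -> [FilePath] -> String -> TrackedIO ()
    writeFileLogged out ins contents = do
        lift (writeFile out contents)
        tell [(out, ins)]

On the next run, reading ".deps" back gives exactly the information needed to compare foo.bin against bar.txt and baz.txt before deciding whether to run expensiveComputation again.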
2) I agree with Luke that this "smells" more like an applicative functor. But getting to monad syntax is quite nice if you can do so. As an applicative functor you would have "writeFileOD :: Filename -> ODIO ByteString -> ODIO ()"; then writeFile can handle all the necessary figuring out of timestamps itself, and you get the bonus guarantee that the contents of the files read by the "ODIO ByteString" argument won't affect the filename you are going to output to.
I thought about this (without having the applicative abstraction in mind). This would then look like:
    main = do f1 <- readFileOD "infile1"
              f2 <- readFileOD "infile2"
              writeFileOD "outfile1" $ someFunc <$> f1 <*> f2
              writeFileOD "outfile2" $ someOtherFunc <$> f1
right?
Not exactly. Try this:

    writeFileOD "outfile1" (someFunc <$> readFileOD "infile1"
                                     <*> readFileOD "infile2")
    writeFileOD "outfile2" (someOtherFunc <$> readFileOD "infile1")

(or, equivalently, replace the "<-" with "let .. in" in your code).
Will it still work so that if both outfiles need to be generated, f1 is read only once?
That depends on how you write it! Remember that you can write your applicative functor to just build up a graph of what computation might need to be done. You can then analyze that graph and look for sharing if necessary. If you want the sharing to be explicit, you need something a bit more monad-ish. If the type of readFileOD is "Filename -> ODIO (ODIO ByteString)", then your original syntax works and gives you a chance to pick up on the explicit sharing by labelling the result of "f1 <- ...".

-- ryan
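A toy, self-contained rendering of that last suggestion, with ODIO faked as plain IO (no timestamp logic) just to show how the double-wrapped readFileOD gives explicit sharing; all of this is invented for illustration:

    import Control.Applicative ((<$>), (<*>))
    import Data.IORef

    type ODIO = IO   -- stand-in; the real thing would also track dependencies

    -- Returns a handle: an action that reads the file at most once,
    -- however many outputs consume it.
    readFileOD :: FilePath -> ODIO (ODIO String)
    readFileOD fp = do
        cache <- newIORef Nothing
        return $ do
            c <- readIORef cache
            case c of
                Just s  -> return s
                Nothing -> do
                    s <- readFile fp
                    writeIORef cache (Just s)
                    return s

    writeFileOD :: FilePath -> ODIO String -> ODIO ()
    writeFileOD fp mk = mk >>= writeFile fp

    main :: IO ()
    main = do
        f1 <- readFileOD "infile1"
        f2 <- readFileOD "infile2"
        writeFileOD "outfile1" ((++) <$> f1 <*> f2)
        writeFileOD "outfile2" (reverse <$> f1)   -- infile1 is read only once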

Hi,

thanks again for your input. Just one small remark:

On Tuesday, 2008-07-01 at 14:52 -0700, Ryan Ingram wrote:
On 7/1/08, Joachim Breitner wrote:

On Monday, 2008-06-30 at 16:54 -0700, Ryan Ingram wrote:
1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha? Anyways, is this really worse than the similarly lazy readFile? Using that would not save the call to open, but it would at least save the reading and processing, in the same situations.
Well, you're also (from your description) probably writing some tracking information to an IORef of some sort. That can happen in the middle of an otherwise pure computation, and it's difficult to know exactly when it'll get triggered, due to laziness. You can probably make it work :)
Well, for the tracking information, I can do it purely, by copying code from StateT (or WriterT or ReaderT, I'm not sure :-)) and adapting it slightly (e.g. the (>>) optimization). So besides unsafeInterleaveIO, no "bad, impure stuff" should be necessary. I think I'll put my ideas into code soon and post it here.

Greetings,
Joachim

On 2008 Jul 1, at 17:52, Ryan Ingram wrote:
Well, you're also (from your description) probably writing some tracking information to an IORef of some sort. That can happen in the middle of an otherwise pure computation, and it's difficult to know exactly when it'll get triggered, due to laziness. You can probably make it work :)
If you have the ability to store metadata about the computation along with the computation results, maybe that would be a better solution?
Not sure what you mean here, sorry. Can you elaborate?
Well, while doing the computation the first time, you can track what depends on what. Then you save *that* information out.
This sounds suspiciously like Writer to me.

Then, the readFileOD could put the timestamp of the read file in a monad-local state, and the writeFileOD could, if the output is newer than all inputs listed in the state, skip the writing; the unsafeInterleaveIO'ed file reads are then skipped as well, if they were not required for deciding the flow of the program.
How is your system similar to or different from make/Makefile? Are your actions more restricted? Are the semantics more imperative? Are the dependencies still explicit, or are they implicit and inferred?

-- Chris

Hi,

On Wednesday, 2008-07-02 at 16:43 +0100, ChrisK wrote:
Then, the readFileOD could put the timestamp of the read file in a monad-local state, and the writeFileOD could, if the output is newer than all inputs listed in the state, skip the writing; the unsafeInterleaveIO'ed file reads are then skipped as well, if they were not required for deciding the flow of the program.
How is your system similar to or different from make/Makefile?

Are your actions more restricted? Are the semantics more imperative? Are the dependencies still explicit, or are they implicit and inferred?
I think the biggest difference is that with Make, you have to explicitly list all dependencies, which is what I want to avoid by having the monad keep a record of the used files. So it's mostly a convenience thing, although a monad would generally be more flexible, e.g. deciding the output file name based on the content of some of the input files.

I have some code that I'll put somewhere soon.

Greetings,
Joachim

Hi,

On Thursday, 2008-07-03 at 15:55 +0200, Joachim Breitner wrote:
I have some code that I’ll put somewhere soon.
http://darcs.nomeata.de/odio/ODIO.hs now contains a simple implementation of the idea, together with more explanation. To show what the effect is, I wrote a very small program:

    1> main = runODIO $ do
    2>   c1 <- readFileOD' "inFile1"
    3>   c2 <- readFileOD' "inFile2"
    4>   c3 <- readFileOD' "inFile3"
    5>   liftIO $ putStrLn "Some output"
    6>   writeFileOD' "outFile1" (show (length c1 + length c2))
    7>   c4 <- readFileOD' "inFile4"
    8>   writeFileOD' "outFile2" (show (length c1 + length c3 + length c4))
    9>   time <- liftIO $ getClockTime
    A>   writeFileOD' "outFile3" (show time ++ c1)

and a script that runs it under various conditions, http://darcs.nomeata.de/odio/demo.sh, with the output available at http://darcs.nomeata.de/odio/demo.out. Note that the primes after the function calls are just for the verbose variant, for demonstration.

Some points to emphasize (you can verify them in the demo output):

* The 9th line runs an arbitrary IO action, so from then on, ODIO cannot do anything but actually write out every file it should.
* The 5th line does not have this effect. Because it gets desugared to (>>), the special implementation of (>>) means that the next line still sees the same dependency state as before the call to liftIO.
* A change to inFile3 causes outFile1 to be re-written, although from looking at the code, _we_ know that this is not necessary; the ODIO monad cannot tell. The programmer should have swapped the lines.
* A change only to inFile4 means that outFile1 will not have to be generated, and thanks to laziness and unsafeInterleaveIO, inFile2 will not even be opened.

Some additions that might be necessary for real-world use:

* a ByteString interface
* a variant of readFileOD with type "FilePath -> IO a -> ODIO a" if, instead of reading the file directly, you want to call some external parsing helper (e.g. to read EXIF data)
* an even more verbose mode that tells you why exactly a write action has to be done; this is why I keep a list of files and timestamps around

I hope this is a basis for even more discussion, and of course http://darcs.nomeata.de/odio/ is a darcs repository, so feel free to send patches.

Greetings,
Joachim
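For readers who do not want to open ODIO.hs: the shape of a monad with such a specialised (>>) could look roughly like this. This is a guess at the structure, not the code from the darcs repository (the real version keeps timestamps and more):

    type Deps = [FilePath]

    newtype ODIO a = ODIO { unODIO :: Deps -> IO (a, Deps) }

    instance Functor ODIO where
        fmap f m = ODIO $ \s -> do
            (x, s') <- unODIO m s
            return (f x, s')

    instance Applicative ODIO where
        pure x = ODIO $ \s -> return (x, s)
        mf <*> mx = ODIO $ \s -> do
            (f, s')  <- unODIO mf s
            (x, s'') <- unODIO mx s'
            return (f x, s'')

    instance Monad ODIO where
        m >>= k = ODIO $ \s -> do
            (x, s') <- unODIO m s
            unODIO (k x) s'
        -- The first action's result is discarded, so the files it read
        -- cannot influence what the second action writes: pass the
        -- *original* dependency state on.  This is exactly the point
        -- ChrisK objects to below, since it makes (>>) differ from
        -- (>>= \_ -> ...).
        m >> k = ODIO $ \s -> do
            _ <- unODIO m s
            unODIO k s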

Joachim Breitner wrote:
* The 5th line does not have this effect. Because it gets desugared to (>>), the special implementation of (>>) means that the next line still sees the same dependency state as before the call to liftIO.
You are violating the monad laws. (f >> k) and (f >>= \_ -> k) should do the same thing. You might write a version of liftIO that has the effect you want, however.
* A change to inFile3 causes outFile1 to be re-written, although from looking at the code, _we_ know that this is not necessary, but the ODIO monad can not tell. The programmer should have swapped the lines.
Let me reverse engineer your algorithm (aside from the screwy >>):

Every readFile that is encountered while processing ODIO is added to a list of source files. The reading is deferred lazily with unsafeInterleaveIO. When a writeFile is encountered, it is assumed to depend on all previously read files. If the output file already exists and is newer than all the source files, then writing it is skipped (and perhaps the lazy reads are skipped as well). Otherwise, the writing is strict.

----

I would say this is an unusual module. I rather prefer Makefile semantics, which could be improved in some ways by using a DSL in Haskell instead. The syntactic form of a file-oriented Makefile declaration is

    output : input1 input2
            shell script
            more shell script

And the "shell script" has access to the output file name, and also has access to the input names. In Haskell you could have a monadic DSL where the output name (and perhaps some explicit input names) are accessible, like MonadReader. The result of running the DSL would do no IO at all but, much like a compiler, would return an IO action (the program to create the output file) and a list of inferred dependencies (an improvement over the Makefile syntax). Even if the DSL does not allow liftIO, it can still compile to various IO actions.

Then you have a map from (outputname) to (dependencies, ioAction). And when outputname is demanded, you can walk the dependencies to see if the timestamps are newer or older, using the ioActions to create the desired files. So perhaps to run the DSL monad you have a function like:

    makeRule :: DSL () -> FilePath -> [FilePath] -> ([FilePath], IO ())

    type Depends = Map FilePath ([FilePath], IO ())

    demand :: Depends -> FilePath -> Maybe ByteString
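To make the last part concrete, here is one invented way such a Depends map could be walked (simplified to return () instead of Maybe ByteString, and without makeRule; not ChrisK's proposed implementation):

    import Control.Monad (when)
    import qualified Data.Map as Map
    import System.Directory (doesFileExist, getModificationTime)

    type Depends = Map.Map FilePath ([FilePath], IO ())

    -- Bring a target up to date: first make sure its inputs are current
    -- (they may themselves be generated files), then rebuild if needed.
    demand :: Depends -> FilePath -> IO ()
    demand deps target = case Map.lookup target deps of
        Nothing -> return ()                 -- a plain source file
        Just (ins, build) -> do
            mapM_ (demand deps) ins
            stale <- isStale target ins
            when stale build

    -- A target is stale if it is missing or older than any input.
    isStale :: FilePath -> [FilePath] -> IO Bool
    isStale out ins = do
        exists <- doesFileExist out
        if not exists
            then return True
            else do
                outT <- getModificationTime out
                inTs <- mapM getModificationTime ins
                return (any (>= outT) inTs)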

On Thu, Jul 03, 2008 at 07:09:58PM +0100, ChrisK wrote:
Joachim Breitner wrote:
* The 5th line does not have this effect. Because it gets desugared to (>>), the special implementation of (>>) means that the next line still sees the same dependency state as before the call to liftIO.
You are violating the monad laws. (f >> k) and (f >>= \_ -> k) should do the same thing. You might write a version of liftIO that has the effect you want, however.
I don't mind a little anarchy in the monad laws... :)
* A change to inFile3 causes outFile1 to be re-written, although from looking at the code, _we_ know that this is not necessary, but the ODIO monad can not tell. The programmer should have swapped the lines.
Let me reverse engineer your algorithm (aside from the screwy >>):
Every readFile that is encountered while processing ODIO is added to a list of source files. The reading is deferred lazily with unsafeInterleaveIO.
When a writeFile is encountered it is assumed to depend on all previously read files. If this output file already exists and is newer than all the source files, then writing it is skipped (and perhaps also the lazy reads are skipped). Otherwise, the writing is strict.
----
I would say this is an unusual module. I rather prefer Makefile semantics, which could be improved in some ways by using a DSL in Haskell instead.
The syntactic form of a file-oriented Makefile declaration is
    output : input1 input2
            shell script
            more shell script
I must say that I prefer the automatic computation of dependencies as outlined by Joachim. A bit more is needed, of course, to enable true Makefile-like dependency handling, since you'd want to ensure that the dependencies themselves are up-to-date, but the automatic computation of dependencies would be a real boon. Makefiles are extremely prone to errors in which dependencies are left out, and those bugs are only caught on rare occasions when the build is performed in an unusual order, or when a rarely-touched source file is edited.

Of course, to create a "make" replacement, you'd also have to be able to call external programs and track which files they use, which is a hard problem, particularly as which files they use may depend on the contents of the files that they use. One could, however, lift calls to well-behaved external programs (e.g. those like ghc or gcc that can output their dependencies) into this sort of monad.

David
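For instance, gcc -M prints a make-style rule listing every file the compilation would read; a small helper (hypothetical, with error handling and quoting of odd file names omitted) could turn that into a dependency list for such a monad:

    import System.Process (readProcess)

    -- Ask gcc which files a compilation of 'src' depends on.
    -- "foo.o: foo.c foo.h \\\n bar.h" becomes ["foo.c","foo.h","bar.h"].
    gccDependencies :: FilePath -> IO [FilePath]
    gccDependencies src = do
        out <- readProcess "gcc" ["-M", src] ""
        let cleaned = map (\c -> if c == '\\' then ' ' else c) out
        return (drop 1 (words cleaned))      -- drop the "foo.o:" target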

Hi,

On Thursday, 2008-07-03 at 11:35 -0700, David Roundy wrote:
On Thu, Jul 03, 2008 at 07:09:58PM +0100, ChrisK wrote:
You are violating the monad laws. (f >> k) and (f >>= \_ -> k) should do the same thing. You might write a version of liftIO that has the effect you want, however.
I don't mind a little anarchy in the monad laws... :)
It depends on what level you want them to be true. Assuming the rest of the code is correct, the only difference between (f >> k) and (f >>= \_ -> k) is that a file write in k which would make no difference might be omitted. In this sense, the monad laws are followed.
I must say that I prefer the automatic computation of dependencies as outlined by Joachim.
Thanks!
Of course, to create a "make" replacement, you'd also have to be able to call external programs and track which files they use, which is a hard problem, particularly as which files they use may depend on the contents of the files that they use. One could, however, lift calls to well-behaved external programs (e.g. those like ghc or gcc that can output their dependencies) into this sort of monad.
That's easily possible with a custom sourceAction, which allows you to set the action and the timestamp detection independently.

Greetings,
Joachim
participants (10):

- Brandon S. Allbery KF8NH
- ChrisK
- Chung-chieh Shan
- David Roundy
- Derek Elkins
- Henning Thielemann
- Joachim Breitner
- Ketil Malde
- Luke Palmer
- Ryan Ingram