
On 7/1/08, Joachim Breitner
Hi,
thanks for your comments.
Am Montag, den 30.06.2008, 16:54 -0700 schrieb Ryan Ingram:
1) unsafeInterleaveIO seems like a big hammer to use for this problem, and there are a lot of gotchas involved that you may not have fully thought out. But you do meet the main criteria (file being read is assumed to be constant for a single run of the program).
Any other gotcha? Anyways, is this really worse than the similary lazy readFile? Using that would not safe the call to open, but at least the reading and processing, in the same situations.
Well, you're also (from your description) probably writing some tracking information to an IORef of some sort. That can happen in the middle of an otherwise pure computation, and it's difficult to know exactly when it'll get triggered, due to laziness. You can probably make it work :)
If you have the ability to store metadata about the computation along with the computation results, maybe that would be a better solution?
Not sure what you mean here, sorry. Can you elaborate?
Well, while doing the computation the first time, you can track what depends on what. Then you save *that* information out. Here's an example: main = runODIO $ do do bar <- readFileOD "bar.txt" baz <- readFileOD "baz.txt" let result = expensiveComputation bar baz writeFileOD "foo.bin" result do hat <- readFileOD "hat.txt" let result = otherComputation hat writeFileOD "foo2.bin" result Now, as you mentioned before, you know that the RHS of >> doesn't depend on the files read on the LHS. So the two "do" blocks here are independent. Now, if you run with no information, you run the whole computation, and you write out in your metadata "First we are going to build foo.bin from bar.txt and baz.txt, and then we build foo2.bin from hat.txt". Now when you get to the first "do" block, you know what computation is about to happen (since you've recorded it before), and can check the timestamps of foo.bin, bar.txt, and baz.txt, and potentially skip the whole thing. Of course now the metadata depends on the script itself, but you already had to deal with that problem :)
2) I agree with Luke that this "smells" more like an applicative functor. But getting to monad syntax is quite nice if you can do so. As an applicative functor you would have "writeFileOD :: Filename -> ODIO ByteString -> ODIO ()"; then writeFile can handle all the necessary figuring out of timestamps itself, and you get the bonus guarantee that the contents of the files read by the "ODIO ByteString" argument won't affect the filename you are going to output to.
I thought about this (without having the applicative abstraction in mind). This would then look like:
main = do f1 <- readFileOD "infile1" f2 <- readFileOD "infile2" writeFileOD "outfile1" $ someFunc <$> f1 <*> f2 writeFileOD "outfile2" $ someOtherFunc <$> f1
right?
Not exactly. Try this: writeFileOD "outfile1" (someFunc <$> readFileOD "infile1" <*> readFileOD "infile2") writeFileOD "outfile2" (someOtherFunc <$> readFIleOD "infile1") (or, equivalently, replace the "<-" with "let .. in" in your data).
Will it still work so that if both outfiles need to be generated, f1 is read only once?
That depends how you write it! Remember that you can write your applicative functor to just build up a graph of what computation might need to be done. You can then analyze that graph and look for sharing if necessary. If you want the sharing to be explicit, you need something a bit more monad-ish. If the type of "readFileOD" is "Filename -> ODIO (ODIO ByteString)" then your original syntax works and gives you a chance to pick up on the explicit sharing by labelling the result of "f1 <- ...". -- ryan