
Bryan O'Sullivan wrote:
b) Making (FindClause a) a mere type synonym of (FileInfo -> a) has the benefit that the user can choose whether he wants to use monads or applicative functors via the respective instances or whether he does not.
That's where I had started out, as a matter of fact.
Yes. What I want to say is that you're still there, even if not intended. Although FindClause is implemented as a state monad, the functions evalClause :: FindClause a -> (FileInfo -> a) (`liftM` fileInfo) :: (FileInfo -> a) -> FindClause a are isomorphisms. In other words, nothing is gained or lost compared to (FileInfo -> a). Interestingly, these functions can be defined for any state monad, but in general, their composition is not the identity. Here it is because put is not available. Whether an abstract data type is isomorphic to some other type depends on what functions are defined but also on what functions are not defined.
or for general applicative functors as
(||) <$> ((== ".hs") <$> extension) <*> ((== ".lhs") <$> extension)
I don't find that very readable, I must confess.
Neither do I. It's probably due to (||) and (==) being infix operators, it's quite readable for normal functions. I think it's indeed best to have custom infix operators like (==?) and (||?). Btw, liftOp works for any functor liftOp :: Functor f => (a -> b -> c) -> f a -> b -> f c liftOp = flip . (fmap .) . flip
Maybe unsafePerformIO is the best solution, because you may safely close the file after having evaluated
predicate (unsafePerformIO $ hGetContents handle) (fileinfo)
to weak head normal form, i.e. True or False. I think it fits perfectly.
In principle
unsafeInterleaveIO $ readFile fileName
ought to be better, because it will not try to open the file unless the predicate actually inspects it, and opening files is expensive. But it will also not close the file until a finalizer kicks in. A tidier approach might be:
maybeH <- newIORef Nothing contents <- unsafeInterleaveIO $ do h <- openFile fileName ReadMode writeIORef maybeH (Just h) hGetContents h let result = predicate contents result `seq` readIORef maybeH >>= maybe (return ()) hClose
That's a little cumbersome, but very appealing from the perspective of a user of the library. Unfortunately, it looks to me like it's not very safe; see below.
Yes, this code looks best. And I think it's safe if the predicate may only return unary constructors like True or False. I mean, evaluating this to WHNF is like evaluating it to NF and every unused part of the file can safely be discarded, since there are no unevaluated thunks depending on the file contents anymore (and which are not garbage).
Using System.FilePath.Find.fold gives you both file status and file path but the ought-to-be equivalent approach of using foldl' on the list returned by find only gives the file path but no file status. So, the suggestion is to make find return a list of FileInfo
Let me pass an idea by you. There's a problem with augmenting FileInfo to potentially cause IO to occur as above (both with your original suggestion and my changes), no? We lose the ability to control when a file might be opened, and when it ought to be closed.
I agree, augmenting FileInfo with actual file contents is probably not advisable, although it would be very functional in style. Better ditch the idea for now. However, we probably could safely salvage two lazy reading operations: 1) reading the FileStatus record of a file 2) reading file contents if to be used in a True/False predicate The common pattern is that the information extracted is small enough to be deepSeq'ed, so that the file can be closed early once the lazy read is triggered.
If that were the case, the fold interface would have the same problem, if it decided for some reason to sock away FileInfo values in an accumulator and work on them after the fold had completed.
(The rank-2 trick can prevent FileInfos to leak out into the result of the fold fold :: (forall s . a -> FileInfo s -> a) -> FilePath -> IO a but that doesn't help very much as long as one would want a to be able to depend on the file contents via one of contents :: forall s . FileInfo s -> String withContents :: forall s . FileInfo s -> (String -> a) -> a so this is not an option.) Concerning fold, I think that it is the very core of the directory traversal algorithm: traversing a tree is a catamorphism. In other words, every other traversal can be formulated with a fold. For instance, we currently almost have find r p dir = fold r (flip $ (:) . infoPath) [] dir except that find returns a lazy list whereas fold traverses the directory tree before returning the list. But fold is currently not the general catamorphism, which is fold :: (FileInfo -> [a] -> a) -> (FileInfo -> a) -> FilePath -> IO a (I have been tricked by the name 'Foldable', this class does not give the catamorphism I had in mind.) This fold has to be lazy to be useful since only this allows to skip hole subdirectories. With the catamorphism, we can express find and the old fold: toList r p = fold branch leaf where branch dir xs = if r dir then concat xs else [] leaf file = if p file then [file] else [] find r p = map infoPath . toList r p oldfold r f x root = listSeq `fmap` foldl' f x `fmap` toList r (const True) root where listSeq xs = length xs `seq` xs (Btw, does the old fold only fold the value of type a over regualar files or also over directories?) Being able to return FileInfos captures reading operation 1) but misses the opportunity 2). However, 2) can simply be supplied as an extra function contentsSatisfies :: (String -> Bool) -> FilePath -> Bool contentsSatisfies f file = unsafePerformIO $ do h <- openFile file ReadMode b <- f `fmap` hGetContents h b `seq` hClose h return b to be used at the users discretion.
Of course, this leads to the question whether find should be factored even more into generate & prune
find r p dir = map filepath . filter p . directoryWalk r dir
with the intention that System.FilePath.Find only exports directoryWalk.
That's a nice idea, but subject to the same problems as the fold interface would be if we added file contents to the interface.
Your other observations regarding making a directory tree abstract, and instances of some of our favourite typeclasses, are very appealing.
As soon as we supply a (lazy) catamorphism like above, the user can make a concrete directory tree data File = Directory FileInfo [File] | File FileInfo (fold Directory File) :: FilePath -> IO File and the only way to keep the type abstract is to only supply weaker stuff than fold. Which is probably too weak then. Regards, apfelmus