
Haskell is great at manipulating tree structures, but I can't seem to find anything representing a directory tree. A simple representation would be something like this:

data Dir = Dir { dirName :: String, subDirectories :: [Dir], files :: [File] }
data File = File { fileName :: String, fileSize :: Int }

Maybe these would need to be parametrized to allow a function splitting files by extension or that kind of thing. Anyway, the whole idea would be to abstract as much of the file stuff out of the IO monad as possible.

I haven't used the "Scrap Your Boilerplate" stuff yet, but it seems like that could fit in here naturally to traverse a Dir and make changes at specified points.

The only problem I can see (so far) with the approach is that it might make "big changes" to the directory tree too easy to make. I'm not sure immediately how to deal with that, or if the answer is just to post a "be careful" disclaimer.

So, what do you think? Do you know of any work in this direction? Is there a way to make "dangerous 1-liners" safe? Is there a fundamental flaw with the approach I'm missing?

Thanks much,
Chad
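A minimal sketch of the kind of SYB traversal this suggests, assuming the Dir/File types above derive Data and Typeable (the lowercasing transform is purely illustrative, not existing code):

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Char (toLower)
import Data.Generics (Data, Typeable, everywhere, mkT)

data Dir  = Dir  { dirName :: String, subDirectories :: [Dir], files :: [File] }
            deriving (Show, Data, Typeable)
data File = File { fileName :: String, fileSize :: Int }
            deriving (Show, Data, Typeable)

-- Apply a File -> File transform at every File in the tree,
-- however deeply nested, without writing any traversal code.
lowercaseNames :: Dir -> Dir
lowercaseNames = everywhere (mkT fixFile)
  where fixFile (File n s) = File (map toLower n) s

Note that this is exactly the "dangerous 1-liner" worry: one line rewrites the whole tree.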

Hello,
Have you seen Tom Moertel's series on directory-tree printing in Haskell?
http://blog.moertel.com/articles/2007/03/28/directory-tree-printing-in-haske...
j.

Hi,
to load a Haskell symbol at run-time, is it still necessary to use the load functions from the hs-plugins library (System.Plugins.Load), or is there some function in the GHC API that does the same job?
Thanks,
titto

tittoassini:
Hi,
to load a Haskell symbol at run-time, is it still necessary to use the load functions from the hs-plugins library (System.Plugins.Load), or is there some function in the GHC API that does the same job?
yes, definitely possible. i think Lemmih put an example on the wiki a while ago. basically, ghc-api exposes the lower level api also used by hs-plugins -- a nice project would be to provide the hs-plugins api directly in ghc-api - avoiding the need for an external hs-plugins package. -- Don

On Wednesday 27 June 2007 03:06:15 Donald Bruce Stewart wrote:
tittoassini:
Hi,
to load a Haskell symbol at run-time, is it still necessary to use the load functions from the hs-plugins library (System.Plugins.Load), or is there some function in the GHC API that does the same job?
yes, definitely possible. i think Lemmih put an example on the wiki a while ago. basically, ghc-api exposes the lower level api also used by hs-plugins -- a nice project would be to provide the hs-plugins api directly in ghc-api - avoiding the need for an external hs-plugins package.
-- Don
Hi Don, thanks for the answer.
I checked the http://haskell.org/haskellwiki/GHC/As_a_library page on the wiki again; there are examples of interactive evaluation, but I cannot find an example of loading a symbol from a compiled module. If anyone can provide one, I will be happy to test it and add it to the "GHC as a library" wiki page.
Regards,
titto

tittoassini:
On Wednesday 27 June 2007 03:06:15 Donald Bruce Stewart wrote:
tittoassini:
Hi,
to load a Haskell symbol at run-time, is it still necessary to use the load functions from the hs-plugins library (System.Plugins.Load), or is there some function in the GHC API that does the same job?
yes, definitely possible. i think Lemmih put an example on the wiki a while ago. basically, ghc-api exposes the lower level api also used by hs-plugins -- a nice project would be to provide the hs-plugins api directly in ghc-api - avoiding the need for an external hs-plugins package.
-- Don
Hi Don, thanks for the answer.
I checked the http://haskell.org/haskellwiki/GHC/As_a_library page on the wiki again; there are examples of interactive evaluation, but I cannot find an example of loading a symbol from a compiled module.
Yes, this is the part I meant about duplicating the hs-plugins API. Both ghc-api and hs-plugins use the following API from the runtime:

foreign import ccall unsafe "Linker.h lookupSymbol" pluginSym :: CString -> IO (Ptr a)
foreign import ccall unsafe "Linker.h loadObj" pluginLoad :: CString -> IO Bool
foreign import ccall unsafe "Linker.h initLinker" pluginInit :: IO ()
foreign import ccall unsafe "Linker.h resolveObjs" pluginResolve :: IO Bool

hs-plugins duplicates some of the ghci internals to find, resolve and load packages and objects, and access their symbols. So that code must exist in the ghci parts of the ghc-api. You'll have to peek around in there.
-- Don
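A sketch of how those four calls might fit together (untested; the object path, symbol name, and error handling are all assumptions, and hs-plugins handles these details far more carefully):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCString)
import Foreign.Ptr (Ptr, nullPtr)

foreign import ccall unsafe "Linker.h initLinker"   pluginInit    :: IO ()
foreign import ccall unsafe "Linker.h loadObj"      pluginLoad    :: CString -> IO Bool
foreign import ccall unsafe "Linker.h resolveObjs"  pluginResolve :: IO Bool
foreign import ccall unsafe "Linker.h lookupSymbol" pluginSym     :: CString -> IO (Ptr a)

-- Load a compiled object file and look up one of its symbols,
-- returning Nothing if any step fails.
loadSymbol :: FilePath -> String -> IO (Maybe (Ptr a))
loadSymbol obj sym = do
  pluginInit
  ok <- withCString obj pluginLoad
  if not ok
    then return Nothing
    else do
      resolved <- pluginResolve
      ptr      <- withCString sym pluginSym
      return (if resolved && ptr /= nullPtr then Just ptr else Nothing)

Turning the resulting Ptr back into a usable Haskell value still needs the unsafeCoerce machinery that hs-plugins wraps up for you.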

On Fri, Jun 22, 2007 at 01:19:01PM -0700, Chad Scherrer wrote:
Haskell is great at manipulating tree structures, but I can't seem to find anything representing a directory tree. A simple representation would be something like this:
data Dir = Dir { dirName :: String, subDirectories :: [Dir], files :: [File] }
data File = File { fileName :: String, fileSize :: Int }
Maybe these would need to be parametrized to allow a function splitting files by extension or that kind of thing. Anyway, the whole idea would be to abstract as much of the file stuff out of the IO monad as possible.
I haven't used the "Scrap Your Boilerplate" stuff yet, but it seems like that could fit in here naturally to traverse a Dir and make changes at specified points.
The only problem I can see (so far) with the approach is that it might make "big changes" to the directory tree too easy to make. I'm not sure immediately how to deal with that, or if the answer is just to post a "be careful" disclaimer.
So, what do you think? Do you know of any work in this direction? Is there a way to make "dangerous 1-liners" safe? Is there a fundamental flaw with the approach I'm missing?
Darcs does this sort of thing (see SlurpDirectory.lhs in the source code), which can be pretty nice, but you really only want to use it for read-only purposes.

Early versions of darcs made modifications directly to Slurpies (these directory tree data structures), which kept track of what changes had been made, and then there was a "write modifications" IO function that actually made the changes. But this was terribly fragile, since the ordering of changes had to be kept track of, etc. The theory was nice, that we'd be able to make the actual changes all at once (only write once to each file, for example; each file had a dirty bit), but in practice we kept running into trouble.

So now we just use this for read-only purposes, for which it works fine, although it can still be scary if users modify the directories while we're looking at them (but this is scary regardless...).

Nowadays we've got a (moderately) nice little monad DarcsIO, which allows us to do file/directory IO operations on either Slurpies or disk, or various other sorts of virtualized objects. Someday I'd like to write an industrial strength version of this monad (one which addresses more than just darcs' needs).
-- David Roundy
Department of Physics Oregon State University
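A sketch of the shape such a virtualized-IO interface might take (this is not darcs' actual DarcsIO, just an illustration: one class, with a real-IO instance and a pure in-memory instance over an mtl State monad standing in for a Slurpy):

{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Map as M
import Control.Monad.State (State, gets, modify)

class Monad m => FileIO m where
  readFileF  :: FilePath -> m String
  writeFileF :: FilePath -> String -> m ()

-- The "real" instance just touches the disk.
instance FileIO IO where
  readFileF  = readFile
  writeFileF = writeFile

-- A pure instance over an in-memory map from paths to contents;
-- missing files read as empty here, purely to keep the sketch short.
instance FileIO (State (M.Map FilePath String)) where
  readFileF p    = gets (M.findWithDefault "" p)
  writeFileF p s = modify (M.insert p s)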

Chad,
I can't seem to find anything representing a directory tree
Here's my shot: http://hpaste.org/370
Not much different from Tom Moertel's, but it grabs the fileSize along the way.
-Greg
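In the same spirit, a self-contained sketch of such a loader (this is not the code at the hpaste link, just an illustration; it uses Integer for sizes since hFileSize returns one, and does no error handling or symlink detection):

import Control.Monad (filterM, forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.IO (IOMode (ReadMode), hClose, hFileSize, openFile)

data Dir  = Dir  { dirName :: String, subDirectories :: [Dir], files :: [File] }
data File = File { fileName :: String, fileSize :: Integer }

readDir :: FilePath -> IO Dir
readDir path = do
  names <- getDirectoryContents path
  let children = [ path ++ "/" ++ n | n <- names, n `notElem` [".", ".."] ]
  dirPaths <- filterM doesDirectoryExist children
  let filePaths = filter (`notElem` dirPaths) children
  subs <- mapM readDir dirPaths
  fs   <- forM filePaths $ \p -> do
    h    <- openFile p ReadMode
    size <- hFileSize h
    hClose h
    return (File p size)
  return (Dir path subs fs)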

Nice, thanks! Certainly looks like a good start at this.
What got me thinking about this is I'd like to be able to do something
like this in just a couple lines of code:
gunzip -c ./2*/*.z
... and feed the result into a giant lazy ByteString. Now, the UNIX
command doesn't really cut it, because it complains there are too many
files, but its simplicity still makes the IO monad solution feel
clunky by comparison.
Chad
On 6/22/07, Greg Fitzgerald
Here's my shot: http://hpaste.org/370
Not much different from Tom Moertel's, but it grabs the fileSize along the way.
-Greg

Chad Scherrer wrote:
What got me thinking about this is I'd like to be able to do something like this in just a couple lines of code:
gunzip -c ./2*/*.z
... and feed the result into a giant lazy ByteString.
Using my FileManip library, you'd do that like this:

import Codec.Compression.GZip
import qualified Data.ByteString.Lazy as B
import System.FilePath.Glob

foo :: IO B.ByteString
foo = namesMatching "*/*.gz" >>=
      fmap B.concat . mapM (fmap decompress . B.readFile)

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/FileManip-0.2

Thanks, Bryan, this is much cleaner than the imperative hack I was throwing together. And aside from the imports, it even fits the "couple lines of code" criterion! Wonderful.

I won't be able to try this out until I get back to work, but I'm wondering whether this will handle a few thousand files. As it is, even the "gunzip -c ./2*/*.z" I'm trying to emulate doesn't really work, because the OS complains there are too many pattern matches. Does "namesMatching" just feed the pattern to the OS, or does it match the pattern itself?

Chad
What got me thinking about this is I'd like to be able to do something like this in just a couple lines of code:
gunzip -c ./2*/*.z
... and feed the result into a giant lazy ByteString.
Using my FileManip library, you'd do that like this.
import Codec.Compression.GZip
import qualified Data.ByteString.Lazy as B
import System.FilePath.Glob

foo :: IO B.ByteString
foo = namesMatching "*/*.gz" >>=
      fmap B.concat . mapM (fmap decompress . B.readFile)

Bryan,
I downloaded your FileManip library and Duncan's zlib library, but I
kept getting a "Too many open files" exception (it matches over 9000
files). I tried to get around this using unsafeInterleaveIO as Greg
had suggested, so now I have this:
foo = namesMatching "*/*.z" >>=
fmap B.concat . mapM (unsafeInterleaveIO . fmap decompress . B.readFile)
Now it doesn't complain about too many open files, but instead I get
this runtime error:
LPS *** Exception: user error (Codec.Compression.Zlib: incorrect header check)
I tried to get the same error on simpler code, and I've found this
gives the same error:
bar = fmap decompress $ L.readFile "myData.z"
It seemed to me the file might be corrupted, but I can do
gunzip -c "myData.gz"
at the command line and see the results just fine.
I also tried gzipping a different, smaller file, and I changed the
string in "bar" accordingly. No error in that case. So it seems to be
a problem with myData.z, but why would it gunzip from the command line
with no trouble in that case?
Thanks,
Chad
On 6/24/07, Bryan O'Sullivan
Using my FileManip library, you'd do that like this.
import Codec.Compression.GZip
import qualified Data.ByteString.Lazy as B
import System.FilePath.Glob

foo :: IO B.ByteString
foo = namesMatching "*/*.gz" >>=
      fmap B.concat . mapM (fmap decompress . B.readFile)
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/FileManip-0.2

I also tried gzipping a different, smaller file, and I changed the string in "bar" accordingly. No error in that case. So it seems to be a problem with myData.z, but why would it gunzip from the command line with no trouble in that case?
Thanks, Chad
Because gunzip is smarter than your program: it can decompress not only the gzip format but also the Z format (which is produced by the very old "compress" unix utility).
-- Jedaï

Jedaï,
Are you sure you're not confusing .z with .Z?
http://kb.iu.edu/data/afcc.html
And is it possible that gzip is smarter somehow? Doesn't
Codec.Compression.GZip call the same C library used by gzip?
Chad
On 6/25/07, Chaddaï Fouché
Because gunzip is smarter than your program: it can decompress not only the gzip format but also the Z format (which is produced by the very old "compress" unix utility).

On Mon, Jun 25, 2007 at 12:48:27PM -0700, Chad Scherrer wrote:
Jedaï,
Are you sure you're not confusing .z with .Z?
http://kb.iu.edu/data/afcc.html
And is it possible that gzip is smarter somehow? Doesn't Codec.Compression.GZip call the same C library used by gzip?
gzip supports gzip, pack, and compress; zlib doesn't:
http://www.zlib.net/zlib_faq.html#faq12

You ought to just use newpopen or similar (http://www.cse.unsw.edu.au/~dons/code/newpopen).

Stefan
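For example, a sketch with the standard System.Process library instead (newpopen's own API may differ; note the lazy hGetContents means each pipe stays open until its output is fully consumed):

import qualified Data.ByteString.Lazy as B
import System.IO.Unsafe (unsafeInterleaveIO)
import System.Process (runInteractiveProcess)

-- Run "gunzip -c" on one file and read its stdout lazily.
gunzipFile :: FilePath -> IO B.ByteString
gunzipFile f = do
  (_, out, _, _) <- runInteractiveProcess "gunzip" ["-c", f] Nothing Nothing
  B.hGetContents out

-- Concatenate many files, deferring each gunzip run until its chunk
-- is demanded (the same unsafeInterleaveIO trick as earlier in the thread).
gunzipAll :: [FilePath] -> IO B.ByteString
gunzipAll = fmap B.concat . mapM (unsafeInterleaveIO . gunzipFile)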

Bulat,
I don't think I can. (1) (de)compress is defined for lazy bytestrings,
and (2) my data comes to me compressed in order to fit it all on a
single DVD. So even if I could uncompress each file strictly, I
couldn't hold such a big strict bytestring in memory at once.
On 6/25/07, Bulat Ziganshin
bar = fmap decompress $ B.readFile "myData.gz"
Monday, June 25, 2007, 10:47:11 PM, you wrote:
try it with non-lazy bytestrings:
import qualified Data.ByteString as B

Chad Scherrer wrote:
Now it doesn't complain about too many open files, but instead I get this runtime error:
LPS *** Exception: user error (Codec.Compression.Zlib: incorrect header check)
Are you sure you really have gzip files? If you're on a Linux or similar box, what does "file myfile.z" report to you? It should say something like "gzip compressed data".

On 6/25/07, Bryan O'Sullivan
Are you sure you really have gzip files? If you're on a Linux or similar box, what does "file myfile.z" report to you? It should say something like "gzip compressed data".
Aarrgh, that's the problem - it does use compress. Is the distinction between .z and .Z not an established standard? I'm guessing there's not a Haskell interface for compress.

I could just tell the OS to start a gzip process, but I need to be able to build it here on my Linux box and run it on various MS machines. Seems like the best approach at this point might be to require everyone (only 3 people) to uncompress the data onto the hard drive first, then go from there.

Thanks for all the help!
-Chad

On Mon, Jun 25, 2007 at 02:13:05PM -0700, Chad Scherrer wrote:
On 6/25/07, Bryan O'Sullivan
wrote: Are you sure you really have gzip files? If you're on a Linux or similar box, what does "file myfile.z" report to you? It should say something like "gzip compressed data".
Aarrgh, that's the problem - it does use compress. Is the distinction between .z and .Z not an established standard? I'm guessing there's not a Haskell interface for compress.
Very standard.

.z  : always pack
.Z  : always compress
.gz : always gzip

gzip can handle all three, zlib only the last. (Are you *sure* your file is compress?)
I could just tell the OS to start a gzip process, but I need to be able to build it here on my Linux box, and run it on various MS machines. Seems like the best approach at this point might be to require everyone (only 3 people) to uncompress the data onto the hard drive first, then go from there.
Or you could reimplement compress in Haskell. The algorithm is shockingly simple, and a sample implementation is already on the wiki (it needs optimization and compress(1) header support, but the LZW engine is there). Note that the patent expired in June '06, so you don't need to worry about that.

http://haskell.org/haskellwiki/Toy_compression_implementations

Stefan
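To give a feel for how simple: a bare LZW encoder fits in a dozen lines (this is not the wiki code; it emits unbounded Int codes, assumes byte-range characters, and has no compress(1) header, code-width growth, or dictionary reset):

import qualified Data.Map as M
import Data.Char (chr)

-- Standard LZW: keep extending the current phrase while it is in the
-- dictionary; when it falls out, emit the code for the known prefix
-- and add the extended phrase as a new dictionary entry.
lzwEncode :: String -> [Int]
lzwEncode = go initial (M.size initial) ""
  where
    initial = M.fromList [ ([chr c], c) | c <- [0 .. 255] ]
    go dict _    cur []     = [ dict M.! cur | not (null cur) ]
    go dict next cur (c:cs)
      | cur' `M.member` dict = go dict next cur' cs
      | otherwise            = dict M.! cur
                             : go (M.insert cur' next dict) (next + 1) [c] cs
      where cur' = cur ++ [c]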

On 6/25/07, Stefan O'Rear
.z  : always pack
.Z  : always compress
.gz : always gzip
gzip can handle all three, zlib only the last. (Are you *sure* your file is compress?)
This means it's compress, doesn't it?

$ file myData.z
myData.z: compress'd data 16 bits
I could just tell the OS to start a gzip process, but I need to be able to build it here on my Linux box, and run it on various MS machines. Seems like the best approach at this point might be to require everyone (only 3 people) to uncompress the data onto the hard drive first, then go from there.
Or you could reimplement compress in Haskell. The algorithm is shockingly simple, and a sample implementation is already on the wiki (it needs optimization and compress(1) header support, but the LZW engine is there). Note that the patent expired in June '06, so you don't need to worry about that.
http://haskell.org/haskellwiki/Toy_compression_implementations
This looks like a lot of fun, but I've got too many other pieces of code to try to get running efficiently as it is. But I hadn't seen this link before, and it looks like interesting stuff. Thanks! Chad

On Mon, Jun 25, 2007 at 02:42:18PM -0700, Chad Scherrer wrote:
On 6/25/07, Stefan O'Rear
wrote:
.z  : always pack
.Z  : always compress
.gz : always gzip
gzip can handle all three, zlib only the last. (Are you *sure* your file is compress?)
This means it's compress, doesn't it?
$ file myData.z
myData.z: compress'd data 16 bits
Yep. (I wonder when the filename got munged? I suppose it doesn't matter.) Recompressing sounds good :) Stefan

On Jun 25, 2007, at 17:24 , Stefan O'Rear wrote:
.z : always pack
.Z : always compress
...unless it's gone through a Windows system or a CD somewhere along the way. Note that gunzip accepts a wide variety of extensions but recognizes the files by magic number, *not* by the extension.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell]      allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats]      allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university        KF8NH

On Jun 25, 2007, at 14:47 , Chad Scherrer wrote:
LPS *** Exception: user error (Codec.Compression.Zlib: incorrect header check)
Keep in mind that GNU gunzip also handles the old "compress" (.Z) and System V "pack" (.z) formats; I'd expect the Zlib codec to only handle gzip format and not the others.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell]      allbery@kf8nh.com
system administrator [openafs,heimdal,too many hats]      allbery@ece.cmu.edu
electrical and computer engineering, carnegie mellon university        KF8NH
participants (12)
- Alexis Hazell
- Brandon S. Allbery KF8NH
- Bryan O'Sullivan
- Bulat Ziganshin
- Chad Scherrer
- Chaddaï Fouché
- David Roundy
- dons@cse.unsw.edu.au
- Greg Fitzgerald
- Jeremy Shaw
- Pasqualino 'Titto' Assini
- Stefan O'Rear