Plan for file processing libs

Hello libraries,
i think that the ultimate structure for file-related libs should be the following, in dependency order:
* ByteString support
* UnicodeByteString support
* Stringable class that supports a common interface to String, UTF8ByteString, UTF16ByteString and so on
* FilePath operations implemented via the Stringable class
* FileSystem operations (readdir, stat, copy, delete, mkdir...) implemented for Stringable arguments using the FilePath module
* FD operations which support manipulation of file contents via unix-style file descriptors (read/write/seek), where the 'open' operation should support any Stringable parameter
* Streams-like layer which provides a rich set of file manipulation operations in a native Haskell way and uses FD operations for its file support
* NewBinary-like layer which provides binary I/O and serialization on top of Streams
Neil, as one variant of placing the FilePath module, we may join the filepath, filesystem and FD levels into one library, say named Files. currently, we can include here:
* FilePath module
* copies of the System.Directory, System.Directory.Internals and System.Posix.Internals modules, which are now an *unsupported* part of the Base lib
* System.FD and System.MMapFile modules from my Streams lib
so, this lib will provide a rich set of file-related operations, from parsing filenames to low-level access to file contents, allowing more sophisticated I/O libs to be put on top of it.
the reason to put a high-level I/O lib in a separate package is that plenty of different design choices are possible (and exist). on the other hand, all operations provided by such a Files library are low-level ones; there are no design decisions here
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
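The message names a Stringable class but gives no methods for it. As a rough illustration only, the idea of writing a path operation once against such a class might look like the sketch below. The method names `packS` and `unpackS` and the helper `combineS` are hypothetical, and the ByteString instance treats bytes as Latin-1 characters, which is an assumption made purely to keep the example short.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString.Char8 as BC

-- Hypothetical class: the thread only names "Stringable";
-- these two methods are an assumption for illustration.
class Stringable s where
  unpackS :: s -> String
  packS   :: String -> s

-- Ordinary Haskell strings.
instance Stringable [Char] where
  unpackS = id
  packS   = id

-- Packed strings; Char8 maps each byte to one Char (Latin-1 only).
instance Stringable BC.ByteString where
  unpackS = BC.unpack
  packS   = BC.pack

-- A path operation written once for every Stringable type:
-- join a directory and a file name with a single '/'.
combineS :: Stringable s => s -> s -> s
combineS dir file
  | null d        = packS f
  | last d == '/' = packS (d ++ f)
  | otherwise     = packS (d ++ "/" ++ f)
  where
    d = unpackS dir
    f = unpackS file
```

Going through String for every operation is simple but gives up the efficiency that packed filenames are meant to provide; a real design would more likely put the primitive operations into the class itself.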

On Thu, 2006-11-23 at 13:23 +0300, Bulat Ziganshin wrote:
> Hello libraries,
> i think that ultimate structure for file-related libs should be the following, in the dependence order:
> * ByteString support
> * UnicodeByteString support
> * Stringable class that supports the common interface to String, UTF8ByteString, UTF16ByteString and so on
> * FilePath operations implemented via Stringable class
> * FileSystem operations (readdir, stat, copy, delete, mkdir...) implemented for Stringable arguments using FilePath module
> * FD operations which supports manipulation of file contents via unix-style file descriptors (read/write/seek) where 'open' operation should support any Stringable parameter
> * Streams-like layer which provides rich set of file manipulation operations in native Haskell way and use FD operations for its file support
> * NewBinary-like layer which provides binary I/O and serialization on top of Streams
If you're talking about standardising some kind of Binary module, I'd much rather see one based on lazy byte strings, since this is pure. There is no need to tie binary serialisation to IO, file handles, or your streams proposal. I know some people are working on such an implementation. I think it will be possible to make a nice serialisation API that is both pure and high performance.
Actually, similarly, I don't think there is any need to tie pure data structure modules like Data.ByteString (or the unicode variant Data.PackedString) to an IO package.
Duncan
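To make the point about pure serialisation concrete, here is a minimal sketch that encodes and decodes a list of 32-bit words entirely on lazy ByteStrings, with no IO involved. It uses Data.ByteString.Builder from the present-day bytestring package, which is not the implementation referred to in the message; the function names `encodeWord32s` and `decodeWord32s` are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Builder (toLazyByteString, word32BE)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- Serialise each word as four big-endian bytes. Pure: no handles, no IO.
encodeWord32s :: [Word32] -> BL.ByteString
encodeWord32s = toLazyByteString . foldMap word32BE

-- Inverse of encodeWord32s; Nothing if the input length
-- is not a multiple of four bytes.
decodeWord32s :: BL.ByteString -> Maybe [Word32]
decodeWord32s bs
  | BL.null bs          = Just []
  | BL.length chunk < 4 = Nothing
  | otherwise           = (word :) <$> decodeWord32s rest
  where
    (chunk, rest) = BL.splitAt 4 bs
    -- Fold the four bytes into one word, most significant byte first.
    word = BL.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 chunk
```

Because both functions are pure, the same code serves files, sockets or in-memory buffers; the caller decides how the bytes are read or written.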

Hello Duncan, Thursday, November 23, 2006, 4:51:39 PM, you wrote:
>> * NewBinary-like layer which provides binary I/O and serialization on top of Streams
> Actually, similarly, I don't think there is any need to tie pure data structure modules like Data.ByteString (or the unicode variant Data.PackedString) to an IO package.
these are layers of *libraries*, not one super-library. this only reflects the functional dependencies of my own work and doesn't mean that it is the only possible plan. on the contrary, as long as there are alternative library designs, the functionality should be split into separate libs so that alternative libraries are able to reuse existing code.
just one example: as part of the Streams library, i've developed the System.FD and System.MMapFile modules. these modules can't be used in FPS, though, because my library itself imports FPS in order to provide ByteString I/O. so it seems that these modules should be put into a separate package, which may be imported by both FPS and Streams to serve their needs
> If you're talking about standardising some kind of Binary module I'd much rather see one based on lazy byte strings since this is pure. There is no need to tie binary serialisation to IO, file handles, or your streams proposal.
> I know some people are working on such an implementation. I think it will be possible to make a nice serialisation api that is both pure and high performance.
no, i'm not talking about any standards, just about my understanding of the I/O library infrastructure, which also includes a Binary-like library as the last layer. independently of this layer, an I/O library needs a file API that supports packed filenames for efficiency
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, Nov 23, 2006 at 01:23:34PM +0300, Bulat Ziganshin wrote:
> Hello libraries,
> i think that ultimate structure for file-related libs should be the following, in the dependence order:
> * ByteString support
> * UnicodeByteString support
> * Stringable class that supports the common interface to String, UTF8ByteString, UTF16ByteString and so on
> * FilePath operations implemented via Stringable class
Except that for POSIX systems Stringable isn't the appropriate class for FilePath operations, but rather ByteString.
If you're thinking of rewriting the IO libraries using a class to handle the FilePaths (which sounds like a wonderful idea), it would make sense to try to think of a portable way to do this. I'm not sure what a portable approach would look like, but I'm afraid that we'd need to make the datatype used abstract. Or at least we'd have to be clear about the distinction between either a ByteString or a String and a FilePath, because on POSIX systems a FilePath is a ByteString (or is equivalent to one), while on Windows a FilePath is a String (or equivalent to one).
I don't want to simply muddy the waters, but it's all too easy to completely ignore the non-Windows half of the world, and for a proposed core set of libraries, that'd be a real shame.
> * FileSystem operations (readdir, stat, copy, delete, mkdir...) implemented for Stringable arguments using FilePath module
--
David Roundy
Department of Physics
Oregon State University
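One way to picture the abstract datatype suggested above is a type that keeps raw bytes for POSIX paths and characters for Windows paths, so that neither representation is forced onto the other. This is only a sketch under that assumption: the type `Path` and all function names are hypothetical, and a real library would hide the constructors behind a module export list.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical path type. POSIX paths keep their exact bytes;
-- Windows paths are sequences of characters.
data Path
  = PosixPath B.ByteString
  | WindowsPath String
  deriving (Eq, Show)

-- Smart constructors: what a library would export instead of
-- the data constructors themselves.
fromPosixBytes :: B.ByteString -> Path
fromPosixBytes = PosixPath

fromWindowsString :: String -> Path
fromWindowsString = WindowsPath

-- Recover the exact bytes to hand back to the operating system.
toPosixBytes :: Path -> Maybe B.ByteString
toPosixBytes (PosixPath bs)  = Just bs
toPosixBytes (WindowsPath _) = Nothing

-- For display only: POSIX bytes are shown as Latin-1 characters,
-- which may not be what the file's creator intended.
displayPath :: Path -> String
displayPath (PosixPath bs)  = BC.unpack bs
displayPath (WindowsPath s) = s
```

The key property is that a POSIX path survives a round trip unchanged even when its bytes are not valid text in any particular encoding.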

Hello David, Sunday, November 26, 2006, 5:44:55 PM, you wrote:
>> * FilePath operations implemented via Stringable class
> Except that for POSIX systems Stringable isn't the appropriate class for FilePath operations, but rather ByteString.
oh, sorry, i considered ByteString here as a sort of string, but a unix filename is just a byte sequence. nevertheless, in order to implement the operations of the FilePath module, we have to rely on some *encoding*. if some unixes may use an encoding other than utf8/latin1/another ascii-compatible one, we have the problem of recognizing this encoding. one cannot extract, for example, the basename without knowing the encoding of '.' and '/'
> If you're thinking to rewrite the IO libraries using a class to handle the FilePaths (which sounds like a wonderful idea), it would make sense to try to think of a portable way to do this.
to be exact, the IO library (like the System.IO module) has only a few filepath-related operations: openFile/show/hShow. the real devil is in FilePath manipulations and filesystem handling. yes, i have already written some unicode-enabled code for windows in this area, and starting a darcs-based library to promote further development of such code may be helpful. if there are people who can manage such a repository and people who will work on unix-related problems, we can start such a project
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
> Sunday, November 26, 2006, 5:44:55 PM, you wrote:
>>> * FilePath operations implemented via Stringable class
>> Except that for POSIX systems Stringable isn't the appropriate class for FilePath operations, but rather ByteString.
> oh, sorry, i considered ByteString here as a sort of string, but unix filename is just a byte sequence. nevertheless, in order to implement operations of FilePath module, we should rely on some *encoding*. if some unixes may use encoding other than utf8/latin1/other ascii-compatible we have a problem of recognizing this encoding. one cannot extract, for example, basename without knowing encoding of '.' and '/'
ASCII, of course. The problem is that in Unix any sequence of bytes is allowed as a file name, except '/' (reserved as the directory separator) and 0 (zero, reserved as the final terminator). Thus you can have filenames which are not valid in most encodings. You can use latin1, but this may be misleading, as other filenames may be intended to be interpreted as utf8.
It is all very bad. In principle, the encoding can change from one directory to the next, or even from one file to the next in the same directory. You can play funny jokes on people you work with by giving them files whose names contain backspace characters and similar oddities...
Ben
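Since only the byte values of '/' (0x2F) and NUL (0x00) are special on Unix, the basename can be extracted from the raw bytes without decoding anything else. The sketch below assumes exactly that rule; the names `isValidComponent` and `baseNameBytes` are hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- The only two byte values a Unix file name may not contain.
slash, nul :: Word8
slash = 0x2F  -- ASCII '/', the directory separator
nul   = 0x00  -- the string terminator

-- A single name is valid if it is non-empty and avoids both
-- reserved bytes; every other byte, including backspace, is allowed.
isValidComponent :: B.ByteString -> Bool
isValidComponent name =
  not (B.null name) && B.notElem slash name && B.notElem nul name

-- Everything after the last '/' byte; the whole input if there is none.
baseNameBytes :: B.ByteString -> B.ByteString
baseNameBytes = snd . B.breakEnd (== slash)
```

Nothing here says how the remaining bytes should be shown to a user; that is the part that still depends on guessing an encoding.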
participants (4)
- Benjamin Franksen
- Bulat Ziganshin
- David Roundy
- Duncan Coutts