
On Sat, 2006-07-15 at 22:14 +0400, Bulat Ziganshin wrote:
Hello Duncan,
Saturday, July 15, 2006, 8:04:26 PM, you wrote:
getContents, putStr, readFile, interact, etc. are encoding-independent; they're just the same as hGet/hPut, working on binary data blocks. Indeed, putStr = hPut stdout.
One shortcoming I've seen in the library is that it doesn't distinguish between Text and Binary modes when opening files. On Unix they are the same, but not on Windows. Below is my conversation on this topic with Donald. He eventually applied the changes I proposed (using openFile instead of openBinaryFile in these operations), and today I sent him a patch that makes the same change in the Lazy module.
System.IO contains the following definitions:
readFile name = openFile name ReadMode >>= hGetContents
writeFile name str = do hdl <- openFile name WriteMode ...
appendFile name str = do hdl <- openFile name AppendMode ...
As you can see, these files are opened in text mode, while your definitions open them in binary mode:
readFile f = bracket (openBinaryFile f ReadMode) hClose (\h -> hFileSize h >>= hGet h . fromIntegral)
writeFile f ps = bracket (openBinaryFile f WriteMode) hClose (\h -> hPut h ps)
appendFile f txt = bracket (openBinaryFile f AppendMode) hClose (\hdl -> hPut hdl txt)
I don't understand your point here. Do you mean I should be opening in text mode, since it's not portable in binary mode? Can you clarify?
Just in case you don't know: for historical reasons, different operating systems use different line-end sequences. Unix uses chr(10), classic Mac OS uses chr(13), and DOS/Windows uses chr(13)+chr(10).
To allow writing portable text-processing programs that work on any OS, standard C libraries can open files in "text mode". In that mode the library translates OS-specific line ends to the standard Unix one when reading, and does the reverse when writing.
The System.IO routines I mentioned also open files in text mode, which means that on Windows they correctly translate the chr(13)+chr(10) line ends (standard for this OS) to chr(10). So any text-processing function written with translated (i.e. Unix) line ends in mind will work correctly on the contents of files read or written by these System.IO routines, even on Windows.
For example, a 2-line text file on Windows may contain something like "line1\r\nline2". When it is read via openBinaryFile and split by 'lines', the result is ["line1\r", "line2"], which is incorrect. When it is read via openFile (which opens files in text mode), the Windows-specific line end is translated to the Unix one, so the string read is "line1\nline2" and 'lines' returns the correct result ["line1", "line2"].
So while under Unix it makes absolutely no difference which mode you use to open files, it does make a difference on Windows. Since the original routines use openFile, they are intended to work with _text_ files, and their clones should give text translation a chance too.
So presumably the correct solution is to have the readFile, writeFile etc. in the Data.ByteString module use openBinaryFile and the versions in Data.ByteString.Char8 use openFile. That way the versions that interpret strings as text will get the OS's line-ending conversions.
Duncan
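To make the translation concrete, here is a rough model of what a Windows text-mode handle does, written as pure String functions. This is only a sketch: the real translation happens inside the C library or the Haskell I/O layer, and the names fromWindowsLineEnds and toWindowsLineEnds are hypothetical.

```haskell
-- Hypothetical model of Windows text-mode translation (not a real library API).

-- Reading: each "\r\n" pair becomes a single '\n'; a lone '\r' is kept.
fromWindowsLineEnds :: String -> String
fromWindowsLineEnds ('\r' : '\n' : rest) = '\n' : fromWindowsLineEnds rest
fromWindowsLineEnds (c : rest)           = c : fromWindowsLineEnds rest
fromWindowsLineEnds []                   = []

-- Writing: each '\n' is expanded back to "\r\n".
toWindowsLineEnds :: String -> String
toWindowsLineEnds = concatMap expand
  where
    expand '\n' = "\r\n"
    expand c    = [c]
```

So fromWindowsLineEnds "line1\r\nline2" gives "line1\nline2", and writing that back with toWindowsLineEnds restores the original bytes.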
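The 'lines' behaviour from this example can be checked directly. The sketch below also shows a common workaround when the data has already been read in binary mode: strip one trailing '\r' from each line. linesCRLF is a hypothetical helper for illustration, not something from System.IO or Data.ByteString.

```haskell
-- The two strings from the example above.
rawContents, translatedContents :: String
rawContents        = "line1\r\nline2"   -- as read in binary mode on Windows
translatedContents = "line1\nline2"     -- as read in text mode on Windows

-- Hypothetical workaround for binary-mode input: split with 'lines',
-- then drop a single trailing '\r' from each resulting line.
linesCRLF :: String -> [String]
linesCRLF = map dropCR . lines
  where
    dropCR s
      | not (null s) && last s == '\r' = init s
      | otherwise                      = s
```

Here lines rawContents is ["line1\r","line2"], while both lines translatedContents and linesCRLF rawContents are ["line1","line2"]. The workaround only helps code that knows to use it, which is why translating at open time is more attractive.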
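For String-based I/O, later versions of GHC's base library (the API postdates this 2006 thread) let you request newline translation per handle with hSetNewlineMode, independent of the host OS. A minimal sketch, assuming that API is available; the function names writeCRLFSample and readTextLines are hypothetical.

```haskell
import System.IO

-- Write a CRLF file byte-for-byte: binary mode does no translation on any OS.
writeCRLFSample :: FilePath -> IO ()
writeCRLFSample path = withBinaryFile path WriteMode $ \h ->
  hPutStr h "line1\r\nline2\r\n"

-- Read the file in text mode, asking for "\r\n" -> "\n" on input on every
-- platform (universalNewlineMode), then split into lines.
readTextLines :: FilePath -> IO [String]
readTextLines path = withFile path ReadMode $ \h -> do
  hSetNewlineMode h universalNewlineMode
  contents <- hGetContents h
  -- force the whole lazy input before withFile closes the handle
  length contents `seq` return (lines contents)
```

Reading the sample back with readTextLines should give ["line1","line2"] on Unix and Windows alike, which is the behaviour Bulat wants from the text-oriented routines.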
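The proposed split could be sketched roughly as below: a binary reader for Data.ByteString-style use and a text reader for Data.ByteString.Char8-style use. Note that this sketch swaps in a different technique from Duncan's openFile suggestion: it reads the raw bytes and collapses "\r\n" explicitly, because (as far as I know) GHC's hGetBuf, which hGet builds on, reads bytes directly and does not apply a handle's newline translation in later versions of the library. readFileBinary, readFileText and crlfToLf are hypothetical names.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Binary variant (Data.ByteString style): bytes exactly as on disk.
readFileBinary :: FilePath -> IO B.ByteString
readFileBinary = B.readFile

-- Text variant (Data.ByteString.Char8 style): same bytes, with Windows
-- line ends collapsed to '\n' explicitly rather than via text mode.
readFileText :: FilePath -> IO B.ByteString
readFileText path = fmap crlfToLf (B.readFile path)

-- Replace every "\r\n" with "\n"; lone '\r' bytes are left alone.
crlfToLf :: B.ByteString -> B.ByteString
crlfToLf = B.intercalate (BC.singleton '\n') . splitOnCRLF
  where
    crlf = BC.pack "\r\n"
    splitOnCRLF s =
      let (before, rest) = B.breakSubstring crlf s
      in if B.null rest
           then [before]
           else before : splitOnCRLF (B.drop 2 rest)
```

Doing the conversion explicitly makes the text variant behave the same on every OS, at the cost of an extra pass over the data.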