
Hello, haskellers! Suppose we have a function (it filters package filenames from an apt Packages file):
getPackageList :: FilePath -> IO [FilePath]
getPackageList packageFile = withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    return $ map (drop 10) $ filter (startsWith "Filename:") $ lines c -- (1)
  where
    startsWith [] _ = True
    startsWith _ [] = False
    startsWith (x:xs) (y:ys)
      | x == y    = startsWith xs ys
      | otherwise = False
When I apply it to a Packages file, I (surely) get an empty list. This is an expected result due to the laziness of hGetContents. I tried changing line (1) to
return $ map (drop 10) $ filter (startsWith "Filename:") $! lines c
or
return $ map (drop 10) $! filter (startsWith "Filename:") $! lines c
with no success. Changing it to
return $! map (drop 10) $ filter (startsWith "Filename:") $ lines c
makes the getPackageList function return several (but not all) filenames. What am I missing? And how can I fix my code?
-- WBR, Max Vasin.

Max Vasin wrote:
Hello, haskellers!
Suppose we have a function (it filters package filenames from an apt Packages file):
getPackageList :: FilePath -> IO [FilePath]
getPackageList packageFile = withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    return $ map (drop 10) $ filter (startsWith "Filename:") $ lines c -- (1)
  where
    startsWith [] _ = True
    startsWith _ [] = False
    startsWith (x:xs) (y:ys)
      | x == y    = startsWith xs ys
      | otherwise = False
When I apply it to a Packages file, I (surely) get an empty list. This is an expected result due to the laziness of hGetContents.
Combined with the fact that you're not evaluating its non-strict result until after the file handle has been closed, yes.

Your current set of IO actions is probably similar to:

 . open file
 . process file
 . close file
 . use results from processing the file.

where the first three steps are all handled by your getPackageList. To avoid either getting incomplete (or empty) results, or having to strictify everything with $!, it'd be better for you to use a process more like:

 . open file
 . process file
 . use results from processing the file.
 . close file

probably by moving the withFile outside of getPackageList, to wrap a function that prints the results after they've been obtained. The function passed to withFile should generally include all the processing related to the file and its results, I believe.
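The second ordering (open, process, use, close) can be sketched as below. This is only one possible shape, not code from the thread: the names packageFiles and withPackageList are made up for illustration, and Data.List's isPrefixOf stands in for the hand-written startsWith. The caller passes in the action that consumes the list, so the consumption happens while the handle is still open.

```haskell
import Data.List (isPrefixOf)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Pure part: keep the "Filename: " lines and drop that 10-character prefix.
-- (isPrefixOf replaces the original startsWith helper.)
packageFiles :: String -> [FilePath]
packageFiles = map (drop 10) . filter ("Filename:" `isPrefixOf`) . lines

-- Hypothetical helper: run `use` on the lazily produced list *inside*
-- withFile, so the handle is closed only after `use` has finished.
withPackageList :: FilePath -> ([FilePath] -> IO a) -> IO a
withPackageList packageFile use =
  withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    use (packageFiles c)
```

For instance, withPackageList "Packages" (mapM_ putStrLn) prints every filename before the file is closed. The caveat is that `use` must not hand back an unevaluated value that still depends on the file contents; passing `return` would reintroduce the original problem.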
I tried changing line (1) to
return $ map (drop 10) $ filter (startsWith "Filename:") $! lines c
The $! forces strictness, but since it's deep in the result, it isn't evaluated until it's too late.
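A small sketch of why a $! buried inside the argument of return does nothing at the time the action runs. The definitions deferred and shallow are invented for this illustration, and undefined stands in for "work that has not happened yet".

```haskell
-- Running this action succeeds: `return` does not evaluate its argument, so
-- the `$!` inside it is never reached unless the result is demanded later.
deferred :: IO [Int]
deferred = return $ map (+ 1) $! (undefined :: [Int])

-- `seq` (and hence `$!`) evaluates only as far as the outermost constructor,
-- so the undefined tail of this list is never touched.
shallow :: String
shallow = (1 : undefined :: [Int]) `seq` "stopped at the first (:)"
```

In getPackageList the situation is the same: the inner $! on lines c only fires once the caller inspects the returned list, and by then withFile has already closed the handle.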
Changing it to
return $! map (drop 10) $ filter (startsWith "Filename:") $ lines c
makes the getPackageList function return several (but not all) filenames.
I think we'd need to see the actual input and expected output, to understand what's going wrong here. It worked fine for me, for small tests.

By the way, it's good policy to always post complete, runnable examples. Requiring anyone who wants to help you to write additional code just to get it to run decreases the chances that someone will bother to do so.

(Disclaimer: I'm quite new to Haskell myself, so take what I say with more than a pinch of salt.)

-- Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

Micah Cowan writes:
Max Vasin wrote:
Hello, haskellers!
Suppose we have a function (it filters package filenames from an apt Packages file):
getPackageList :: FilePath -> IO [FilePath]
getPackageList packageFile = withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    return $ map (drop 10) $ filter (startsWith "Filename:") $ lines c -- (1)
  where
    startsWith [] _ = True
    startsWith _ [] = False
    startsWith (x:xs) (y:ys)
      | x == y    = startsWith xs ys
      | otherwise = False
When I apply it to a Packages file, I (surely) get an empty list. This is an expected result due to the laziness of hGetContents.
Combined with the fact that you're not evaluating its non-strict result until after the file handle has been closed, yes.
Your current set of IO actions is probably similar to:

 . open file
 . process file
 . close file
 . use results from processing the file.

where the first three steps are all handled by your getPackageList. To avoid either getting incomplete (or empty) results, or having to strictify everything with $!, it'd be better for you to use a process more like:

 . open file
 . process file
 . use results from processing the file.
 . close file

probably by moving the withFile outside of getPackageList, to wrap a function that prints the results after they've been obtained. The function passed to withFile should generally include all the processing related to the file and its results, I believe.

Yes. Probably I should leave closing the file to the GC and use readFile; this seems to be the simplest way.
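The readFile variant mentioned here might look roughly like this. It is a sketch rather than the code that was eventually used: packageFiles is a made-up helper name, and Data.List's isPrefixOf replaces the hand-written startsWith.

```haskell
import Data.List (isPrefixOf)

-- Pure part: keep the "Filename: " lines and drop that 10-character prefix.
packageFiles :: String -> [FilePath]
packageFiles = map (drop 10) . filter ("Filename:" `isPrefixOf`) . lines

-- readFile reads lazily and never closes the handle behind the caller's back:
-- it is closed once the contents have been fully consumed (or the handle is
-- garbage collected), so the returned list is complete whenever it is used.
getPackageList :: FilePath -> IO [FilePath]
getPackageList packageFile = fmap packageFiles (readFile packageFile)
```

The trade-off is that the file stays open for as long as part of the list is still unevaluated, which matters mainly when many files are processed or the same file has to be rewritten.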
I tried changing line (1) to
return $ map (drop 10) $ filter (startsWith "Filename:") $! lines c
The $! forces strictness, but since it's deep in the result, it isn't evaluated until it's too late.
Changing it to
return $! map (drop 10) $ filter (startsWith "Filename:") $ lines c
makes the getPackageList function return several (but not all) filenames.
I think we'd need to see the actual input and expected output, to understand what's going wrong here. It worked fine for me, for small tests.
The gzipped example file is here: ftp://ftp.debian.org/debian/dists/lenny/contrib/binary-i386/Packages.gz
By the way, it's good policy to always post complete, runnable examples. Requiring anyone who wants to help you to write additional code just to get it to run decreases the chances that someone will bother to do so.
Sorry. I've just omitted module imports:
import Control.Monad (filterM, mapM)
import System.IO (withFile, IOMode (ReadMode), hGetContents)
import qualified System.Posix.Files as SPF (isDirectory, getFileStatus)
Running in GHCi:

GHCi, version 6.8.2: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude> :load Foo.hs
[1 of 1] Compiling Foo ( Foo.hs, interpreted )
Ok, modules loaded: Foo.
*Foo> getPackageList "Packages" >>= mapM_ putStrLn
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package filepath-1.1.0.0 ... linking ... done.
Loading package directory-1.0.0.0 ... linking ... done.
Loading package unix-2.3.0.0 ... linking ... done.
pool/contrib/a/acx100/acx100-source_20070101-3_all.deb
pool/contrib/a/alien-arena/alien-arena_7.0-1_i386.deb
pool/contrib/a/alien-arena/alien-arena-browser_7.0-1_all.deb
pool/contrib/a/alien-arena/alien-arena-server_7.0-1_i386.deb
pool/contrib/a/alsa-tools/alsa-firmware-loaders_1.0.16-2_i386.deb
pool/contrib/a/amoeba/amoeba_1.1-19_i386.deb
pool/contrib/a/apple2/apple2_0.7.4-5_i386
*Foo>

The printed list of package files is incomplete.
-- WBR, Max Vasin.

Max Vasin wrote:
Micah Cowan writes:
I think we'd need to see the actual input and expected output, to understand what's going wrong here. It worked fine for me, for small tests.
The gzipped example file is here: ftp://ftp.debian.org/debian/dists/lenny/contrib/binary-i386/Packages.gz
By the way, it's good policy to always post complete, runnable examples. Requiring anyone who wants to help you to write additional code just to get it to run decreases the chances that someone will bother to do so.
Sorry. I've just omitted module imports:
import Control.Monad (filterM, mapM)
import System.IO (withFile, IOMode (ReadMode), hGetContents)
import qualified System.Posix.Files as SPF (isDirectory, getFileStatus)
Actually, I meant more the missing "main" function; but the example invocation you gave will do fine.

As previously mentioned, the full list will be output if you ensure that the putStrLn calls take place within withFile, rather than without:

getPackageList :: FilePath -> IO ()
getPackageList packageFile = withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    mapM_ (putStrLn . drop 10) $ filter (startsWith "Filename:") $ lines c
  where
    startsWith [] _ = True
    startsWith _ [] = False
    startsWith (x:xs) (y:ys)
      | x == y    = startsWith xs ys
      | otherwise = False

Your experiments with $! don't work, I believe, because seq only does a "shallow" evaluation: in particular, it doesn't evaluate successive elements in a list. RWH puts it: "seq stops as soon as it reaches a constructor" (and $! is defined in terms of seq). In order to work around this, you'd need to define your own versions of map and filter to manage the construction using seq. Not worth it, IMO, especially since accomplishing this forces the whole list to be held in memory, which isn't as nice as processing things sequentially before the file has been closed.

-- Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
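As a hedged aside on the last point: forcing the whole result before the handle closes may not require custom versions of map and filter. The sketch below is not from the thread; getPackageListStrict and packageFiles are invented names, isPrefixOf replaces startsWith, and evaluate from Control.Exception is used to walk every character of the result inside withFile. As the message above notes, this keeps the entire list in memory.

```haskell
import Control.Exception (evaluate)
import Data.List (isPrefixOf)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Pure part: keep the "Filename: " lines and drop that 10-character prefix.
packageFiles :: String -> [FilePath]
packageFiles = map (drop 10) . filter ("Filename:" `isPrefixOf`) . lines

-- Hypothetical strict variant: computing the total length of all filenames
-- traverses every element completely, so the file has been read to the end
-- before withFile closes the handle.
getPackageListStrict :: FilePath -> IO [FilePath]
getPackageListStrict packageFile =
  withFile packageFile ReadMode $ \h -> do
    c <- hGetContents h
    let files = packageFiles c
    _ <- evaluate (length (concat files))
    return files
```

With this variant, getPackageListStrict "Packages" >>= mapM_ putStrLn should print the complete list, at the cost of holding it all in memory at once.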
participants (2)
- Max Vasin
- Micah Cowan