lazy IO in readFile

I'm trying to suss out the best (heh, maybe most idiomatic?) way to handle laziness in a particular file operation. I'm reading a file which contains a list of rss feed items that have been seen previously. I use this list to eliminate feed items I've seen before and thus filter a list of new items. (It's a program to email me feed items from a couple of low-frequency feeds.)

So, the way I do this is to open the history file with readFile, read it into a list and then use that as a parameter to a filter function. Instead of getting confusing, here is some simple code that gets at the nut of the problem:

    import Control.Monad

    isNewItem :: [String] -> String -> Bool
    isNewItem [] = \_ -> True
    isNewItem ts = \x -> not (any (== x) ts)

    filterItems :: [String] -> [String] -> [String]
    filterItems old is = filter (isNewItem old) is

    getOldData :: IO [String]
    getOldData = catch (liftM lines $ readFile "testfile") (\_ -> return [])

    main = do
        let testData = ["a", "b", "c", "d"] :: [String]
        currItems <- getOldData
        let newItems = filterItems currItems $ testData
        print newItems
        -- this is important, it mimics another IO action I'm
        -- doing in the real code...
        appendFile "testfile" . unlines $ newItems

Please ignore, for the moment, whatever *other* problems (idiomatic or just idiotic) that may exist above and focus on the IO problem.

This code works fine *if* the file "testfile" has in it some subset of the testData list. If it has the complete set, it fails with a "resource busy" exception.

Okay, I can more or less understand what's going on here. Each letter in the testData list gets compared to the contents of the file, but because they are *all* found, the readFile call never has to try and fail to read the last line of the file. Thus the file handle is kept open lazily, waiting around, not having reached EOF. Fair enough.

But what is the best solution? One obvious one, and the one I'm using now, is to move the appendFile call into a function with guards to prevent trying to append an empty list to the end of the file. This solves the problem not by forcing the read on EOF, but by not bothering to open the file for appending:

    writeHistory [] = return ()
    writeHistory ni = appendFile "testfile" . unlines $ ni

And this makes some sense. It's silly to try to write nothing to a file.

But it also rubs me the wrong way. It's not solving the problem directly -- closing that file handle. So there's my question: how can I close that thing? Is there some way to force it? Do I need to rework the reading to read one line ahead of whatever I'm testing against (thereby forcing the read of EOF and closing the file)?

thanks

A

Andrew,

In Haskell, lazy I/O is a form of cheating, because Haskell functions are supposed to have no side effects, and lazy I/O is a side effect. At first, cheating seems attractive, but it takes a bit of experience to really understand why cheating really is not a good idea, and that Haskell is so powerful that it doesn't matter that you shouldn't cheat. That has certainly been my experience, and I had to find out the hard way. It sounds like you're starting to see some of the problems with cheating.

Here's someone's philosophizing on the subject:

http://lukepalmer.wordpress.com/2009/06/04/it-is-never-safe-to-cheat/

So the short answer is, no - there is no way to force the file returned by readFile to close. I'd recommend using withFile and hGetLine, like this:

    withFile "testfile" ReadMode $ \h -> do
        ...
        l <- hGetLine h

If you want more speed, take a look at the stuff in Data.ByteString. If you want proper text encoding and speed, take a look at the 'text' package.

Steve
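A minimal sketch of the strict, withFile-based reading being recommended here, assuming the file is consumed line by line until EOF (the helper name readLinesStrict is illustrative, not from the thread):

    import System.IO

    -- Read every line of the file strictly. withFile closes the handle
    -- as soon as the inner action returns, so no lazy read can keep it
    -- open afterwards.
    readLinesStrict :: FilePath -> IO [String]
    readLinesStrict fn = withFile fn ReadMode (go [])
      where
        go acc h = do
            eof <- hIsEOF h
            if eof
                then return (reverse acc)
                else do
                    l <- hGetLine h
                    go (l : acc) h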

On Sat, May 08, 2010 at 03:41:43PM +1200, Stephen Blackheath [to Haskell-Beginners] wrote:
Andrew,
In Haskell, lazy I/O is a form of cheating, because Haskell functions are supposed to have no side effects, and lazy I/O is a side effect. At first, cheating seems attractive, but it takes a bit of experience to really understand why cheating really is not a good idea, and that Haskell is so powerful that it doesn't matter that you shouldn't cheat.
So, are you saying that using something like readFile is cheating? Or just that lazy IO itself is cheating?
That has certainly been my experience, and I had to find out the hard way. It sounds like you're starting to see some of the problems with cheating.
Indeed. This whole exercise (of which the below is just a piece) has been enlightening. I'm reminded of the cat who's stuck in the IO monad. I've certainly gotten better at moving into and out of IO (or moving functions around into and out of IO).
Here's someone's philosophizing on the subject:
http://lukepalmer.wordpress.com/2009/06/04/it-is-never-safe-to-cheat/
cool thanks
So the short answer is, no - there is no way to force the file returned by readFile to close.
I figured as much. I'm not completely unhappy with my solution since it irks me to write out an empty list anyway. And it's really a simple little project for my personal use...
I'd recommend using withFile and hGetLine, like this:
withFile "testfile" ReadMode $ \h -> do
    ...
    l <- hGetLine h
and using this to read through the entire file and then closing it? (Don't answer that, I'll do the reading.) hmm... a little thought suggests that laziness will still get me unless I put some strictness in somewhere: I'm still left with a case where the history list is never completely evaluated, so the read of EOF is never forced. I will apply some thought to it and see what happens.

thanks

A
If you want more speed, take a look at the stuff in Data.ByteString. If you want proper text encoding and speed, take a look at the 'text' package.
Steve

Andrew,

On 08/05/10 15:59, Andrew Sackville-West wrote:
On Sat, May 08, 2010 at 03:41:43PM +1200, Stephen Blackheath [to Haskell-Beginners] wrote:
Andrew,
In Haskell, lazy I/O is a form of cheating, because Haskell functions are supposed to have no side effects, and lazy I/O is a side effect. At first, cheating seems attractive, but it takes a bit of experience to really understand why cheating really is not a good idea, and that Haskell is so powerful that it doesn't matter that you shouldn't cheat.
So, are you saying that using something like readFile is cheating? Or just that lazy IO itself is cheating?
I'm saying that 'readFile' is lazy I/O, lazy I/O is cheating, and cheating is bad. I want you to understand what I mean by "bad" rather than taking it on authority. The thing is, Haskell is all about giving you safety. Haskell gives you certain advantages that come only from going all the way, saying that pure really means pure. Cheating means that you can't make this assumption, and the rot can spread more quickly than you expect. This is fine for small programs, but can be a road to ruin in large programs. So, why cheat when Haskell makes it so easy for you not to, and rewards you so handsomely for your good behaviour?

You may ask, if that's true, then why is 'readFile' in the standard libraries? My own opinion on this subject is that even though the designers of Haskell had incredible foresight, they didn't have our luxury of experience programming in the language they were designing. Lazy I/O is _very_ convenient, after all.

All this is my opinion, but I honestly believe that the great majority of hardcore Haskellers agree with me that cheating is bad.
So the short answer is, no - there is no way to force the file returned by readFile to close.
I figured as much. I'm not completely unhappy with my solution since it irks me to write out an empty list anyway. And it's really a simple little project for my personal use...
I will sometimes use lazy I/O if the program is simple enough, but it seems like you're already getting into a situation where it's causing trouble. It's a difficult one, because, like I said, lazy I/O really is useful.

My own approach is this: I ask myself, "Is it *really* referentially transparent?" If you can say "yes" to this question, then it's logically equivalent to not cheating (but you must still feel guilty). In the case of 'readFile', this becomes, "Can I assume that for the life of the program, the file's contents will never change?" (Haskell abides in a Zen-like 'eternal now'.) For a program that reads a config file once on startup, the answer might be "yes", and cheating may not introduce any risks.

Perhaps I can give you some more insight by saying this: Laziness adds complexity to the reasoning necessary to understand how your program will execute. Purity means that this complexity is neutralized and becomes the compiler's problem, not yours. You have to remember that while the tiniest little piece of information calculated from the contents of your 'readFile' remains unevaluated, the file will not close. That's very difficult to reason about.

Another thing to consider is, if the code you're implementing relies on lazy I/O, might you want to use it in a big program? If so, surely it would be better to do it in a more general way to begin with. One of the things monads are especially good for is replacing lazy I/O.

You might object, "Lazy I/O is so incredibly brilliant, but you are telling me I can't use it! That really ruins Haskell for me if one of its most amazing features is not allowed!" I know this seems very unfair. But my reply is, "Purity really is *that* good. It's even worth giving up lazy I/O for - that's how good it is."
I'd recommend using withFile and hGetLine, like this:
withFile "testfile" ReadMode $ \h -> do
    ...
    l <- hGetLine h
and using this to read through the entire file and then closing it? (Don't answer that, I'll do the reading). hmm... a little thought suggests that laziness will still get me unless I put some strictness in somewhere. I'm still left with a case where the history list is never completely evaluated forcing the reading of EOF. I will apply some thought to it and see what happens.
I can't resist a couple of comments: Note that withFile closes the handle for you explicitly. It's completely safe in that respect (unless you pass 'h' as a return value from withFile - which is obviously a bad idea - but if you want the type system to make this impossible, this *can* be achieved!). You know exactly when it's being closed. With withFile/hGetLine, laziness can't get you, except in the usual way (that is, as it relates to memory and CPU usage).

A parting thought: One of the great things about Haskell (compared with imperative programming) is that there are dozens of things you don't have to reason about any more, so you can concentrate on solving your problem. Do you see what I'm saying? Why even bother reasoning about whether laziness can get you? Just make everything pure and you don't have to. *

(* I don't want to mislead you and make you think Haskell is something it's not. Therefore I need to add here that you *do* need to reason about the *space and CPU usage* of your code in the presence of laziness. IMO this is the only serious cost of using Haskell - the rest is benefit.)

Steve
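A tiny sketch of the mistake being warned about here, i.e. letting the handle escape from withFile (illustrative only):

    import System.IO

    main :: IO ()
    main = do
        -- withFile closes the handle when its action returns, so the
        -- handle smuggled out via 'return' is already closed...
        h <- withFile "testfile" ReadMode return
        -- ...and any later use of it fails at runtime with an
        -- "illegal operation (handle is closed)" style error.
        l <- hGetLine h
        putStrLn l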

sorry, I was distracted for a few days.

On Sat, May 08, 2010 at 11:38:25PM +1200, Stephen Blackheath [to Haskell-Beginners] wrote [...]
In the case of 'readFile', this becomes, "Can I assume that for the life of the program, the file's contents will never change?" (Haskell abides in a Zen-like 'eternal now'.) For a program that reads a config file once on startup, the answer might be "yes", and cheating may not introduce any risks.
In fact, I know the file will change over the life of the program because I'm appending to it. Though in theory, I only need to read it all in once and then after some filtering and other operations, append to it.
Perhaps I can give you some more insight by saying this: Laziness adds complexity to the reasoning necessary to understand how your program will execute.
yes. I'm with you there.
Purity means that this complexity is neutralized and becomes the compiler's problem, not yours. You have to remember that while the tiniest little piece of information calculated from the contents of your 'readFile' remains unevaluated, the file will not close. That's very difficult to reason about.
indeed.
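Concretely, the failure mode being described can be sketched like this (essentially what the original program runs into):

    main :: IO ()
    main = do
        s <- readFile "testfile"
        -- 'n' is an unevaluated thunk: nothing has been read yet, and
        -- the handle behind 's' stays open until the file is consumed.
        let n = length (lines s)
        -- Reopening the same file for writing while that read handle
        -- is still open is what raises the "resource busy" exception.
        appendFile "testfile" "x\n"
        print n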
Another thing to consider is, if the code you're implementing relies on lazy I/O, might you want to use it in a big program? If so, surely it would be better to do it in a more general way to begin with. One of the things monads are especially good for is replacing lazy I/O.
You might object, "Lazy I/O is so incredibly brilliant, but you are telling me I can't use it! That really ruins Haskell for me if one of its most amazing features is not allowed!"
I know this seems very unfair. But my reply is, "Purity really is *that* good. It's even worth giving up lazy I/O for - that's how good it is."
I really agree with you, to the extent my limited knowledge and understanding allow, and so am willing to try to learn this...
I'd recommend using withFile and hGetLine, like this:
withFile "testfile" ReadMode $ \h -> do
    ...
    l <- hGetLine h
I'm having trouble determining how to put this into the existing context of a string of filters and maps where the contents of the file are used in a predicate to a filter. (If you really want, you can look at my ridiculous code at http://git.swclan.homelinux.org/rss2email.git) I suppose that means I need to rethink the structure of my program (which already seems to be necessary anyway...). [...]
A parting thought: One of the great things about Haskell (compared with imperative programming) is that there are dozens of things you don't have to reason about any more, so you can concentrate on solving your problem. Do you see what I'm saying? Why even bother reasoning about whether laziness can get you? Just make everything pure and you don't have to. *
(* I don't want to mislead you and make you think Haskell is something it's not. Therefore I need to add here that you *do* need to reason about the *space and CPU usage* of your code in the presence of laziness. IMO this is the only serious cost of using Haskell - the rest is benefit.)
thanks for your thoughtful comments. A

Andrew,

On 15/05/10 11:57, Andrew Sackville-West wrote:
I'm having trouble determining how to put this into the existing context of a string of filters and maps where the contents of the file are used in a predicate to a filter. (if you really want you can look at my ridiculous code at http://git.swclan.homelinux.org/rss2email.git)
I took a look. You've got a list of items and you want to check each one against your 'seen it' file. I'm not sure what your requirements are but currently the whole file gets read into memory. So, sticking with that, here's _a_ way to do it (with a Set, which gives a faster lookup):

    import Control.Exception
    import Data.Set (Set)
    import qualified Data.Set as S
    import System.IO
    import System.IO.Error
    import Prelude hiding (catch)

    -- | Return "seen it" predicate
    readHistory :: FilePath -> IO (String -> Bool)
    readHistory fn = do
        hist <- withFile fn ReadMode $ \h -> fetchLines h S.empty
        return (`S.member` hist)
      where
        fetchLines h hist =
            (do l <- hGetLine h
                fetchLines h $! S.insert l hist)
            `catch` \exc ->
                if isEOFError exc
                    then return hist
                    else throwIO exc

This is completely strict. The $! is there to make sure we're keeping a set in memory, not a chain of inserts (though the inserts wouldn't actually take up any more memory than the set does). I haven't tried compiling this.

Steve
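One possible way to plug that readHistory into the toy program from the start of the thread (a sketch, not from the mails; assumes "testfile" exists):

    main :: IO ()
    main = do
        seenIt <- readHistory "testfile"
        let testData = ["a", "b", "c", "d"]
            newItems = filter (not . seenIt) testData
        print newItems
        appendFile "testfile" (unlines newItems)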

On Sun, May 16, 2010 at 11:03:03PM +1200, Stephen Blackheath [to Haskell-Beginners] wrote:
Andrew,
On 15/05/10 11:57, Andrew Sackville-West wrote:
I'm having trouble determining how to put this into the existing context of a string of filter's and maps where the contents of the file are used in a predicate to a filter. (if you really want you can look at my ridiculous code at http://git.swclan.homelinux.org/rss2email.git)
I took a look. You've got a list of items and you want to check each one against your 'seen it' file. I'm not sure what your requirements are but currently the whole file gets read into memory. So, sticking with that, here's _a_ way to do it (with a Set, which gives a faster lookup):
yeah, reading it all in is fine. Not sure in the long term what the problems are with that. I suppose if it was a really big history file, it would be important to do something else, but it works for now.
    import Control.Exception
    import Data.Set (Set)
    import qualified Data.Set as S
    import System.IO
    import System.IO.Error
    import Prelude hiding (catch)

    -- | Return "seen it" predicate
    readHistory :: FilePath -> IO (String -> Bool)
    readHistory fn = do
        hist <- withFile fn ReadMode $ \h -> fetchLines h S.empty
        return (`S.member` hist)
      where
        fetchLines h hist =
            (do l <- hGetLine h
                fetchLines h $! S.insert l hist)
            `catch` \exc ->
                if isEOFError exc
                    then return hist
                    else throwIO exc
This is completely strict. The $! is there to make sure we're keeping a set in memory, not a chain of inserts (though the inserts wouldn't actually take up any more memory than the set does). I haven't tried compiling this.
thanks for this. it helps a lot. hmmm... I wonder why it is I never have a problem returning functions in Scheme, but it never occurs to me as I learn Haskell?

thanks for your help.

A

On Thu, May 20, 2010 at 3:17 AM, Andrew Sackville-West wrote:
thanks for this. it helps a lot. hmmm... I wonder why it is I never have a problem returning functions in Scheme, but it never occurs to me as I learn Haskell?
Maybe you're already doing it without realizing it? For instance, for the same kind of problem but without the IO part, the type of the function could be:

    getIsNewItemPredicate :: stuff -> (String -> Bool)

but in normal Haskell, you wouldn't write this last pair of parentheses (since they're implicit):

    getIsNewItemPredicate :: stuff -> String -> Bool

And so it is pretty likely that you would write this function just as if it had two parameters:

    getIsNewItemPredicate stuff str = .... str `isMember` set

And later on use the fact that the function is curried to get a predicate on String:

    let isNewItem = getIsNewItemPredicate someStuff

In this case, you're "returning" a function but it may not be as obvious as in Scheme (where curryfication is not an idiom encouraged by the language).

-- Jedaï
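A self-contained version of that sketch, with the elided pieces filled in by a Set lookup (the concrete types here are assumptions, not from the mail):

    import Data.Set (Set)
    import qualified Data.Set as S

    -- Written as if it had two parameters...
    getIsNewItemPredicate :: Set String -> String -> Bool
    getIsNewItemPredicate set str = not (str `S.member` set)

    main :: IO ()
    main = do
        -- ...then partially applied to get a predicate on String.
        let isNewItem = getIsNewItemPredicate (S.fromList ["a", "b"])
        print (filter isNewItem ["a", "b", "c", "d"])  -- ["c","d"]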

On Wed, Jun 02, 2010 at 06:40:15PM +0200, Chaddaï Fouché wrote:
On Thu, May 20, 2010 at 3:17 AM, Andrew Sackville-West wrote:

thanks for this. it helps a lot. hmmm... I wonder why it is I never have a problem returning functions in Scheme, but it never occurs to me as I learn Haskell?
Maybe you're already doing it without realizing it? For instance, for the same kind of problem but without the IO part, the type of the function could be:

    getIsNewItemPredicate :: stuff -> (String -> Bool)

but in normal Haskell, you wouldn't write this last pair of parentheses (since they're implicit):

    getIsNewItemPredicate :: stuff -> String -> Bool

And so it is pretty likely that you would write this function just as if it had two parameters:

    getIsNewItemPredicate stuff str = .... str `isMember` set

And later on use the fact that the function is curried to get a predicate on String:

    let isNewItem = getIsNewItemPredicate someStuff

In this case, you're "returning" a function but it may not be as obvious as in Scheme (where curryfication is not an idiom encouraged by the language).
very insightful observation. I do indeed do it, but not consciously, as would be the case in a language like Scheme.

A

On Saturday 08 May 2010 04:47:14, Andrew Sackville-West wrote:
Please ignore, for the moment, whatever *other* problems (idiomatic or just idiotic) that may exist above and focus on the IO problem.
Sorry, can't entirely. Unless the number of rss items remains low, don't use lists, use a Set.
This code works fine *if* the file "testfile" has in it some subset of the testData list. If it has the complete set, it fails with a "resource busy" exception.
Okay, I can more or less understand what's going on here. Each letter in the testData list gets compared to the contents of the file, but because they are *all* found, the readFile call never has to try and fail to read the last line of the file. Thus the file handle is kept open lazily waiting around not having reached EOF. Fair enough.
But what is the best solution? One obvious one, and the one I'm using now, is to move the appendFile call into a function with guards to prevent trying to append an empty list to the end of the file. This solves the problem not by forcing the read on EOF, but by not bothering to open the file for appending:
writeHistory [] = return () writeHistory ni = appendFile "testfile" . unlines $ ni
And this makes some sense. It's silly to try to write nothing to a file.
Yes. In any case,

    unless (null newItems) $ appendFile "testfile" $ unlines newItems

seems cleaner.
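Folded into the writeHistory function from the original post, that becomes (a sketch; unless lives in Control.Monad, which the post already imports):

    writeHistory :: [String] -> IO ()
    writeHistory ni = unless (null ni) $ appendFile "testfile" (unlines ni)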
But it also rubs me the wrong way. It's not solving the problem directly -- closing that file handle. So there's my question, how can I close that thing? Is there some way to force it?
For almost all practical purposes, there is (despite the fact that what Stephen said is basically right, although a little overstated in my opinion). You have to force the entire file to be read; the standard idiom is using

    x `seq` doSomethingElse

where x is a value that requires the entire file to be read; in your case, x = length currItems is a natural choice. That way, you effectively have made readFile strict without sacrificing too much niceness of the code (withFile and hGetLine mostly are much uglier IMO).

It is not entirely failsafe because the file handle needn't be immediately closed upon encountering EOF; it can linger until one of the next GCs. But if you do something substantial between reading the file and trying to reopen it for appending, it is very unlikely that it's not yet closed (not impossible, hence "almost all" and not "all").
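Applied to the toy program from the start of the thread, the only change needed would be something like (a sketch):

    main = do
        let testData = ["a", "b", "c", "d"] :: [String]
        currItems <- getOldData
        let newItems = filterItems currItems testData
        print newItems
        -- length currItems demands the whole file, so the readFile
        -- handle reaches EOF (and can close) before the reopen below.
        length currItems `seq` (appendFile "testfile" . unlines $ newItems)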
Do I need to rework the reading to read one line ahead of whatever I'm testing against (thereby forcing the read of EOF and closing the file)?
thanks
A

Sorry, I got distracted for a couple of days.

On Sat, May 08, 2010 at 02:16:27PM +0200, Daniel Fischer wrote:
On Saturday 08 May 2010 04:47:14, Andrew Sackville-West wrote:
Please ignore, for the moment, whatever *other* problems (idiomatic or just idiotic) that may exist above and focus on the IO problem.
Sorry, can't entirely. Unless the number of rss items remains low, don't use lists, use a Set.
I knew people wouldn't be able to entirely, but I was trying to focus on just one problem for the moment. Thanks for the tip on Set. I don't anticipate it really being a need, but it's a good idea. Ideally, I'd like to come up with something more robust than just matching on titles as well, but it works for the moment.
This code works fine *if* the file "testfile" has in it some subset of the testData list. If it has the complete set, it fails with a "resource busy" exception.
Okay, I can more or less understand what's going on here. Each letter in the testData list gets compared to the contents of the file, but because they are *all* found, the readFile call never has to try and fail to read the last line of the file. Thus the file handle is kept open lazily waiting around not having reached EOF. Fair enough.
But what is the best solution? One obvious one, and the one I'm using now, is to move the appendFile call into a function with guards to prevent trying to append an empty list to the end of the file. This solves the problem not by forcing the read on EOF, but by not bothering to open the file for appending:
writeHistory [] = return () writeHistory ni = appendFile "testfile" . unlines $ ni
And this makes some sense. It's silly to try to write nothing to a file.
Yes. In any case,
unless (null newItems) $ appendFile "testfile" $ unlines newItems
seems cleaner.
indeed. Thanks.
But it also rubs me the wrong way. It's not solving the problem directly -- closing that file handle. So there's my question, how can I close that thing? Is there some way to force it?
For almost all practical purposes, there is (despite the fact that what Stephen said is basically right, although a little overstated in my opinion). You have to force the entire file to be read, the standard idiom is using
x `seq` doSomethingElse
where x is a value that requires the entire file to be read, in your case x = length currItems is a natural choice.
d'oh.
That way, you effectively have made readFile strict without sacrificing too much niceness of the code (withFile and hGetLine mostly are much uglier IMO).
I know I'm probably not really thinking about it in the right way, but withFile and hGetLine don't seem to be a good fit because of the current structure. I'm using the existing data in the file as a filter to determine what new data to put in the file, like this:

    items <- oldTitles `seq` liftM (filterItems oldTitles) $ buildItems srcs

where buildItems :: [String] -> IO [Item], and filterItems :: [String] -> [Item] -> [Item]. (I suppose I should post the whole file. You can see it here: http://git.swclan.homelinux.org/rss2email.git)

Anyway, filterItems is doing 'filter (isNew oldItems) items', effectively, and putting that into a withFile do structure seems not right.

Thanks for the help

A