Fwd: [Haskell-beginners] More Deserialization Woes

Tom Hobbs

6 Jul 2010 6 Jul '10

9:15 a.m.

Sorry, I meant to send this to the list, rather than just to Stephen (sorry for the Spam). Tom ---------- Forwarded message ---------- From: Tom Hobbs Date: Tue, Jul 6, 2010 at 10:12 AM Subject: Re: [Haskell-beginners] More Deserialization Woes To: Stephen Tetley Hello again, I've been reading through various tutorials and they all put IO as the outermost monad, like you suggest. However, I don't think that's what I want. My IO operation is reading from a network stream, not a file, so I want a failure halfway through should signify a complete failure. So, I believe that the type signature for my function should be; ping :: String -> PortNumber -> Maybe (IO [String]) because the result "Just []" is a valid one and does not signify a failure. However, the result of "IO [Just "a", Just "b", Nothing, Nothing]" would signify that communication failed halfway through and would not make sense in my context. This is what the advice seems to be suggesting I write. But in this case, I'd prefer to return "Nothing" to signify that a problem occurred. So my first question is; because I want to do something so different from the majority of the articles I've read; am I in a niche where my requirement makes sense, or does my requirement make no sense - a theory that is backed up by the fact that no one else seems to be doing that... Now, I'm not sure I can get there by myself, since I'm struggling to get the right incantation of catching errors but I'll keep plugging away at that for a while. But can someone help me with my next question also. Is it possible, to extract the values out of the IO monad so it can be used in pure functions. For example, once ping has returned it's Maybe [IO String], I would like to be able to create another function such as; purePing :: String -> PortNumber -> Maybe [String] purePing a p = removeIOMonad (ping a p) where removeIOMonad Nothing = Nothing removeIOMonad [] = Just [] removeIOMonad (x:xs) = clevelDropIOMagic x : removeIOMonad xs ...or something... Once the IO [String] has been read from the stream, no further IO is necessary, so any other function should be able to just use the list and not worry about the IO stuff. Again my questions are, Is this an okay thing to do, or is my design/idea very, very wrong? Thanks again for the help, Tom On Thu, Jul 1, 2010 at 1:53 PM, Stephen Tetley wrote:

...

Hi Tom

This is bit that is wrong:

((UTF.toString name) : readNames (n-1) h)

You are trying to cons (:) a String onto an IO [String]:

(UTF.toString name) :: String

(readNames (n-1) h) :: IO [String]

(:) :: a -> [a] -> [a]

This is easy mistake to make, I probably still make it myself now and again if I'm typing faster than I'm thinking.

As for ping, you are right that the common answer type will be "IO (Maybe [String])" i.e.

ping :: String -> PortNumber -> IO (Maybe [String])

Generally you will want IO as the outermost (type-) constructor for an expression involving IO, this is because you can't 'escape' IO. There are some exceptions where IO isn't the outermost type constructor, the common one I can think of is you might want to build a list of IO actions [IO ()]. You would then commonly pass on this list to evaluate it later with mapM_.

Neil Mitchell's "IO without (concentrating) on monads tutorial" is a very good place to start: http://neilmitchell.blogspot.com/2010/01/haskell-io-without-monads.html

Best wishes

Stephen

Attachments:

attachment.html (text/html — 5.0 KB)

Show replies by date

Daniel Fischer

6 Jul 6 Jul

9:42 a.m.

New subject: Fwd: More Deserialization Woes

On Tuesday 06 July 2010 11:15:12, Tom Hobbs wrote:

...

Sorry, I meant to send this to the list, rather than just to Stephen (sorry for the Spam).

Tom

---------- Forwarded message ---------- From: Tom Hobbs Date: Tue, Jul 6, 2010 at 10:12 AM Subject: Re: [Haskell-beginners] More Deserialization Woes To: Stephen Tetley

Hello again,

I've been reading through various tutorials and they all put IO as the outermost monad, like you suggest. However, I don't think that's what I want.

My IO operation is reading from a network stream, not a file, so I want a failure halfway through should signify a complete failure. So, I believe that the type signature for my function should be;

ping :: String -> PortNumber -> Maybe (IO [String])

because the result "Just []" is a valid one and does not signify a failure.

Nevertheless, IO (Maybe [String]) is, I believe, the appropriate type.

...

However, the result of "IO [Just "a", Just "b", Nothing, Nothing]" would signify that communication failed halfway through and would not make sense in my context. This is what the advice seems to be suggesting I write. But in this case, I'd prefer to return "Nothing" to signify that a problem occurred.

You can do that by applying sequence (at the type [Maybe String] -> Maybe [String]). Say you have ping0 :: args -> IO [Maybe String] then you'd use ping :: args -> IO (Maybe [String]) ping = fmap sequence ping0

...

So my first question is; because I want to do something so different from the majority of the articles I've read; am I in a niche where my requirement makes sense, or does my requirement make no sense - a theory that is backed up by the fact that no one else seems to be doing that...

Now, I'm not sure I can get there by myself, since I'm struggling to get the right incantation of catching errors but I'll keep plugging away at that for a while.

But can someone help me with my next question also.

Is it possible, to extract the values out of the IO monad so it can be used in pure functions.

Yes, but you needn't (and shouldn't in general). The general pattern is main = do someValues <- getDataIOAction let otherValues = pureStuff someValues outputIO otherValues

...

For example, once ping has returned it's Maybe [IO String], I would like to be able to create another function such as;

purePing :: String -> PortNumber -> Maybe [String] purePing a p = removeIOMonad (ping a p) where removeIOMonad Nothing = Nothing removeIOMonad [] = Just [] removeIOMonad (x:xs) = clevelDropIOMagic x : removeIOMonad xs

...or something...

Once the IO [String] has been read from the stream, no further IO is necessary, so any other function should be able to just use the list and not worry about the IO stuff.

Again my questions are, Is this an okay thing to do, or is my design/idea very, very wrong?

Thanks again for the help,

Tom

Tom Hobbs

10:06 a.m.

New subject: Fwd: More Deserialization Woes

On Tue, Jul 6, 2010 at 10:42 AM, Daniel Fischer wrote:

...

Nevertheless, IO (Maybe [String]) is, I believe, the appropriate type.

I accept that you guys have most likely forgotten more about Haskell than I know, but can you explain to me why this is the appropriate type? Note to self: I think I've got the answer to this at the end of this email!

...

...
However, the result of "IO [Just "a", Just "b", Nothing, Nothing]" would signify that communication failed halfway through and would not make sense in my context. This is what the advice seems to be suggesting I write. But in this case, I'd prefer to return "Nothing" to signify that a problem occurred.

You can do that by applying sequence (at the type [Maybe String] -> Maybe [String]).

Say you have

ping0 :: args -> IO [Maybe String]

then you'd use

ping :: args -> IO (Maybe [String]) ping = fmap sequence ping0

Okay, I've looked at the definition of sequence in Prelude and I think I understand what's going on. I need to experiment with it first before I fully get it. So if ping0 returns IO [Just "a", Just "b", Nothing] then then what would ping return? IO (Just ["a", "b", ?]). Nevermind, I can play with that and work it out later. Thinking to my requirement I think I'd want something more like; ping :: args -> IO (Maybe [String]) ping args = fmap f ps where ps = ping0 args f | Nothing `elem` ps = Nothing | otherwise = fmap sequence ps Or would I get this behaviour for free straight from the use of sequence?

...

...
Is it possible, to extract the values out of the IO monad so it can be used in pure functions.

Yes, but you needn't (and shouldn't in general).

Again, what is the why? My theory is that I want to get [String] from ping, and then (possibly) do lots of other things with it that don't require any kind of IO. But maybe that's not true and I need to think a bit harder about what the caller is likely to do with the result from ping. In terms of performance (memory footprint, speed of execution, etc) does "carrying around the IO monad" make much of a difference? <insert five minute reflective pause here> You're right. I don't need to pull the [String] out of the IO monad. Suprise, suprise, the general pattern is appropriate in my case. Something clicked in my head when I re-read "pureStuff someValues" - this time is was a good click though! Please understand, I'm challenging you on the answers because I don't understand them, not because I think they're wrong! :-) Thanks again for the very quick and helpful answers! Tom

...

The general pattern is

main = do someValues <- getDataIOAction let otherValues = pureStuff someValues outputIO otherValues

...
For example, once ping has returned it's Maybe [IO String], I would like to be able to create another function such as;

purePing :: String -> PortNumber -> Maybe [String] purePing a p = removeIOMonad (ping a p) where removeIOMonad Nothing = Nothing removeIOMonad [] = Just [] removeIOMonad (x:xs) = clevelDropIOMagic x : removeIOMonad xs

...or something...

Once the IO [String] has been read from the stream, no further IO is necessary, so any other function should be able to just use the list and not worry about the IO stuff.

Again my questions are, Is this an okay thing to do, or is my design/idea very, very wrong?

Thanks again for the help,

Tom

Daniel Fischer

11:01 a.m.

New subject: Fwd: More Deserialization Woes

On Tuesday 06 July 2010 12:06:53, Tom Hobbs wrote:

...

On Tue, Jul 6, 2010 at 10:42 AM, Daniel Fischer wrote:

...
Nevertheless, IO (Maybe [String]) is, I believe, the appropriate type.

I accept that you guys have most likely forgotten more about Haskell than I know,

Well, we're here to change that :)

...

but can you explain to me why this is the appropriate type?

For one thing, because main's type is IO (), so ultimately you will have IO as the outermost Monad. If you must seriously intertwine I/O with pure computations, using a Monad- transformer with IO as the innermost Monad would be appropriate, but (admittedly I took just a cursory glance at what you try to do) that didn't seem the case here. If I/O needn't be woven into the computation that strongly, it's better to keep I/O and computations as separate as possible (makes refactoring easier, for example), that usually means you have a small IO-wrapper calling pure functions to work with the data gotten from IO. But as Yitz pointed out, deserialisation throws exceptions if things go wrong, so IO [String] could be more appropriate.

...

Note to self: I think I've got the answer to this at the end of this email!

...
...
However, the result of "IO [Just "a", Just "b", Nothing, Nothing]" would signify that communication failed halfway through and would not make sense in my context. This is what the advice seems to be suggesting I write. But in this case, I'd prefer to return "Nothing" to signify that a problem occurred.

You can do that by applying sequence (at the type [Maybe String] -> Maybe [String]).

Say you have

ping0 :: args -> IO [Maybe String]

then you'd use

ping :: args -> IO (Maybe [String]) ping = fmap sequence ping0

Okay, I've looked at the definition of sequence in Prelude and I think I understand what's going on. I need to experiment with it first before I fully get it.

That's normal. The definition of sequence isn't immediately accessible.

...

So if ping0 returns IO [Just "a", Just "b", Nothing] then then what would ping return? IO (Just ["a", "b", ?]). Nevermind, I can play with that and work it out later.

Prelude> sequence $ [Just (1 :: Int), Just 2, Just 3] Just [1,2,3] Prelude> sequence $ [Just (1 :: Int), Just 2, Nothing, Just 3] Nothing And sequence short-cuts when there's a Nothing: Prelude> sequence $ [Just (1 :: Int), Just 2, Nothing, Just 3] ++ replicate 1000000000 (Just 4) Nothing (0.00 secs, 0 bytes) Prelude> sequence $ [Just (1 :: Int), Just 2, Nothing, Just 3] ++ repeat (Just 4) Nothing (0.01 secs, 0 bytes)

...

Thinking to my requirement I think I'd want something more like;

ping :: args -> IO (Maybe [String]) ping args = fmap f ps where ps = ping0 args f | Nothing `elem` ps = Nothing

| otherwise = fmap sequence ps

Or would I get this behaviour for free straight from the use of sequence?

Yes, sequence does that automatically for you :) The problem is that sequence must traverse the entire list to know whether it'll return Nothing or (Just list) if there's no Nothing, so it can take a while if the list is long (and forever, if the list is infinite and doesn't contain a Nothing).

...

...
...
Is it possible, to extract the values out of the IO monad so it can be used in pure functions.

Yes, but you needn't (and shouldn't in general).

Again, what is the why?

My theory is that I want to get [String] from ping, and then (possibly) do lots of other things with it that don't require any kind of IO. But maybe that's not true and I need to think a bit harder about what the caller is likely to do with the result from ping. In terms of performance (memory footprint, speed of execution, etc) does "carrying around the IO monad" make much of a difference?

<insert five minute reflective pause here>

You're right. I don't need to pull the [String] out of the IO monad. Suprise, suprise, the general pattern is appropriate in my case. Something clicked in my head when I re-read "pureStuff someValues" - this time is was a good click though!

Please understand, I'm challenging you on the answers because I don't understand them, not because I think they're wrong! :-)

I understand. I would've gone into more detail, but I was in a hurry :)

...

Thanks again for the very quick and helpful answers!

Tom

Yitzchak Gale

10:07 a.m.

New subject: More Deserialization Woes

Tom Hobbs wrote:

...

I've been reading through various tutorials and they all put IO as the outermost monad, like you suggest. However, I don't think that's what I want.

It is definitely what you want.

...

...am I in a niche where my requirement makes sense

No, you are doing something quite routine.

...

or does my requirement make no sense

Your requirement is fine, and you would have no trouble satisfying it with IO as your outer type. But there are two basic approaches to dealing with failure: returning a pure value that indicates the failure, like Maybe or Either, or throwing an exception in IO that is not reflected in the type. Since you are using Data.Binary for deserialization, that is designed to use the second method. So rather than spending more time on how to structure your types to indicate failure, let's leave that aside for now and focus on how to do deserialization. Error processing will automatically happen the way you say - if anything goes really wrong in the middle, an exception will be thrown and the entire operation will terminate immediately. Later on, you can learn how to catch the exception and do something other than end your program with the standard error message.

...

readNames 0 _ = [] readNames n h = do length <- fmap (fromIntegral . runGet getWord32be) $ L.hGet h 4 name <- L.hGet h length (UTF.toString name) : readNames (n-1) h

Besides the type errors, which others have been helping you with, (and another minor point - avoid using "length" as a variable name, it is the name of one of the most commonly used Prelude functions), let's look at the whole approach. You are ping-ponging back and forth here between the Get monad and manually reading ByteStrings from the handle. The idea of the Get monad is to give a complete description of your serialization format. Then, reading the ByteStrings will be driven by your serialization format - just the right number of bytes will automatically be read off the wire at each stage. Here is the serialization format (note that we're not reading anything here, just describing the format): readNames :: Int -> Get [String] readNames n = replicateM n $ do len <- getWord32be name <- getByteString len return $ UTF8.toString name Now, in your "main" function (whose type is IO ()), you can write: names <- fmap (runGet $ readNames n) $ L.hGetContents h That will read bytes off the wire lazily, just the right number of bytes to deserialize n names. Of course, that will leave your handle in an unusable state. If you have more to read after that, you have a few options. Best is to combine everything you need to read out of that handle into a single Get monad object that describes the entire deserialization. Another (messier) approach is to use runGetState instead of runGet - that gives you, in addition to the deserialized data, a lazy ByteString that represents additional bytes that can later be read off the handle. Regards, Yitz

Yitzchak Gale

10:58 a.m.

New subject: More Deserialization Woes

I wrote:

...

readNames :: Int -> Get [String] readNames n = replicateM n $ do len <- getWord32be name <- getByteString len return $ UTF8.toString name

Sorry, there's a type error there. You need to convert the Word32 to an Int: name <- getByteString $ fromIntegral len Regards, Yitz

5488

Age (days ago)

5488

Last active (days ago)

List overview

Download

5 comments

3 participants

participants (3)

Daniel Fischer
Tom Hobbs
Yitzchak Gale