What does unsafePerformIO do to the stack

Consider the following program: module Stack where import System.IO.Unsafe main = print (sim (replicate 1299959 ())) sim [] = True sim (_:xs) = goodStack (sim xs) goodStack x = fromJust (Just x) --no stack overflow badStack x = unsafePerformIO (return x) --stack overflow fromJust (Just x) = x I always thought that unsafePerformIO would do something similar as goodStack. i.e. be stack neutral: $ ghc --make -o stack Stack [1 of 1] Compiling Main ( Stack.hs, Stack.o ) Linking stack ... $ stack +RTS -K0.00001M True But if you exchange goodStack with badStack, the picture changes unfortunately to: $ ghc --make -o stack Stack [1 of 1] Compiling Main ( Stack.hs, Stack.o ) Linking stack ... $ stack +RTS -K9.883647M Stack space overflow: current size 9883644 bytes. Use `+RTS -Ksize' to increase it. $ stack +RTS -K9.883648M True I am using: $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.8.2 Is this behaviour necessary? Is there any work around, e.g., employing the foreign function interface? Thanks for your time! Bernd

Bernd Brassel wrote:
Consider the following program:
module Stack where
import System.IO.Unsafe
main = print (sim (replicate 1299959 ()))
sim [] = True sim (_:xs) = goodStack (sim xs)
goodStack x = fromJust (Just x) --no stack overflow badStack x = unsafePerformIO (return x) --stack overflow
fromJust (Just x) = x
goodStack == id, and GHC even with no optimisation will transform it into id, and inline it into sim. So with goodStack, sim ends up being tail-recursive. With badStack, sim is no longer tail recursive (unsafePerformIO is not inlined), so it runs out of stack. Simple!
Is this behaviour necessary? Is there any work around, e.g., employing the foreign function interface?
There's unsafeInlinePerformIO (sometimes called inlinePerformIO), which is usable in certain cases, but be very careful. From Data.ByteString.Internal: {-# INLINE inlinePerformIO #-} inlinePerformIO :: IO a -> a #if defined(__GLASGOW_HASKELL__) inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r #else inlinePerformIO = unsafePerformIO #endif But even this might not give you tail recursion, depending on the context. Cheers, Simon

Thanks for your answer Simon. Simon Marlow wrote:
Bernd Brassel wrote:
Consider the following program:
module Stack where
import System.IO.Unsafe
main = print (sim (replicate 1299959 ()))
sim [] = True sim (_:xs) = goodStack (sim xs)
goodStack x = fromJust (Just x) --no stack overflow badStack x = unsafePerformIO (return x) --stack overflow
fromJust (Just x) = x
goodStack == id, and GHC even with no optimisation will transform it into id, and inline it into sim. So with goodStack, sim ends up being tail-recursive. With badStack, sim is no longer tail recursive (unsafePerformIO is not inlined), so it runs out of stack. Simple!
Is it really that simple? I guess that in a lazy language we have to look a bit closer to see what is tail recursive and what is not. If I understand you correctly, you say that if goodStack was not inlined you would have a bad stack as well, right? But look at what it would be doing. In a lazy language the call to sim would go to the heap and whatever goodStack does to the stack is already done before sim is restarted. And the same could be true with the unsafePerformIO-return combination. What is the reason to hold anything on the stack for this after the call to unsafe is finished? I have tried the example with badStack in one other compiler and two interpreters. None of them has any problem running the example. For one of the interpreters I could exactly measure that the stack is constant the whole time. And I know that no optimisation or inlining is going on for that interpreter. Just try the example with hugs. It easily runs through while replacing badStack with the function pushStack True = True immediately runs out of memory. (With this function, the example is indeed not tail recursive and your argument is valid.) So there is definitely something that unsafePerformIO does to the stack in the ghc that is special to that compiler.

Bernd Brassel wrote:
Thanks for your answer Simon.
Simon Marlow wrote:
Bernd Brassel wrote:
Consider the following program:
module Stack where
import System.IO.Unsafe
main = print (sim (replicate 1299959 ()))
sim [] = True sim (_:xs) = goodStack (sim xs)
goodStack x = fromJust (Just x) --no stack overflow badStack x = unsafePerformIO (return x) --stack overflow
fromJust (Just x) = x goodStack == id, and GHC even with no optimisation will transform it into id, and inline it into sim. So with goodStack, sim ends up being tail-recursive. With badStack, sim is no longer tail recursive (unsafePerformIO is not inlined), so it runs out of stack. Simple!
Is it really that simple? I guess that in a lazy language we have to look a bit closer to see what is tail recursive and what is not. If I understand you correctly, you say that if goodStack was not inlined you would have a bad stack as well, right?
My apologies - I oversimplified things. You're quite right, it looks like the unsafePerformIO version ought to be tail-recursive too. It turns out to be rather subtle. If you replace unsafePerformIO by unsafeDupablePerformIO (from GHC.IOBase), then you do indeed get tail-recursion. But this is only due to a clever trick in the RTS: what happens is that (sim xs) is a thunk, so when evaluating it, even in a tail-recursive position, an update frame is pushed on the stack. The next recursive call to (sim xs) pushes another update frame on the stack, and so on. Since these update frames are all adjacent to one another, a trick known as "stack squeezing" can squash them down into a single frame, and this is what happens for unsafeDupablePerformIO. The ordinary unsafePerformIO is performing stack squeezing once per call, because the stack squeezing is done by the same code that does the "duplicate computation" check that unsafePerformIO needs to do. Stack squeezing doesn't look at the same bit of stack twice, so subsequent squeezings don't manage to remove any update frames. I've fixed this in my tree. It also needed some tweaks to the heuristic which decides whether to squeeze or not based on a cost/benefit tradeoff. So the upshot is: you can use unsafeDupablePerformIO right now, or you can wait until I've tested and committed this patch to get tail-recursion with unsafePerformIO. I've no idea how it works in Hugs, you'll have to ask them :) Cheers, Simon

Simon Marlow wrote:
So the upshot is: you can use unsafeDupablePerformIO right now, or you can wait until I've tested and committed this patch to get tail-recursion with unsafePerformIO.
Wow, thank you for the detailed answer! That was really interesting.
I've no idea how it works in Hugs, you'll have to ask them :)
At least I can tell you how it is done in that other interpreter I mentioned. The reason that you can be better for that example is that the argument of badStack is not shared. No need to push an update frame for it. I am pretty sure that it is the same with the other compiler, both for Curry not for Haskell btw., but like you I am not sure with Hugs. If the ghc does not take into account what is statically not shared, I guess there is potential for a good optimization here which would go well beyond unsafe programs. Greetings and thanks! Bernd

There is another point that makes me wonder now. If the update frame for the recursive call is the problem then my solution with foreign C functions would produce a bad stack also. But this is not the case. The code looks now like this: sim [] = True sim (_:xs) = yags (sim xs) ref = cinitialize yags x = replace (C_Ref ref) x () And it is running within 0.15M of stack. Did your explanation also account for this phenomenon? Sorry to take that much time off you with this! Bernd

Bernd Brassel wrote:
There is another point that makes me wonder now. If the update frame for the recursive call is the problem then my solution with foreign C functions would produce a bad stack also. But this is not the case. The code looks now like this:
sim [] = True sim (_:xs) = yags (sim xs)
ref = cinitialize
yags x = replace (C_Ref ref) x ()
And it is running within 0.15M of stack. Did your explanation also account for this phenomenon?
Sorry to take that much time off you with this! Bernd
The stack of update frames is not in itself a problem, because stack squeezing removes them leaving you with O(1) stack again. The problem with unsafePerformIO was that the duplication-protection was interfering with stack squeezing. Regarding sharing analysis, we did used to have an update analyser in GHC, but it was expensive to run and rarely gave worthwhile benefits, so eventually we dropped it. Perhaps there are some trivial examples of unshared thunks that we could spot, though. Cheers, Simon

Simon Marlow wrote:
Perhaps there are some trivial examples of unshared thunks that we could spot, though.
The sharing analysis in the interpreter is also very simple and inexpensive. But the gain is frequent. Maybe giving it a try would be worthwhile. Thanks again for all your answers! Bernd

I have got around the problem by defining my unsafe actions by the foreign function interface. But I still think there is a bug concerning stack use in unsafePerformIO in ghc. And I also think that this bug potentially concerns every use of unsafe. Or did I just not get the point of your argument and my answer to it was still beside the point? Greetings Bernd
participants (2)
-
Bernd Brassel
-
Simon Marlow