Re: [Haskell-cafe] Mutable data structures and asynchronous exceptions

28 Sep 2017

If you are in ST, you cannot modify anything externally visible without
using unsafe functions. If an exception occurs at any point, your
changes may be left in some broken state, but there would be no
reference to them, so they are just garbage collected and nothing bad
happens. If you need externally visible changes, you have to use IO, but
then you also have the full arsenal of exception handling functions at
your disposal. If you write code which is polymorphic and can work in
either IO or ST, it cannot have any visible side effects and thus you
can ignore any exceptions in it (because you could runST it in
completely pure code).
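
As a small illustration of the point about ST (a sketch only, using
nothing beyond base; the name sumST is made up for this example): the
mutable cell below is created, updated and read entirely inside runST,
so no caller can ever observe it half-updated.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Sum a list with a mutable accumulator that never escapes runST.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                        -- local mutable cell
  mapM_ (\x -> modifySTRef' acc (+ x)) xs  -- in-place updates
  readSTRef acc                            -- only the final value leaves
```

If evaluating sumST is interrupted, the accumulator is simply
unreachable and gets collected; a later evaluation starts from a fresh
cell.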

Taking your array list as an example, let's look at possible signatures
for adding an element:

addPure :: ArrayList a -> a -> ArrayList a

Clearly this just copies the whole array every time; there is nothing
mutable here.

addST :: ArrayList s a -> a -> ST s ()

This one is mutable, but you can never get out of ST with this
ArrayList. While you are in ST, it doesn't matter if an async exception
interrupts you, because you will throw away the result of the ST action
anyway (and thus your broken ArrayList).
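
A minimal sketch of what such an addST could look like, assuming a
hypothetical ArrayList built from an STArray (array package) and two
STRefs. The representation, the initial capacity of 4 and the helper
names are assumptions for illustration, not taken from any library.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, getBounds, newArray_, readArray, writeArray)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical growable list: current buffer plus element count.
data ArrayList s a = ArrayList (STRef s (STArray s Int a)) (STRef s Int)

newArrayList :: ST s (ArrayList s a)
newArrayList = do
  buf <- newArray_ (0, 3)  -- capacity 4, slots left uninitialised
  ArrayList <$> newSTRef buf <*> newSTRef 0

addST :: ArrayList s a -> a -> ST s ()
addST (ArrayList bufRef cntRef) x = do
  buf <- readSTRef bufRef
  n <- readSTRef cntRef
  (_, hi) <- getBounds buf
  let cap = hi + 1
  buf' <-
    if n < cap
      then pure buf
      else do
        -- Grow: allocate a buffer twice as large and copy the elements.
        bigger <- newArray_ (0, 2 * cap - 1)
        mapM_ (\i -> readArray buf i >>= writeArray bigger i) [0 .. n - 1]
        writeSTRef bufRef bigger
        pure bigger
  writeArray buf' n x        -- write the new element
  writeSTRef cntRef (n + 1)  -- then publish it by bumping the count

toListST :: ArrayList s a -> ST s [a]
toListST (ArrayList bufRef cntRef) = do
  buf <- readSTRef bufRef
  n <- readSTRef cntRef
  mapM (readArray buf) [0 .. n - 1]

-- The list lives and dies inside runST; only the pure result escapes.
fromListViaAdd :: [a] -> [a]
fromListViaAdd xs = runST $ do
  l <- newArrayList
  mapM_ (addST l) xs
  toListST l
```

There is no exception handling anywhere in addST: if fromListViaAdd is
interrupted, the half-built list is unreachable.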

addST' :: ArrayList a -> a -> ST s (ArrayList a)

This one has to copy the whole array, because you can implement addPure
with it and runST.
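
To make that argument concrete, here is a sketch assuming a hypothetical
immutable ArrayList wrapping Data.Array (array package). The helper
newBuffer only pins the array type and is not a library function.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array (Array, elems, listArray)
import Data.Array.ST (STArray, freeze, newListArray)

-- Hypothetical immutable list backed by a boxed array.
newtype ArrayList a = ArrayList (Array Int a)

emptyList :: ArrayList a
emptyList = ArrayList (listArray (0, -1) [])

toList :: ArrayList a -> [a]
toList (ArrayList arr) = elems arr

-- Helper that fixes the mutable array type to STArray.
newBuffer :: (Int, Int) -> [a] -> ST s (STArray s Int a)
newBuffer = newListArray

-- The argument is immutable, so the action must build a fresh array.
addST' :: ArrayList a -> a -> ST s (ArrayList a)
addST' (ArrayList arr) x = do
  let n = length (elems arr)
  buf <- newBuffer (0, n) (elems arr ++ [x])  -- copies every element
  ArrayList <$> freeze buf                    -- safe freeze copies again

-- addPure falls out of addST' and runST, so addST' cannot mutate its input.
addPure :: ArrayList a -> a -> ArrayList a
addPure l x = runST (addST' l x)
```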

addIO :: ArrayList a -> a -> IO ()

This can modify the list, but it can (and has to) also handle
exceptions. This is the only one which Java provides.
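
A sketch of the IO variant, again with a made-up representation (IOArray
plus IORefs). Masking the whole operation with mask_ is the bluntest
option and is chosen here only for brevity; a narrower critical region
around the final writes would also work.

```haskell
import Control.Exception (mask_)
import Data.Array.IO (IOArray, getBounds, newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical growable list living in IO.
data ArrayList a = ArrayList (IORef (IOArray Int a)) (IORef Int)

newArrayList :: IO (ArrayList a)
newArrayList = do
  buf <- newArray_ (0, 3)  -- capacity 4, slots left uninitialised
  ArrayList <$> newIORef buf <*> newIORef 0

-- mask_ defers async exceptions until the whole update has finished,
-- so buffer, capacity and count always change together.
addIO :: ArrayList a -> a -> IO ()
addIO (ArrayList bufRef cntRef) x = mask_ $ do
  buf <- readIORef bufRef
  n <- readIORef cntRef
  (_, hi) <- getBounds buf
  let cap = hi + 1
  buf' <-
    if n < cap
      then pure buf
      else do
        bigger <- newArray_ (0, 2 * cap - 1)
        mapM_ (\i -> readArray buf i >>= writeArray bigger i) [0 .. n - 1]
        writeIORef bufRef bigger
        pure bigger
  writeArray buf' n x
  writeIORef cntRef (n + 1)

toListIO :: ArrayList a -> IO [a]
toListIO (ArrayList bufRef cntRef) = do
  buf <- readIORef bufRef
  n <- readIORef cntRef
  mapM (readArray buf) [0 .. n - 1]
```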

Regarding the monad-polymorphic functions (you are talking about
PrimMonad, right?): they cannot handle exceptions, because they might be
used in an ST context. But as stated earlier, if you compose them into
another action in some PrimMonad, it will be exception safe because you
can just apply runST to it, constraining the PrimMonad to ST and getting a
pure value out of it (and such a value never needs to handle exceptions).
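
The same idea can be shown without the primitive package. In the sketch
below stToIO from base stands in for the polymorphism (an assumption made
to keep the example dependency-free): one ST action is run both purely
and in IO, and it contains no exception handling at all.

```haskell
import Control.Monad.ST (ST, runST, stToIO)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- One action with local mutable state and no exception handling.
countEvens :: [Int] -> ST s Int
countEvens xs = do
  n <- newSTRef 0
  mapM_ (\x -> if even x then modifySTRef' n (+ 1) else pure ()) xs
  readSTRef n

-- Pure use: the state cannot be observed from outside.
countEvensPure :: [Int] -> Int
countEvensPure xs = runST (countEvens xs)

-- IO use: same code, still no externally visible state.
countEvensIO :: [Int] -> IO Int
countEvensIO xs = stToIO (countEvens xs)
```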

Of course, all this changes as soon as you use unsafeThaw in ST without
proving that you have the ONLY reference to that buffer/array/...

On 09/28/2017 05:51 PM, Станислав Черничкин wrote:
...
Thank you for the reply. I think I should clarify what exactly I'd like to
discuss.
The “data structures” I'm talking about are in general single-threaded
mutable containers (like the hashtables mentioned, or like ArrayList in
Java). Such structures are not thread safe, yet it would be nice to have
async exception safety. I used the word “atomicity” in the sense described
here https://en.wikipedia.org/wiki/Atomicity_(database_systems) :
an operation either occurs or fails, and the data structure remains in its
previous state. In many cases such behavior can be achieved without complex
exception clean-up routines.
Let me give an example. Consider something like ArrayList from Java (a
vector which can grow as elements are added). I want to implement an 'add'
action. The contract is straightforward: the action may either add the
element to the structure (possibly reallocating the underlying memory
buffer, writing the element at the last position, and incrementing the
element counter), or it may throw an OutOfMemory exception. But in the
latter case the structure should stay “undamaged”. This could be
implemented as follows:
    if count_equals_capacity then
        allocate_new_buffer (let's suppose it is garbage-collected)
        copy_elements
        update_buffer_pointer
        update_capacity_variable
    write_new_element_to_buffer
    update_count_variable
This code does not contain any explicit exception handling, but it
satisfies the contract. The only place where an exception can occur is
allocate_new_buffer. In this case the action will be interrupted before any
state modifications. All other operations are basically memory writes
and completely safe (assuming the code is correct and will not segfault).
Things become complicated in the presence of async exceptions. Suppose an
async exception is raised between write_new_element_to_buffer and
update_count_variable. At first glance nothing wrong has happened, but if
the buffer holds references, it will now contain a reference to some object,
preventing it from being GC-d, and this reference will be beyond the
buffer's count value, because the exception occurred before updating the
count variable, so the programmer will be completely unaware of it. But this
can still be fixed by masking exceptions in critical blocks. And we can
definitely implement all of this in the IO monad.
The question is how to write “monad-polymorphic” code, i.e. code which
can run both in IO and ST. Mutable data structures benefit from being
“monad-polymorphic”. Most Haskell mutable containers (vectors,
hashtables, impure-containers) are built on PrimMonad/PrimState, allowing
them to run both in IO and ST. But they seem to just ignore the fact that
an async exception may corrupt state. Some of them (e.g.
https://hackage.haskell.org/package/impure-containers-0.4.0/docs/src/Data-Ar...
) even seem to ignore that unsafeGrow may throw OutOfMemory (though
attempting to recover from OutOfMemory may be a bad idea in itself).
2017-09-28 15:45 GMT+03:00 Michael Snoyman <michael@snoyman.com>:
> Since exception can arise at any point, it is not possible to guarantee atomicity of operation, hence mutable data structure may remain in
    incorrect state in case of interruption.
Even if async exceptions didn't exist, we couldn't guarantee
    atomicity in general without specifically atomic functions (like
    atomicModifyIORef or STM), since another thread may access the data
    concurrently and create a data race.
If you're only talking about single-threaded cases—of which ST is
    _basically_ a subset[1]—I don't think you're really worried about
    _atomicity_, but about exception safety. Exception safety goes
    beyond async exceptions, since almost all IO actions can throw some
    form of synchronous exception. For those cases, you can use one of
    the many exception-cleanup functions, like finally, onException,
    bracket, or bracketOnError.
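
    As one illustration of those cleanup functions, a sketch using
    onException from Control.Exception to put a mutable value back when
    an update fails. The helper name withRollback and the demo are made
    up for this example.

```haskell
import Control.Exception (ErrorCall (..), onException, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Run an action that may modify the IORef; if it throws, restore the
-- value the IORef held beforehand and let the exception propagate.
withRollback :: IORef a -> IO b -> IO b
withRollback ref action = do
  saved <- readIORef ref
  action `onException` writeIORef ref saved

-- A failing update leaves the IORef at its original value.
demo :: IO Int
demo = do
  ref <- newIORef (1 :: Int)
  _ <- try (withRollback ref (writeIORef ref 99 >> throwIO (ErrorCall "boom")))
         :: IO (Either ErrorCall ())
  readIORef ref
```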
It's true that those functions don't work inside ST, but I'd argue
    you don't need them to. The expected behavior of code that receives
    an async exception is to (1) clean up after itself and (2) rethrow
    the exception. But as ST blocks are supposed to be free of
    externally-visible side effects, worrying about putting its
    variables back into some safe state is unnecessary[2].
To summarize:
* If you need true atomicity, you're in IO and dealing with multiple
    threads. I'd recommend sticking with STM unless you have a strong
    reason to do otherwise.
    * If you are single threaded and in IO, you can get away with
    non-STM stuff more easily, and need to make sure you're using
    exception-aware functions.
    * If you're inside ST, make sure any resources you acquire are
    cleaned up correctly, but otherwise you needn't worry about exceptions.
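
    For the multi-threaded case, a small sketch of an atomic update with
    atomicModifyIORef' from base (STM would look similar with a TVar).
    The function names bump and countConcurrently are made up for
    illustration.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Atomically increment a shared counter.
bump :: IORef Int -> IO ()
bump ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- Start several threads that each bump the counter, wait for all of
-- them to finish, then read the final value.
countConcurrently :: Int -> Int -> IO Int
countConcurrently threads perThread = do
  ref <- newIORef 0
  done <- newEmptyMVar
  forM_ [1 .. threads] $ \_ -> forkIO $ do
    replicateM_ perThread (bump ref)
    putMVar done ()
  replicateM_ threads (takeMVar done)
  readIORef ref
```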
Also, you may be interested in reading the documentation for
    safe-exceptions[3], which talks more about async exception safety.
[1] I say basically since you'd have to pull out unsafe functions to
    fork a thread that has access to an STRef or similar, though it
    could be done.
[2] If you're doing something like binding to a C library inside ST,
    you may have some memory cleanup to perform, but the STRefs and
    other data structures should never be visible again.
[3] https://haskell-lang.org/library/safe-exceptions
On Thu, Sep 28, 2017 at 2:00 PM, Станислав Черничкин
    <schernichkin@gmail.com> wrote:
It's quite hard to implement mutable data structures in the presence
        of asynchronous exceptions. Since an exception can arise at any
        point, it is not possible to guarantee atomicity of an operation,
        hence a mutable data structure may remain in an incorrect state in
        case of interruption. One can certainly use maskAsyncExceptions#
        and friends to protect critical regions, but masking functions
        live in IO, while mutable data structures tend to be
        state-polymorphic (to allow their use in ST).
This leads to conflicting requirements:
        - One should not care about asynchronous exceptions inside ST
        (it is not possible to catch an exception in ST, hence not possible
        to use something in an invalid state). Moreover, it is not even
        possible to write “exception-safe” code, because masking
        functions are not available.
        - One should provide accurate masking when using the same data
        structures in IO.
So I want to discuss several questions on this topic.
1. Impact. Are async exceptions really common? Would it not be
        easier to say: “ok, things can go bad if you combine async
        exceptions with mutable data structures, just don't do it”?
2. Documentation. Should library authors explicitly mention
        async exception safety? For example
        https://hackage.haskell.org/package/hashtables - is it async
        exception safe when used in IO? Or even worse
        https://hackage.haskell.org/package/ghc-prim-0.5.1.0/docs/GHC-Prim.html#v:re...
        - what will happen in case of an async exception? This function
        is state-polymorphic; will it implicitly mask exceptions if used
        from IO?
3. Best practices. How should we deal with the problem? Is creating
        separate versions of code for ST and IO the only way?
        Perhaps it is possible to add “mask” to something like
        https://hackage.haskell.org/package/primitive-0.6.2.0/docs/Control-Monad-Pri...
        which would emit mask in the IO instance and a no-op in the ST
        version? Or maybe somebody knows better patterns for async
        exception safe code?
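
        The “mask in IO, no-op in ST” idea could be sketched roughly as
        follows. The class MonadCritical and its method critical are
        hypothetical names invented for this example; they are not part
        of primitive or any other library.

```haskell
import Control.Exception (mask_)
import Control.Monad.ST (ST, runST)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Hypothetical class: run a block as a critical region.
class Monad m => MonadCritical m where
  critical :: m a -> m a

-- In IO, defer async exceptions until the block has finished.
instance MonadCritical IO where
  critical = mask_

-- In ST nothing can observe a half-finished update, so do nothing.
instance MonadCritical (ST s) where
  critical = id

-- Two related writes that should not be separated by an async exception.
pushIO :: IORef [a] -> IORef Int -> a -> IO ()
pushIO items count x = critical $ do
  modifyIORef' items (x :)
  modifyIORef' count (+ 1)

pushST :: STRef s [a] -> STRef s Int -> a -> ST s ()
pushST items count x = critical $ do
  modifySTRef' items (x :)
  modifySTRef' count (+ 1)

demoST :: (String, Int)
demoST = runST $ do
  items <- newSTRef []
  count <- newSTRef 0
  mapM_ (pushST items count) "abc"
  (,) <$> readSTRef items <*> readSTRef count
```

        With a class like this attached to the state-polymorphic
        interface, pushIO and pushST could be a single definition; they
        are kept separate here only to stay within base.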
-- 
        Sincerely, Stanislav Chernichkin.
_______________________________________________
        Haskell-Cafe mailing list
        To (un)subscribe, modify options or view archives go to:
        http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
        Only members subscribed via the mailman list are allowed to post.

Jonas Scholl