One-shot semantics in GHC event manager

In ba2555ef and a6f52b19 one-shot semantics were added to the event manager in `base`. If my understanding of this code is correct, in this mode the event manager will only notify the user of the first event on a registered fd, after which point the fd will need to be re-registered to get another notification. It appears this was done to optimize the common case of file I/O where only a single event is needed.

This change led to a regression[1] in Bas van Dijk's usb library under GHC 7.8. usb's use of the event manager requires that all events on an fd are reported until the fd is unregistered, or else hardware events are lost.

I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code. Going forward, library authors now need to worry about whether the system event manager is one-shot or not. Not only is this platform dependent, but it seems that there is no way for a user to determine which semantics the system event handler uses.

Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?

Cheers,

- Ben

[1] https://github.com/basvandijk/usb/issues/7
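To make the semantic difference concrete, here is a minimal self-contained sketch (the names are made up for illustration; this is not the real GHC.Event API) of how a one-shot manager consumes a registration on its first event, while a persistent one keeps it:

```haskell
import qualified Data.Map.Strict as Map

type Fd = Int

-- Hypothetical model: a manager maps fds to registration labels,
-- plus a manager-wide flag saying whether registrations are one-shot.
data Manager = Manager { oneShot :: Bool, regs :: Map.Map Fd String }

register :: Fd -> String -> Manager -> Manager
register fd label m = m { regs = Map.insert fd label (regs m) }

-- Deliver an event on an fd: in one-shot mode the registration is
-- consumed, so a second event on the same fd is silently dropped.
deliver :: Fd -> Manager -> (Maybe String, Manager)
deliver fd m = case Map.lookup fd (regs m) of
  Nothing -> (Nothing, m)
  Just label
    | oneShot m -> (Just label, m { regs = Map.delete fd (regs m) })
    | otherwise -> (Just label, m)

main :: IO ()
main = do
  let os         = register 7 "usb-callback" (Manager True Map.empty)
      (e1, os')  = deliver 7 os
      (e2, _)    = deliver 7 os'  -- lost: the fd must be re-registered first
  print (e1, e2)                  -- (Just "usb-callback",Nothing)
```

This is exactly the failure mode usb hits: the second hardware event arrives before the library has re-registered, so it is never reported.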

On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari
In ba2555ef and a6f52b19 one-shot semantics were added to the event manager in `base`. If my understanding of this code is correct, in this mode the event manager will only notify the user of the first event on a registered fd, after which point the fd will need to be re-registered to get another notification.
Yes.
It appears this was done to optimize the common case of file I/O where only a single event is needed.
Yes.
This change led to a regression[1] in Bas van Dijk's usb library under GHC 7.8. usb's use of the event manager requires that all events on an fd are reported until the fd is unregistered, or else hardware events are lost.
The change should only affect libraries using GHC.Event (or other modules underneath), which are exposed but considered "internal". I searched Hackage before making this change and usb was the only library that came up using GHC.Event directly. I'm not sure now whether I sent the usb maintainers an email... I really should have done that to save you the effort of hunting down the problem in usb.
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it. You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?

-Andi
Cheers,
- Ben

Thanks for your quick reply!
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari wrote:
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it.
You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?
I had considered this but looked for other options for two reasons:

* `loop` isn't exported by GHC.Event
* there is already a perfectly usable event loop thread in existence

I'm a bit curious to know what advantages ONE_SHOT being per-manager carries over per-fd. If the advantages are large enough then we can just export `loop` and be done with it, but the design as it stands strikes me as a bit odd.

Cheers,

- Ben

On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari
Thanks for your quick reply!
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari wrote:
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance. I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it.
You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop. However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
* there is already a perfectly usable event loop thread in existence
I'm a bit curious to know what advantages ONE_SHOT being per-manager carries over per-fd. If the advantages are large enough then we can just export `loop` and be done with it but the design as it stands strikes me as a bit odd.
I suspect that a per-FD design would perform just as well, but I need to look at the details to be sure. Cheers,
- Ben

Another way to fix usb would be to re-register the callback after a
previously registered callback is fired. Of course it is cheaper not to
have to re-register, but re-registration in the latest IO manager should be
fairly cheap, so this may not be a performance problem for usb. Would this
work for you?
You could also use a CPP directive to only do this for GHC 7.8 and up.
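Andreas's re-registration workaround can be sketched with a tiny self-contained simulation (all names here are illustrative, not the real GHC.Event API): the callback's first act is to register itself again, so a one-shot dispatcher behaves as if the registration were persistent.

```haskell
import Data.IORef

-- Toy one-shot dispatcher: holds at most one pending callback and
-- clears it before invoking, mimicking ONE_SHOT delivery.
newtype OneShotMgr = OneShotMgr (IORef (Maybe (IO ())))

newMgr :: IO OneShotMgr
newMgr = OneShotMgr <$> newIORef Nothing

registerCb :: OneShotMgr -> IO () -> IO ()
registerCb (OneShotMgr r) cb = writeIORef r (Just cb)

fireEvent :: OneShotMgr -> IO ()
fireEvent (OneShotMgr r) = do
  mcb <- readIORef r
  writeIORef r Nothing  -- the registration is consumed by the event
  sequence_ mcb

-- The workaround: a callback that re-registers itself before running
-- the handler, so every event is observed even though each individual
-- registration is one-shot.
persistent :: OneShotMgr -> IO () -> IO ()
persistent mgr handler = registerCb mgr (persistent mgr handler >> handler)

main :: IO ()
main = do
  mgr   <- newMgr
  count <- newIORef (0 :: Int)
  persistent mgr (modifyIORef' count (+1))
  fireEvent mgr >> fireEvent mgr >> fireEvent mgr
  readIORef count >>= print  -- 3: no events were lost
```

As Andreas notes, the cost of this is one extra registration per event, which the 7.8 IO manager is designed to make cheap.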
If we want to allow usb to work unchanged, then we will have to revert to
the non-ONE_SHOT behavior of registerFd and add some things to the API to
allow GHC.Thread to register with ONE_SHOT behavior. Reverting could break
clients of this semi-public API who have adapted to the 7.8 behavior.
There probably aren't many of these clients other than GHC.Thread, so this may
not be a big issue.
To do per-FD setting of ONE_SHOT or not, we actually need to have
per-subscription settings, since there can be multiple invocations to
register callbacks for a single file descriptor (e.g. from two different
threads) and they might want different settings. If all the clients want
ONE_SHOT we use ONE_SHOT registration, if all want persistent registrations
we don't use ONE_SHOT. If it is mixed, then the manager has to choose one
or the other and simulate the required behavior for the other registrations
(e.g. choose persistent and automatically unregister for ONE_SHOT
registrations). We could either always make the same choice (e.g. if there
is a mix, use persistent), or we could have per-FD setting that is
configurable by clients.
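The choice described above can be phrased as computing the lifetime the backend registration must have when several subscriptions share an fd (a sketch with illustrative names, not GHC's actual types):

```haskell
data Lifetime = OneShot | ManyShot deriving (Eq, Show)

-- If any subscription on an fd wants persistent delivery, the backend
-- registration must be persistent; the manager then emulates OneShot
-- for the others by dropping them after their first event.
needed :: [Lifetime] -> Lifetime
needed ls
  | all (== OneShot) ls = OneShot
  | otherwise           = ManyShot

-- After an event fires, one-shot subscriptions are discarded and
-- persistent ones survive to see the next event.
survivors :: [Lifetime] -> [Lifetime]
survivors = filter (== ManyShot)

main :: IO ()
main = print (needed [OneShot, ManyShot], survivors [OneShot, ManyShot])
```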
Andi

Andreas Voellmy
Another way to fix usb would be to re-register the callback after a previously registered callback is fired. Of course it is cheaper not to have to re-register, but re-registration in the latest IO manager should be fairly cheap, so this may not be a performance problem for usb. Would this work for you?
You could also use a CPP directive to only do this for GHC 7.8 and up.
This is a possibility that I had considered. It would require that some code be reworked however. I'm leaning towards just using our non-event-manager-driven fallback path on GHC 7.8 for now until the event manager semantics can be worked out.
If we want to allow usb to work unchanged, then we will have to revert to the non-ONE_SHOT behavior of registerFd and add some things to the API to allow GHC.Thread to register with ONE_SHOT behavior. Reverting could break clients of this semi-public API who have adapted to the 7.8 behavior. There probably aren't many of these clients other than GHC.Thread, so this may not be a big issue.
I don't think we need to revert the changes just for usb as we have the fallback path that will work for now. I do think it might be good to explore other points in the design space, however.
To do per-FD setting of ONE_SHOT or not, we actually need to have per-subscription settings, since there can be multiple invocations to register callbacks for a single file descriptor (e.g. from two different threads) and they might want different settings.
Agreed.
If all the clients want ONE_SHOT we use ONE_SHOT registration, if all want persistent registrations we don't use ONE_SHOT. If it is mixed, then the manager has to choose one or the other and simulate the required behavior for the other registrations (e.g. choose persistent and automatically unregister for ONE_SHOT registrations). We could either always make the same choice (e.g. if there is a mix, use persistent), or we could have per-FD setting that is configurable by clients.
I would think we would actually want the desired one-shottedness to be a property of the registration. We would have:

    -- | Will this registration be valid until unregistration ('ManyShot')
    -- or only for a single event ('OneShot')?
    data Lifetime = OneShot | ManyShot

    registerFd :: EventManager -> Fd -> Event -> Lifetime -> IO FdKey

The event manager would then have to choose either ONE_SHOT or not in the case of heterogeneous registrations and emulate the other set, as you said. This seems like a nice interface as it allows the user to specify the semantics that they want (instead of working around whatever the manager happens to provide) and gives the event manager enough knowledge and freedom to do what it can to efficiently implement what is needed.

Cheers,

- Ben

Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari wrote:
Thanks for your quick reply!
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance.
Sure, I would certainly believe this.
I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
I've started playing around with the code to see what might be possible here. We'll see how far I get.
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop.
The system EventManager's loop is `GHC.Event.Manager.loop`, no?
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`. Cheers, - Ben

On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari wrote:
Thanks for your quick reply!
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance.
Sure, I would certainly believe this.
I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
I've started playing around with the code to see what might be possible here. We'll see how far I get.
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop.
The system EventManager's loop is `GHC.Event.Manager.loop`, no?
Yes, but it will be invoked by GHC.Thread and any other callers of it will simply block indefinitely waiting for the thread that is running loop to give it up - which will typically never happen.
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`.
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
Cheers,
- Ben

Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Yes, but it will be invoked by GHC.Thread and any other callers of it will simply block indefinitely waiting for the thread that is running loop to give it up - which will typically never happen.
Right.
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`.
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
I have a first cut of this here [1]. It compiles but I would be shocked if it ran. All of the pieces are there but I need to change EventLifetime to a more efficient encoding (there's no reason why it needs to be more than an Int). Sadly I have to run for the night and will be on a bike ride tomorrow, but perhaps I can come back to it on Monday. Feel free to read it over and see if I missed something.

Cheers,

- Ben

[1] https://github.com/bgamari/packages-base/compare/ghc:ghc-7.8...event-rework
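Ben's remark that EventLifetime need be no more than an Int can be illustrated by packing the event bits and a lifetime flag into a single machine word. This is a hypothetical encoding for illustration only, not necessarily the one the patch uses:

```haskell
import Data.Bits ((.&.), (.|.), testBit)

data Lifetime = OneShot | ManyShot deriving (Eq, Show)

-- An (Event, Lifetime) pair packed into one Int.
newtype EventLifetime = EL Int deriving (Eq, Show)

eventBits :: Int
eventBits = 3  -- hypothetical: evtRead = bit 0, evtWrite = bit 1

encode :: Int -> Lifetime -> EventLifetime
encode ev lt = EL ((ev .&. eventBits) .|. ltBit)
  where
    ltBit = case lt of
      ManyShot -> 4  -- lifetime stored in bit 2
      OneShot  -> 0

decode :: EventLifetime -> (Int, Lifetime)
decode (EL n) =
  (n .&. eventBits, if testBit n 2 then ManyShot else OneShot)

main :: IO ()
main = print (decode (encode 2 ManyShot))  -- (2,ManyShot)
```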

Ben Gamari writes:
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
I have a first cut of this here [1]. It compiles but I would be shocked if it ran. All of the pieces are there but I need to change EventLifetime to a more efficient encoding (there's no reason why it needs to be more than an Int).
As it turns out the patch seems to get through the testsuite after a few minor fixes. What other tests can I subject this to? I'm afraid I don't have access to any machine even close to the size of those that the original event manager was tested on, so checking for performance regressions will be difficult.

Cheers,

- Ben

For the record, I talked to Ben earlier on IRC, and I can provide him
with a machine to do intense benchmarks of the new I/O manager.
Also, if any other developers (like Andreas, Johan, Bryan, etc) in
this space want a big machine to test it on, I can probably equip you
with one (or several). Since Rackspace is so gracious to us, we can
immediately allocate high-powered, physical (i.e. not Xen, but real
machines) machines to do high-scale testing on.[1]
In any case, it's not hard to do and only takes a few minutes, so Ben:
let me know. (I've thought it would be neat to implement a leasing
system somehow, where a developer could lease a few machines for a
short period of time, at which point they expire and a background job
cleans them up.)
[1] You can find the hardware specs here; GHC benchmarking is probably
best suited for the "OnMetal I/O" type, which has 40 cores, 2x PCIe
flash and 128GB of RAM -
http://www.rackspace.com/cloud/servers/onmetal/
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On 2014-10-13 at 23:33:13 +0200, Austin Seipp wrote: [...]
Also, if any other developers (like Andreas, Johan, Bryan, etc) in this space want a big machine to test it on, I can probably equip you with one (or several). Since Rackspace is so gracious to us, we can immediately allocate high-powered, physical (i.e. not Xen, but real machines) machines to do high-scale testing on.[1]
In any case, it's not hard to do and only takes a few minutes, so Ben: let me know. (I've thought it would be neat to implement a leasing system somehow, where a developer could lease a few machines for a short period of time, at which point they expire and a background job cleans them up.)
I'd like to add to this; if there's demand to provide SSH accounts to MSYS2 environments for developing/testing GHC patches or generally debugging/fixing GHC issues occurring on Windows, we may be able to provide such (ephemeral) accounts. Cheers, hvr

This is awesome. I'd like to try to recreate some of the evaluations for
the multicore IO manager paper on that 40 core system at Rackspace. How can
I get access to this? I'll jump on IRC - maybe it is easier to chat in
realtime.

Hey Andreas,
The basic rundown is that if we equip you with an account, you can
just do it yourself. Although we'd like to restrict access a bit more;
I'll figure something out.
Yeah, if you hop on IRC, we can chat quickly about it and work
something out in the mean time.

Andreas Voellmy writes:
This is awesome. I'd like to try to recreate some of the evaluations for the multicore IO manager paper on that 40 core system at Rackspace. How can I get access to this? I'll jump on IRC - maybe it is easier to chat in realtime.
Do you suppose you could document the process a bit as you do this? I've been having a bit of trouble reproducing your numbers with GHC 7.8 on the hardware that I have access to. Cheers, - Ben

Yes, I'll try to describe it and script it so that others can understand
the benchmarks and run it easily as well.

Hi Ben, Austin,
Is there any chance of Ben's event manager patch landing in GHC-7.8.4?
Bas

I haven't had a chance to dig into Ben's patch yet, but I expect it will be
accepted soon - I don't think the change will affect performance.
Austin, would it be possible to get a relatively minor change to the event
manager into 7.8.4? It may change a semi-public API under GHC.Event, but
will not change any public API. What is our time window?
Andi

The catch with such a change is that there is no macro to determine
whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure
things out (they have to use `MIN_VERSION_base` from Cabal). But maybe
that doesn't matter too much. So, yes, I think it's doable, but that's
a sticky bit.
Andreas - want me to go ahead and get you some hardware to test Ben's
patch in the meantime? This way we'll at least not leave it hanging
until the last moment...
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

Hi,
Andreas - want me to go ahead and get you some hardware to test Ben's patch in the mean time? This way we'll at least not leave it hanging until the last moment...
I will also try this with two 20-core machines connected 10G on Monday.
I measured the performance of GHC head, 7.8.3 and 7.8.3 + Ben's patch set.

Server: witty 8080 -r -a -s +RTS -N<n> *1
Measurement tool: weighttp -n 100000 -c 1000 -k -t 19 http://192.168.0.1:8080/
Measurement env: two 20 core (w/o HT) machines directly connected 10G

Here are the results (req/s):

-N<n>          1        2        4        8       16
---------------------------------------------------------
head      92,855  155,957  306,813  498,613  527,034
7.8.3     86,494  160,321  310,675  494,020  510,751
7.8.3+ben 37,608   69,376  131,686  237,783  333,946

head and 7.8.3 have almost the same performance, but I saw a significant performance regression with Ben's patch set.

*1 https://github.com/kazu-yamamoto/witty/blob/master/README.md

P.S.
- Scalability is not linear, as you can see.
- prefork (witty -n <n>) got a much better result than Mio (witty +RTS <n>): 677,837 req/s for witty 8080 -r -a -s -n 16.

--Kazu

Kazu Yamamoto
head and 7.8.3 has almost the same performance. But I saw significant performance regression in Ben's patch set.
Hmm, uh oh. Thanks for testing this. I'll try to reproduce this on my end. It looks like it shouldn't be so hard, as even the single-threaded performance regresses drastically.

Just to confirm: you are using the latest revision of D347?

Cheers,

- Ben

Ben,
Hmm, uh oh. Thanks for testing this. I'll try to reproduce this on my end. It looks like it shouldn't be so hard as even the single-threaded performance regresses drastically. Just to confirm, you are using the latest revision of D347?
I used the following as you suggested:

https://github.com/bgamari/packages-base/compare/ghc:ghc-7.8...event-rework

I cannot tell whether or not this is identical to D347.

--Kazu

Kazu Yamamoto
head and 7.8.3 has almost the same performance. But I saw significant performance regression in Ben's patch set.
This may be due to lacking INLINEs on definitions added in GHC.Event.Internal [1]. I'm currently in the middle of reproducing these results on an EC2 instance to confirm this. So far the results look much more consistent than my previous attempts at benchmarking on my own hardware.

Cheers,

- Ben

[1] https://github.com/bgamari/ghc/tree/event-rework-7.10

Ben,
This may be due to lacking INLINEs on definitions added in GHC.Event.Internal [1]. I'm currently in the middle of reproducing these results on an EC2 instance to confirm this. So far the results look much more consistent than my previous attempts at benchmarking on my own hardware.
If https://github.com/bgamari/packages-base/commits/event-rework is the right repository to use, could you push the INLINE commit to it? I will try it again.

--Kazu

Kazu Yamamoto
If using https://github.com/bgamari/packages-base/commits/event-rework is a right way, please push the INLINE commit to this repo? I will try it gain.
I already pushed it. The commit in question is 5dce47eb8415eb31e1c6759b6f6a2ef5bfe32470. Thanks for the benchmarking!

Cheers,

- Ben

I already pushed it. The commit in question is 5dce47eb8415eb31e1c6759b6f6a2ef5bfe32470. Thanks for the benchmarking!
I believe this is in bgamari/ghc (for GHC 7.10?). I'm using bgamari/packages-base for GHC 7.8 and am asking you to push the same commit to that repo.

Actually, I compared the latest Internal.hs in bgamari/ghc with the one in bgamari/packages-base and I saw *additional* differences, so I hesitated to apply the patch myself.

--Kazu

Kazu Yamamoto
I believe this is in bgamari/ghc (for GHC 7.10?). I'm using bgamari/packages-base for GHC 7.8 and asking to push the same commit to this repo.
Ahh, yes. Sorry, I forgot you were on 7.8. Just pushed a new patch to the event-rework-squashed branch [1].
Actually I compared the latest Internal.hs in bgamari/ghc and one in bgamari/packages-base and I saw *additional* differences. So, I hesitated to apply the patch by myself.
There were a few changes necessary due to AMP fallout. They are mostly harmless.

Cheers,

- Ben

[1] https://github.com/bgamari/packages-base/commit/01ac6692f04378052ff7ad844409...

Kazu Yamamoto
Ahh, yes. Sorry, I forgot you were on 7.8. Just pushed a new patch to the event-rework-squashed branch [1].
I believe that you are trying to merge your patches to GHC 7.8.4? If not, I will work on the GHC head branch.
Well, Bas was wondering whether this would be possible. At this point I'm a bit on the fence; on one hand it's not a crucial fix (we have a workaround in usb) and it may involve changes to exported interfaces (although not very high visibility). On the other hand, it's a pretty easy change to make and it cleans up the semantics of the event manager nicely.

Frankly, I doubt that the performance characteristics of the patch will change much between HEAD and ghc-7.8 (up to the difference that you've already reported in your last set of benchmarks).

Cheers,

- Ben

Well, Bas was wondering whether this would be possible. At this point I'm a bit on the fence; on one hand it's not a crucial fix (we have a workaround in usb) and it may involve changes to exported interfaces (although not very high visibility). On the other hand, it's a pretty easy change to make and it cleans up the semantics of the event manager nicely.
I benchmarked your patches on the GHC 7.8 branch:

-N<n>          1        2        4        8       16
---------------------------------------------------------
head      92,855  155,957  306,813  498,613  527,034
7.8.3     86,494  160,321  310,675  494,020  510,751
7.8.3+ben 84,472  140,978  291,550  488,834  523,837

The inline patch works very nicely. :-)

# And I was a bit disappointed because GHC does not do this inlining
# automatically.

--Kazu

Kazu Yamamoto
The inline patch works very nicely. :-)
Awesome!

Out of curiosity, are these numbers from single runs or do you average? What are the uncertainties on these numbers? Even on the Rackspace machines I was finding very large variances in my benchmarks, largely due to far outliers. I didn't investigate too far, but it seems that a non-trivial fraction of connections were failing.

At some point it would be nice to chat about how we might replicate your benchmarking configuration on the Rackspace boxen.
# And I was a bit disappointed because GHC does not do this inlining # automatically.
Yeah, this isn't the first time I've been caught assuming that GHC will inline. Thanks again for the benchmarking!

Cheers,

- Ben

Out of curiosity are these numbers from single runs or do you average?
I ran it three times and took the middle value this time.
What are the uncertainties on these numbers? Even on the Rackspace machines I was finding very large variances in my benchmarks, largely due to far outliers. I didn't investigate too far but it seems that a non-trivial fraction of connections were failing.
If cores are in a sleep mode, the results are poor. You need to warm the cores up somehow.

I forget how to disable the deep sleep modes with a command on Linux. (Open a special file and write something?) I believe that Andi knows.

In my experience, a 1G network is NOT good enough.
# And I was a bit disappointed because GHC does not do this inlining # automatically.
Yeah, this isn't the first time I've been caught assuming that GHC will inline.
I read your code; you export these functions. That's why GHC does not inline them automatically.

--Kazu
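To illustrate the fix (with a hypothetical function name standing in for the small combinators added in the patch): an INLINE pragma records the function's unfolding in the module's interface file, so call sites in other modules can still inline it even though the binding is exported.

```haskell
import Data.Bits ((.|.))

-- Hypothetical stand-in for an exported GHC.Event.Internal combinator.
-- Without the pragma, GHC may choose not to expose an unfolding for an
-- exported binding; with it, cross-module inlining is guaranteed.
{-# INLINE combineEvents #-}
combineEvents :: Int -> Int -> Int
combineEvents a b = a .|. b

main :: IO ()
main = print (combineEvents 1 2)  -- 3
```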

On Wed, Oct 22, 2014 at 12:10 AM, Kazu Yamamoto
I forget how to disable the deep sleep mode by a command on Linux. (Open a special file and write something?) I believe that Andi knows that.
You need to set the CPU into C0 using /dev/cpu_dma_latency. Here's a short paper with a program showing how to do it [1]. The Mio paper mentions this, and the results are pretty dramatic:

"We disable power-saving by specifying the maximum transition latency for the CPU, which forces the CPU cores to stay in C0 state. Figure 12 shows the results, with the curves labelled “Default” and “NoSleep” showing the performance in the default configuration and the default configuration with power-saving disabled, respectively. Without limiting the CPU sleep states (curve “Default”), SimpleServerC cannot benefit from using more CPU cores and the throughput is less than 218,000 requests per second. In contrast, after preventing CPU cores entering deep sleep states (curve “NoSleep”), SimpleServerC scales up to 20 cores and can process 1.2 million requests per second, approximately 6 times faster than with the default configuration." [2]
To my experience, 1G network is NOT good enough.
The Rackspace machines come with bonded 10GigE, so hopefully over the internal DC network they can handle that. :)

[1] http://en.community.dell.com/cfs-file/__key/telligent-evolution-components-a...
[2] Section 5.1, http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
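The /dev/cpu_dma_latency technique can be sketched in a few lines (the helper name is made up; this is Linux-only, needs root, and the latency request only holds while the file stays open):

```haskell
import Control.Exception (IOException, try)
import Data.Int (Int32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (poke)
import System.IO

-- Request a 0us wakeup latency, which pins cores in C0. The kernel
-- honours the request only while the file remains open, so a real
-- benchmark driver would hold the handle open for the whole run
-- rather than letting withBinaryFile close it immediately.
holdC0 :: IO (Either IOException ())
holdC0 = try $
  withBinaryFile "/dev/cpu_dma_latency" WriteMode $ \h ->
    alloca $ \p -> do
      poke p (0 :: Int32)  -- target latency in microseconds
      hPutBuf h p 4        -- kernel expects a raw 32-bit integer
      hFlush h

main :: IO ()
main = do
  putStrLn "requesting 0us wakeup latency via /dev/cpu_dma_latency"
  r <- holdC0
  case r of
    Left err -> hPutStrLn stderr ("failed (needs root on Linux): " ++ show err)
    Right () -> putStrLn "request made (released when the handle closed)"
```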

Ben,

I have some comments and questions about your code:

C: registerFd' is exported, so it should be documented.

Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default - no change from GHC 7.8.3. Do I understand correctly?

Q: The dbus library will use registerFd' to specify MultiShot, right?

--Kazu

Kazu Yamamoto
Ben,
I have some comments and questions about your code:
C: registerFd' is exported, so it should be documented.
It is documented [1] in the 7.10 branch. I didn't bother to bring this patch over to 7.8 (although I will do so when it becomes clear that this is going in to 7.8.4).
Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default. No changes from GHC 7.8.3. Do I understand correctly?
That is the idea. That being said, adding another variant of registerFd (which, as far as I know, has three users) for backwards compatibility seems a bit silly. If we decided to punt on this patch until 7.10, I'd say we should just change the interface of registerFd. If we are going to put it in 7.8.4, however, then this isn't as clear.
Q: dbus library will use registerFd' to specify MultiShot, right?
I'm not sure about dbus, but this is how usb will use it, yes. Does dbus also use the event manager directly?

Cheers,

- Ben

[1] https://github.com/bgamari/ghc/commit/fb948ef1cdb92419b88fb621edee19d644a260...

Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default. No changes from GHC 7.8.3. Do I understand correctly?
That is the idea. That being said adding another variant of registerFd (which as far as I know has three users) for backwards compatibility seems a bit silly. If we decided to punt on this patch until 7.10 I'd say we should just change the interface of registerFd. If we are going to put it in 7.8.4, however, then this isn't as clear.
Understood. Thanks.
Q: dbus library will use registerFd' to specify MultiShot, right?
I'm not sure about dbus but this is how usb will use it, yes. Does dbus also use the event manager directly?
Never mind. I meant usb, not dbus. --Kazu

Kazu Yamamoto
Hi Kazu,
Server: witty 8080 -r -a -s +RTS -N<n> *1
Have you noticed that witty will sometimes terminate (with exit code 0) spontaneously during a run? This seems to happen more often with higher core counts. I've seen this both with and without my patch (based on master as of earlier today). Cheers, - Ben

Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Hmm, that is slightly sticky. I'm not sure what Bas thinks, but IMHO it's not the end of the world if usb needs to disable event manager support in the 7.8 series.

Whatever happens, I want to make sure this is very well tested before it is merged. I'm still recovering from the shock of this change being so painless. The reason here may be that I've only tested against Linux. It would be good if someone with a Mac could run a validation. Same with BSD. On that note, are there plans to bring up a BSD test box for harbormaster?
Andreas - want me to go ahead and get you some hardware to test Ben's patch in the meantime? This way we'll at least not leave it hanging until the last moment...
Just so everyone is on the same page: I've taken the patch set, rebased it on master, and opened D347 to track it.

Cheers,

- Ben

Hi,
On 17 October 2014 16:27, Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Recent versions of Cabal (1.20+) define MIN_TOOL_VERSION_<tool> macros similar to MIN_VERSION_<package>. So you can use '#if MIN_TOOL_VERSION_ghc(7,8,4)' to detect GHC 7.8.4.
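For example (a sketch; the #ifndef guard is there so the file still compiles when it is processed without Cabal's generated macros):

```haskell
{-# LANGUAGE CPP #-}

-- Cabal 1.20+ defines MIN_TOOL_VERSION_ghc; default it to 0 so the
-- file also builds when Cabal's macros are unavailable.
#ifndef MIN_TOOL_VERSION_ghc
#define MIN_TOOL_VERSION_ghc(x,y,z) 0
#endif

#if MIN_TOOL_VERSION_ghc(7,8,4)
-- GHC 7.8.4 or later: the reworked event manager interface is available.
#else
-- GHC 7.8.3 or earlier: fall back (e.g. disable event manager support).
#endif
```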

Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Also, I should mention that, as written, the patch changes no exported interfaces. Instead of changing `registerFd` it adds an additional variant, `registerFd'`, which allows the user to specify a lifetime.

That being said, I'm personally not terribly fond of adding these sorts of backwards-compatibility variants unless really necessary. Given that this is such a low-visibility interface, we may want to consider just modifying `registerFd` and avoid further polluting the namespace (this would be the third exported variant of `registerFd`). Thoughts?

- Ben
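For concreteness, usage under the proposed interface might look roughly like this. This is a sketch only: `registerFd'` and a `Lifetime` type with `OneShot`/`MultiShot` constructors are the proposal under discussion in this thread, not an existing API, and a system event manager only exists on the threaded RTS.

```haskell
-- Sketch against the *proposed* interface; this does not compile
-- against GHC 7.8.3's base.
import GHC.Event (Event, FdKey, evtRead, getSystemEventManager)
import System.Posix.Types (Fd)

watch :: Fd -> IO ()
watch fd = do
  Just mgr <- getSystemEventManager   -- Nothing on the non-threaded RTS
  let cb :: FdKey -> Event -> IO ()
      cb _key ev = print ev
  -- Default, file-I/O-friendly behaviour: one event is delivered, after
  -- which the fd must be registered again.
  _one <- registerFd' mgr cb fd evtRead OneShot
  -- What usb needs: events keep being delivered until unregisterFd is
  -- called, so no hardware events are lost between deliveries.
  _many <- registerFd' mgr cb fd evtRead MultiShot
  return ()
```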
participants (7)

- Andreas Voellmy
- Austin Seipp
- Bas van Dijk
- Ben Gamari
- Herbert Valerio Riedel
- Kazu Yamamoto
- Mikhail Glushenkov