One-shot semantics in GHC event manager

In ba2555ef and a6f52b19 one-shot semantics were added to the event manager in `base`. If my understanding of this code is correct, in this mode the event manager will only notify the user of the first event on a registered fd, after which point the fd will need to be re-registered to get another notification. It appears this was done to optimize the common case of file I/O where only a single event is needed.

This change led to a regression[1] in Bas van Dijk's usb library under GHC 7.8. usb's use of the event manager requires that all events on an fd are reported until the fd is unregistered, or else hardware events are lost.

I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code. Going forward, library authors now need to worry about whether the system event manager is one-shot or not. Not only is this platform dependent, but it seems that there is no way for a user to determine which semantics the system event handler uses.

Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?

Cheers,

- Ben

[1] https://github.com/basvandijk/usb/issues/7
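To make the semantic difference concrete, here is a minimal self-contained sketch (the names are made up for illustration; this is not the real GHC.Event API) of how a one-shot manager consumes a registration on its first event, while a persistent one keeps it:

```haskell
import qualified Data.Map.Strict as Map

type Fd = Int

-- Hypothetical model: a manager maps fds to registration labels,
-- plus a manager-wide flag saying whether registrations are one-shot.
data Manager = Manager { oneShot :: Bool, regs :: Map.Map Fd String }

register :: Fd -> String -> Manager -> Manager
register fd label m = m { regs = Map.insert fd label (regs m) }

-- Deliver an event on an fd: in one-shot mode the registration is
-- consumed, so a second event on the same fd is silently dropped.
deliver :: Fd -> Manager -> (Maybe String, Manager)
deliver fd m = case Map.lookup fd (regs m) of
  Nothing -> (Nothing, m)
  Just label
    | oneShot m -> (Just label, m { regs = Map.delete fd (regs m) })
    | otherwise -> (Just label, m)

main :: IO ()
main = do
  let os         = register 7 "usb-callback" (Manager True Map.empty)
      (e1, os')  = deliver 7 os
      (e2, _)    = deliver 7 os'  -- lost: the fd must be re-registered first
  print (e1, e2)                  -- (Just "usb-callback",Nothing)
```

This is exactly the failure mode usb hits: the second hardware event arrives before the library has re-registered, so it is never reported.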

On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari
In ba2555ef and a6f52b19 one-shot semantics were added to the event manager in `base`. If my understanding of this code is correct, in this mode the event manager will only notify the user of the first event on a registered fd, after which point the fd will need to be re-registered to get another notification.
Yes.
It appears this was done to optimize the common case of file I/O where only a single event is needed.
Yes.
This change led to a regression[1] in Bas van Dijk's usb library under GHC 7.8. usb's use of the event manager requires that all events on an fd are reported until the fd is unregistered, or else hardware events are lost.
The change should only affect libraries using GHC.Event (or other modules underneath), which are exposed but considered "internal". I searched Hackage before making this change and usb was the only library that came up using GHC.Event directly. I'm not sure now whether I sent the usb maintainers an email... I really should have done that to save you the effort of hunting down the problem in usb.
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it. You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?

-Andi
Cheers,
- Ben

Thanks for your quick reply!
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari wrote:
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it.
You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?
I had considered this but looked for other options for two reasons:

* `loop` isn't exported by GHC.Event
* there is already a perfectly usable event loop thread in existence

I'm a bit curious to know what advantages ONE_SHOT being per-manager carries over per-fd. If the advantages are large enough then we can just export `loop` and be done with it, but the design as it stands strikes me as a bit odd.

Cheers,

- Ben

On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari
Thanks for your quick reply!
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 AM, Ben Gamari wrote:
I'm a bit perplexed as to why the change was made in the way that it was. Making one-shot an event-manager-wide attribute seems to add a fair bit of complexity to the subsystem while breaking backwards compatibility with library code.
It added some complexity to the IO manager, but that should not affect clients except those using the internal interface.
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance. I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
Going forward library authors now need to worry about whether the system event manager is one-shot or not.
Yes, but only library authors using the internal interface.
Not only is this platform dependent but it seems that there is no way for a user to determine which semantics the system event handler uses.
Is there a reason why one-shot wasn't exported as a per-fd attribute instead of per-manager? Might it be possible to back out this change and instead add a variant of `registerFd` which exposes one-shot semantics?
The system event manager is configured by GHC.Thread using ONE_SHOT if the system supports it.
You can always create your own EventManager using the GHC.Event.Manager.new or GHC.Event.Manager.newWith functions. Those functions take a Bool argument that controls whether ONE_SHOT is used by the Manager returned by that function (False means not to use ONE_SHOT). Would this work for usb?
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop. However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
* there is already a perfectly usable event loop thread in existence
I'm a bit curious to know what advantages ONE_SHOT being per-manager carries over per-fd. If the advantages are large enough then we can just export `loop` and be done with it but the design as it stands strikes me as a bit odd.
I suspect that a per-FD design would perform just as well, but I need to look at the details to be sure. Cheers,
- Ben

Another way to fix usb would be to re-register the callback after a
previously registered callback is fired. Of course it is cheaper not to
have to re-register, but re-registration in the latest IO manager should be
fairly cheap, so this may not be a performance problem for usb. Would this
work for you?
You could also use a CPP directive to only do this for GHC 7.8 and up.
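Andreas's re-registration workaround can be sketched with a tiny self-contained simulation (all names here are illustrative, not the real GHC.Event API): the callback's first act is to register itself again, so a one-shot dispatcher behaves as if the registration were persistent.

```haskell
import Data.IORef

-- Toy one-shot dispatcher: holds at most one pending callback and
-- clears it before invoking, mimicking ONE_SHOT delivery.
newtype OneShotMgr = OneShotMgr (IORef (Maybe (IO ())))

newMgr :: IO OneShotMgr
newMgr = OneShotMgr <$> newIORef Nothing

registerCb :: OneShotMgr -> IO () -> IO ()
registerCb (OneShotMgr r) cb = writeIORef r (Just cb)

fireEvent :: OneShotMgr -> IO ()
fireEvent (OneShotMgr r) = do
  mcb <- readIORef r
  writeIORef r Nothing  -- the registration is consumed by the event
  sequence_ mcb

-- The workaround: a callback that re-registers itself before running
-- the handler, so every event is observed even though each individual
-- registration is one-shot.
persistent :: OneShotMgr -> IO () -> IO ()
persistent mgr handler = registerCb mgr (persistent mgr handler >> handler)

main :: IO ()
main = do
  mgr   <- newMgr
  count <- newIORef (0 :: Int)
  persistent mgr (modifyIORef' count (+1))
  fireEvent mgr >> fireEvent mgr >> fireEvent mgr
  readIORef count >>= print  -- 3: no events were lost
```

As Andreas notes, the cost of this is one extra registration per event, which the 7.8 IO manager is designed to make cheap.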
If we want to allow usb to work unchanged, then we will have to revert to
the non-ONE_SHOT behavior of registerFd and add some things to the API to
allow GHC.Thread to register with ONE_SHOT behavior. Reverting could break
clients of this semi-public API who have adapted to the 7.8 behavior.
There probably aren't many of these clients other than GHC.Thread, so this may
not be a big issue.
To do per-FD setting of ONE_SHOT or not, we actually need to have
per-subscription settings, since there can be multiple invocations to
register callbacks for a single file descriptor (e.g. from two different
threads) and they might want different settings. If all the clients want
ONE_SHOT we use ONE_SHOT registration, if all want persistent registrations
we don't use ONE_SHOT. If it is mixed, then the manager has to choose one
or the other and simulate the required behavior for the other registrations
(e.g. choose persistent and automatically unregister for ONE_SHOT
registrations). We could either always make the same choice (e.g. if there
is a mix, use persistent), or we could have per-FD setting that is
configurable by clients.
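The choice described above can be phrased as computing the lifetime the backend registration must have when several subscriptions share an fd (a sketch with illustrative names, not GHC's actual types):

```haskell
data Lifetime = OneShot | ManyShot deriving (Eq, Show)

-- If any subscription on an fd wants persistent delivery, the backend
-- registration must be persistent; the manager then emulates OneShot
-- for the others by dropping them after their first event.
needed :: [Lifetime] -> Lifetime
needed ls
  | all (== OneShot) ls = OneShot
  | otherwise           = ManyShot

-- After an event fires, one-shot subscriptions are discarded and
-- persistent ones survive to see the next event.
survivors :: [Lifetime] -> [Lifetime]
survivors = filter (== ManyShot)

main :: IO ()
main = print (needed [OneShot, ManyShot], survivors [OneShot, ManyShot])
```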
Andi

Andreas Voellmy
Another way to fix usb would be to re-register the callback after a previously registered callback is fired. Of course it is cheaper not to have to re-register, but re-registration in the latest IO manager should be fairly cheap, so this may not be a performance problem for usb. Would this work for you?
You could also use a CPP directive to only do this for GHC 7.8 and up.
This is a possibility that I had considered. It would require that some code be reworked however. I'm leaning towards just using our non-event-manager-driven fallback path on GHC 7.8 for now until the event manager semantics can be worked out.
If we want to allow usb to work unchanged, then we will have to revert to the non-ONE_SHOT behavior of registerFd and add some things to the API to allow GHC.Thread to register with ONE_SHOT behavior. Reverting could break clients of this semi-public API who have adapted to the 7.8 behavior. There probably aren't many of these clients other than GHC.Thread, so this may not be a big issue.
I don't think we need to revert the changes just for usb as we have the fallback path that will work for now. I do think it might be good to explore other points in the design space, however.
To do per-FD setting of ONE_SHOT or not, we actually need to have per-subscription settings, since there can be multiple invocations to register callbacks for a single file descriptor (e.g. from two different threads) and they might want different settings.
Agreed.
If all the clients want ONE_SHOT we use ONE_SHOT registration, if all want persistent registrations we don't use ONE_SHOT. If it is mixed, then the manager has to choose one or the other and simulate the required behavior for the other registrations (e.g. choose persistent and automatically unregister for ONE_SHOT registrations). We could either always make the same choice (e.g. if there is a mix, use persistent), or we could have per-FD setting that is configurable by clients.
I would think we would actually want the desired one-shottedness to be a property of the registration. We would have:

    -- | Will this registration be valid until unregistration ('ManyShot')
    -- or only for a single event ('OneShot')?
    data Lifetime = OneShot | ManyShot

    registerFd :: EventManager -> Fd -> Event -> Lifetime -> IO FdKey

The event manager would then have to choose either ONE_SHOT or not in the case of heterogeneous registrations and emulate the other set, as you said. This seems like a nice interface as it allows the user to specify the semantics that they want (instead of working around whatever the manager happens to provide) and gives the event manager enough knowledge and freedom to do what it can to efficiently implement what is needed.

Cheers,

- Ben

Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari wrote:
Thanks for your quick reply!
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance.
Sure, I would certainly believe this.
I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
I've started playing around with the code to see what might be possible here. We'll see how far I get.
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop.
The system EventManager's loop is `GHC.Event.Manager.loop`, no?
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`. Cheers, - Ben

On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 1:07 AM, Ben Gamari wrote:
Thanks for your quick reply!
What I'm wondering is what the extra complexity bought us. It seems like the same thing could have been achieved with less breakage by making this per-fd instead of per-manager. I may be missing something, however.
Generally, ONE_SHOT helped improve performance.
Sure, I would certainly believe this.
I agree with you that it may be possible to do this on a per-FD basis. I'll look into what it would take to do this.
I've started playing around with the code to see what might be possible here. We'll see how far I get.
I had considered this but looked for other options for two reasons,
* `loop` isn't exported by GHC.Event
Right - it wouldn't make sense to export the system EventManager's loop.
The system EventManager's loop is `GHC.Event.Manager.loop`, no?
Yes, but it will be invoked by GHC.Thread and any other callers of it will simply block indefinitely waiting for the thread that is running loop to give it up - which will typically never happen.
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`.
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
Cheers,
- Ben

Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Yes, but it will be invoked by GHC.Thread and any other callers of it will simply block indefinitely waiting for the thread that is running loop to give it up - which will typically never happen.
Right.
However, the GHC.Event.Manager module does export its loop function, so if you create your own non-ONE_SHOT event manager, you can just invoke its loop function.
Right, but `GHC.Event.Manager` is not exported by `base`.
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
I have a first cut of this here [1]. It compiles but I would be shocked if it ran. All of the pieces are there but I need to change EventLifetime to a more efficient encoding (there's no reason why it needs to be more than an Int). Sadly I have to run for the night and will be on a bike ride tomorrow, but perhaps I can come back to it on Monday. Feel free to read it over and see if I missed something.

Cheers,

- Ben

[1] https://github.com/bgamari/packages-base/compare/ghc:ghc-7.8...event-rework
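Ben's remark that EventLifetime need be no more than an Int can be illustrated by packing the event bits and a lifetime flag into a single machine word. This is a hypothetical encoding for illustration only, not necessarily the one the patch uses:

```haskell
import Data.Bits ((.&.), (.|.), testBit)

data Lifetime = OneShot | ManyShot deriving (Eq, Show)

-- An (Event, Lifetime) pair packed into one Int.
newtype EventLifetime = EL Int deriving (Eq, Show)

eventBits :: Int
eventBits = 3  -- hypothetical: evtRead = bit 0, evtWrite = bit 1

encode :: Int -> Lifetime -> EventLifetime
encode ev lt = EL ((ev .&. eventBits) .|. ltBit)
  where
    ltBit = case lt of
      ManyShot -> 4  -- lifetime stored in bit 2
      OneShot  -> 0

decode :: EventLifetime -> (Int, Lifetime)
decode (EL n) =
  (n .&. eventBits, if testBit n 2 then ManyShot else OneShot)

main :: IO ()
main = print (decode (encode 2 ManyShot))  -- (2,ManyShot)
```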

Ben Gamari writes:
Andreas Voellmy writes:
On Sat, Oct 11, 2014 at 12:17 PM, Ben Gamari wrote:
Ah... so this is not useful to you. I guess we could add `loop` to GHC.Event's export list. On the other hand, I like your Lifetime proposal better and then no one needs `loop`, so let's try this first.
I have a first cut of this here [1]. It compiles but I would be shocked if it ran. All of the pieces are there but I need to change EventLifetime to a more efficient encoding (there's no reason why it needs to be more than an Int).
As it turns out the patch seems to get through the testsuite after a few minor fixes. What other tests can I subject this to? I'm afraid I don't have access to any machine even close to the size of those that the original event manager was tested on, so checking for performance regressions will be difficult.

Cheers,

- Ben

For the record, I talked to Ben earlier on IRC, and I can provide him
with a machine to do intense benchmarks of the new I/O manager.
Also, if any other developers (like Andreas, Johan, Bryan, etc) in
this space want a big machine to test it on, I can probably equip you
with one (or several). Since Rackspace is so gracious to us, we can
immediately allocate high-powered, physical (i.e. not Xen, but real
machines) machines to do high-scale testing on.[1]
In any case, it's not hard to do and only takes a few minutes, so Ben:
let me know. (I've thought it would be neat to implement a leasing
system somehow, where a developer could lease a few machines for a
short period of time, at which point they expire and a background job
cleans them up.)
[1] You can find the hardware specs here; GHC benchmarking is probably
best suited for the "OnMetal I/O" type, which has 40 cores, 2x PCIe
flash and 128GB of RAM -
http://www.rackspace.com/cloud/servers/onmetal/
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

On 2014-10-13 at 23:33:13 +0200, Austin Seipp wrote: [...]
Also, if any other developers (like Andreas, Johan, Bryan, etc) in this space want a big machine to test it on, I can probably equip you with one (or several). Since Rackspace is so gracious to us, we can immediately allocate high-powered, physical (i.e. not Xen, but real machines) machines to do high-scale testing on.[1]
In any case, it's not hard to do and only takes a few minutes, so Ben: let me know. (I've thought it would be neat to implement a leasing system somehow, where a developer could lease a few machines for a short period of time, at which point they expire and a background job cleans them up.)
I'd like to add to this; if there's demand to provide SSH accounts to MSYS2 environments for developing/testing GHC patches or generally debugging/fixing GHC issues occurring on Windows, we may be able to provide such (ephemeral) accounts. Cheers, hvr

This is awesome. I'd like to try to recreate some of the evaluations for
the multicore IO manager paper on that 40 core system at Rackspace. How can
I get access to this? I'll jump on IRC - maybe it is easier to chat in
realtime.

Hey Andreas,
The basic rundown is that if we equip you with an account, you can
just do it yourself. Although we'd like to restrict access a bit more;
I'll figure something out.
Yeah, if you hop on IRC, we can chat quickly about it and work
something out in the mean time.

Andreas Voellmy writes:
This is awesome. I'd like to try to recreate some of the evaluations for the multicore IO manager paper on that 40 core system at Rackspace. How can I get access to this? I'll jump on IRC - maybe it is easier to chat in realtime.
Do you suppose you could document the process a bit as you do this? I've been having a bit of trouble reproducing your numbers with GHC 7.8 on the hardware that I have access to. Cheers, - Ben

Yes, I'll try to describe it and script it so that others can understand
the benchmarks and run it easily as well.

Hi Ben, Austin,
Is there any chance of Ben's event manager patch landing in GHC-7.8.4?
Bas

I haven't had a chance to dig into Ben's patch yet, but I expect it will be
accepted soon - I don't think the change will affect performance.
Austin, would it be possible to get a relatively minor change to the event
manager into 7.8.4? It may change a semi-public API under GHC.Event, but
will not change any public API. What is our time window?
Andi

The catch with such a change is that there is no macro to determine
whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure
things out (they have to use `MIN_VERSION_base` from Cabal). But maybe
that doesn't matter too much. So, yes, I think it's doable, but that's
a sticky bit.
Andreas - want me to go ahead and get you some hardware to test Ben's
patch in the meantime? This way we'll at least not leave it hanging
until the last moment...
-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/

Hi,
Andreas - want me to go ahead and get you some hardware to test Ben's patch in the mean time? This way we'll at least not leave it hanging until the last moment...
I will also try this with two 20-core machines connected 10G on Monday.
I measured the performance of GHC head, 7.8.3 and 7.8.3 + Ben's patch set.

Server: witty 8080 -r -a -s +RTS -N<n> *1
Measurement tool: weighttp -n 100000 -c 1000 -k -t 19 http://192.168.0.1:8080/
Measurement env: two 20 core (w/o HT) machines directly connected 10G

Here are the results (req/s):

-N<n>          1        2        4        8       16
---------------------------------------------------------
head      92,855  155,957  306,813  498,613  527,034
7.8.3     86,494  160,321  310,675  494,020  510,751
7.8.3+ben 37,608   69,376  131,686  237,783  333,946

head and 7.8.3 have almost the same performance, but I saw a significant performance regression with Ben's patch set.

*1 https://github.com/kazu-yamamoto/witty/blob/master/README.md

P.S.
- Scalability is not linear, as you can see.
- prefork (witty -n <n>) got a much better result than Mio (witty +RTS <n>): 677,837 req/s for witty 8080 -r -a -s -n 16.

--Kazu

Kazu Yamamoto
head and 7.8.3 has almost the same performance. But I saw significant performance regression in Ben's patch set.
Hmm, uh oh. Thanks for testing this. I'll try to reproduce this on my end. It looks like it shouldn't be so hard, as even the single-threaded performance regresses drastically.

Just to confirm: you are using the latest revision of D347?

Cheers,

- Ben

Ben,
Hmm, uh oh. Thanks for testing this. I'll try to reproduce this on my end. It looks like it shouldn't be so hard as even the single-threaded performance regresses drastically. Just to confirm, you are using the latest revision of D347?
I used the following as you suggested:

https://github.com/bgamari/packages-base/compare/ghc:ghc-7.8...event-rework

I cannot tell whether or not this is identical to D347.

--Kazu

Kazu Yamamoto
head and 7.8.3 has almost the same performance. But I saw significant performance regression in Ben's patch set.
This may be due to lacking INLINEs on definitions added in GHC.Event.Internal [1]. I'm currently in the middle of reproducing these results on an EC2 instance to confirm this. So far the results look much more consistent than my previous attempts at benchmarking on my own hardware.

Cheers,

- Ben

[1] https://github.com/bgamari/ghc/tree/event-rework-7.10

Ben,
This may be due to lacking INLINEs on definitions added in GHC.Event.Internal [1]. I'm currently in the middle of reproducing these results on an EC2 instance to confirm this. So far the results look much more consistent than my previous attempts at benchmarking on my own hardware.
If https://github.com/bgamari/packages-base/commits/event-rework is the right repository to use, could you push the INLINE commit to it? I will try it again.

--Kazu

Kazu Yamamoto
If using https://github.com/bgamari/packages-base/commits/event-rework is a right way, please push the INLINE commit to this repo? I will try it gain.
I already pushed it. The commit in question is 5dce47eb8415eb31e1c6759b6f6a2ef5bfe32470. Thanks for the benchmarking!

Cheers,

- Ben

I already pushed it. The commit in question is 5dce47eb8415eb31e1c6759b6f6a2ef5bfe32470. Thanks for the benchmarking!
I believe this is in bgamari/ghc (for GHC 7.10?). I'm using bgamari/packages-base for GHC 7.8 and am asking you to push the same commit to that repo.

Actually, I compared the latest Internal.hs in bgamari/ghc with the one in bgamari/packages-base and I saw *additional* differences, so I hesitated to apply the patch myself.

--Kazu

Kazu Yamamoto
I believe this is in bgamari/ghc (for GHC 7.10?). I'm using bgamari/packages-base for GHC 7.8 and asking to push the same commit to this repo.
Ahh, yes. Sorry, I forgot you were on 7.8. Just pushed a new patch to the event-rework-squashed branch [1].
Actually I compared the latest Internal.hs in bgamari/ghc and one in bgamari/packages-base and I saw *additional* differences. So, I hesitated to apply the patch by myself.
There were a few changes necessary due to AMP fallout. They are mostly harmless.

Cheers,

- Ben

[1] https://github.com/bgamari/packages-base/commit/01ac6692f04378052ff7ad844409...

Kazu Yamamoto
Ahh, yes. Sorry, I forgot you were on 7.8. Just pushed a new patch to the event-rework-squashed branch [1].
I believe that you are trying to merge your patches to GHC 7.8.4? If not, I will work on the GHC head branch.
Well, Bas was wondering whether this would be possible. At this point I'm a bit on the fence; on one hand it's not a crucial fix (we have a workaround in usb) and it may involve changes to exported interfaces (although not very high visibility). On the other hand, it's a pretty easy change to make and it cleans up the semantics of the event manager nicely.

Frankly, I doubt that the performance characteristics of the patch will change much between HEAD and ghc-7.8 (up to the difference that you've already reported in your last set of benchmarks).

Cheers,

- Ben

Well, Bas was wondering whether this would be possible. At this point I'm a bit on the fence; on one hand it's not a crucial fix (we have a workaround in usb) and it may involve changes to exported interfaces (although not very high visibility). On the other hand, it's a pretty easy change to make and it cleans up the semantics of the event manager nicely.
I benchmarked your patches on the GHC 7.8 branch:

-N<n>          1        2        4        8       16
---------------------------------------------------------
head      92,855  155,957  306,813  498,613  527,034
7.8.3     86,494  160,321  310,675  494,020  510,751
7.8.3+ben 84,472  140,978  291,550  488,834  523,837

The inline patch works very nicely. :-)

# And I was a bit disappointed because GHC does not do this inlining
# automatically.

--Kazu

Kazu Yamamoto
The inline patch works very nicely. :-)
Awesome!

Out of curiosity, are these numbers from single runs or do you average? What are the uncertainties on these numbers? Even on the Rackspace machines I was finding very large variances in my benchmarks, largely due to far outliers. I didn't investigate too far, but it seems that a non-trivial fraction of connections were failing.

At some point it would be nice to chat about how we might replicate your benchmarking configuration on the Rackspace boxen.
# And I was a bit disappointed because GHC does not do this inlining # automatically.
Yeah, this isn't the first time I've been caught assuming that GHC will inline. Thanks again for the benchmarking!

Cheers,

- Ben

Out of curiosity are these numbers from single runs or do you average?
I ran it three times and took the middle value this time.
What are the uncertainties on these numbers? Even on the Rackspace machines I was finding very large variances in my benchmarks, largely due to far outliers. I didn't investigate too far but it seems that a non-trivial fraction of connections were failing.
If cores are in a sleep mode, the results are poor. You need to warm the cores up somehow.

I forget how to disable the deep sleep modes with a command on Linux. (Open a special file and write something?) I believe that Andi knows.

In my experience, a 1G network is NOT good enough.
# And I was a bit disappointed because GHC does not do this inlining # automatically.
Yeah, this isn't the first time I've been caught assuming that GHC will inline.
I read your code; you export these functions. That's why GHC does not inline them automatically.

--Kazu
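To illustrate the fix (with a hypothetical function name standing in for the small combinators added in the patch): an INLINE pragma records the function's unfolding in the module's interface file, so call sites in other modules can still inline it even though the binding is exported.

```haskell
import Data.Bits ((.|.))

-- Hypothetical stand-in for an exported GHC.Event.Internal combinator.
-- Without the pragma, GHC may choose not to expose an unfolding for an
-- exported binding; with it, cross-module inlining is guaranteed.
{-# INLINE combineEvents #-}
combineEvents :: Int -> Int -> Int
combineEvents a b = a .|. b

main :: IO ()
main = print (combineEvents 1 2)  -- 3
```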

On Wed, Oct 22, 2014 at 12:10 AM, Kazu Yamamoto
I forget how to disable the deep sleep mode by a command on Linux. (Open a special file and write something?) I believe that Andi knows that.
You need to set the CPU into C0 using /dev/cpu_dma_latency. Here's a short paper with a program showing how to do it [1]. The Mio paper mentions this, and the results are pretty dramatic:

"We disable power-saving by specifying the maximum transition latency for the CPU, which forces the CPU cores to stay in C0 state. Figure 12 shows the results, with the curves labelled “Default” and “NoSleep” showing the performance in the default configuration and the default configuration with power-saving disabled, respectively. Without limiting the CPU sleep states (curve “Default”), SimpleServerC cannot benefit from using more CPU cores and the throughput is less than 218,000 requests per second. In contrast, after preventing CPU cores entering deep sleep states (curve “NoSleep”), SimpleServerC scales up to 20 cores and can process 1.2 million requests per second, approximately 6 times faster than with the default configuration." [2]
To my experience, 1G network is NOT good enough.
The Rackspace machines come with bonded 10GigE, so hopefully over the internal DC network they can handle that. :)

[1] http://en.community.dell.com/cfs-file/__key/telligent-evolution-components-a...
[2] Section 5.1, http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf

-- Regards, Austin Seipp, Haskell Consultant Well-Typed LLP, http://www.well-typed.com/
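The /dev/cpu_dma_latency technique can be sketched in a few lines (the helper name is made up; this is Linux-only, needs root, and the latency request only holds while the file stays open):

```haskell
import Control.Exception (IOException, try)
import Data.Int (Int32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (poke)
import System.IO

-- Request a 0us wakeup latency, which pins cores in C0. The kernel
-- honours the request only while the file remains open, so a real
-- benchmark driver would hold the handle open for the whole run
-- rather than letting withBinaryFile close it immediately.
holdC0 :: IO (Either IOException ())
holdC0 = try $
  withBinaryFile "/dev/cpu_dma_latency" WriteMode $ \h ->
    alloca $ \p -> do
      poke p (0 :: Int32)  -- target latency in microseconds
      hPutBuf h p 4        -- kernel expects a raw 32-bit integer
      hFlush h

main :: IO ()
main = do
  putStrLn "requesting 0us wakeup latency via /dev/cpu_dma_latency"
  r <- holdC0
  case r of
    Left err -> hPutStrLn stderr ("failed (needs root on Linux): " ++ show err)
    Right () -> putStrLn "request made (released when the handle closed)"
```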

Ben,

I have some comments and questions about your code:

C: registerFd' is exported, so it should be documented.

Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default - no change from GHC 7.8.3. Do I understand correctly?

Q: The dbus library will use registerFd' to specify MultiShot, right?

--Kazu

Kazu Yamamoto
Ben,
I have some comments and questions about your code:
C: registerFd' is exported, so it should be documented.
It is documented [1] in the 7.10 branch. I didn't bother to bring this patch over to 7.8 (although I will do so when it becomes clear that this is going in to 7.8.4).
Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default. No changes from GHC 7.8.3. Do I understand correctly?
That is the idea. That being said, adding another variant of registerFd (which, as far as I know, has three users) for backwards compatibility seems a bit silly. If we decided to punt on this patch until 7.10, I'd say we should just change the interface of registerFd. If we are going to put it in 7.8.4, however, then this isn't as clear.
Q: dbus library will use registerFd' to specify MultiShot, right?
I'm not sure about dbus, but this is how usb will use it, yes. Does dbus also use the event manager directly?

Cheers,

- Ben

[1] https://github.com/bgamari/ghc/commit/fb948ef1cdb92419b88fb621edee19d644a260...

Q: Since registerFd uses OneShot and threadWait uses registerFd, basic IO functions use OneShot by default. No changes from GHC 7.8.3. Do I understand correctly?
That is the idea. That being said adding another variant of registerFd (which as far as I know has three users) for backwards compatibility seems a bit silly. If we decided to punt on this patch until 7.10 I'd say we should just change the interface of registerFd. If we are going to put it in 7.8.4, however, then this isn't as clear.
Understood. Thanks.
Q: dbus library will use registerFd' to specify MultiShot, right?
I'm not sure about dbus but this is how usb will use it, yes. Does dbus also use the event manager directly?
Never mind. I meant usb, not dbus. --Kazu

Kazu Yamamoto
Hi Kazu,
Server: witty 8080 -r -a -s +RTS -N<n> *1
Have you noticed that witty will sometimes terminate (with exit code 0) spontaneously during a run? This seems to happen more often with higher core counts. I've seen this both with and without my patch (based on master as of earlier today). Cheers, - Ben

Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Hmm, that is slightly sticky. I'm not sure what Bas thinks, but IMHO it's not the end of the world if usb needs to disable event manager support in the 7.8 series.

Whatever happens, I want to make sure this is very well tested before it is merged. I'm still recovering from the shock of this change being so painless. The reason here may be that I've only tested against Linux. It would be good if someone with a Mac could run a validation. Same with BSD. On that note, are there plans to bring up a BSD test box for harbormaster?
Andreas - want me to go ahead and get you some hardware to test Ben's patch in the meantime? This way we'll at least not leave it hanging until the last moment...
Just so everyone is on the same page: I've taken the patch set, rebased it on master, and opened D347 to track it.

Cheers,

- Ben

Hi,
On 17 October 2014 16:27, Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Recent versions of Cabal (1.20+) define MIN_TOOL_VERSION_<tool> macros similar to MIN_VERSION_<package>. So you can use '#if MIN_TOOL_VERSION_ghc(7,8,4)' to detect GHC 7.8.4.
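For example (a sketch; the #ifndef guard is there so the file still compiles when it is processed without Cabal's generated macros):

```haskell
{-# LANGUAGE CPP #-}

-- Cabal 1.20+ defines MIN_TOOL_VERSION_ghc; default it to 0 so the
-- file also builds when Cabal's macros are unavailable.
#ifndef MIN_TOOL_VERSION_ghc
#define MIN_TOOL_VERSION_ghc(x,y,z) 0
#endif

#if MIN_TOOL_VERSION_ghc(7,8,4)
-- GHC 7.8.4 or later: the reworked event manager interface is available.
#else
-- GHC 7.8.3 or earlier: fall back (e.g. disable event manager support).
#endif
```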

Austin Seipp
The catch with such a change is that there is no macro to determine whether we're using 7.8.3 or 7.8.4, so it's harder for users to figure things out (they have to use `MIN_VERSION_base` from Cabal). But maybe that doesn't matter too much. So, yes, I think it's doable, but that's a sticky bit.
Also, I should mention that, as written, the patch changes no exported interfaces. Instead of changing `registerFd` it adds an additional variant, `registerFd'`, which allows the user to specify a lifetime.

That being said, I'm personally not terribly fond of adding these sorts of backwards-compatibility variants unless really necessary. Given that this is such a low-visibility interface, we may want to consider just modifying `registerFd` and avoid further polluting the namespace (this would be the third exported variant of `registerFd`). Thoughts?

- Ben
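For concreteness, usage under the proposed interface might look roughly like this. This is a sketch only: `registerFd'` and a `Lifetime` type with `OneShot`/`MultiShot` constructors are the proposal under discussion in this thread, not an existing API, and a system event manager only exists on the threaded RTS.

```haskell
-- Sketch against the *proposed* interface; this does not compile
-- against GHC 7.8.3's base.
import GHC.Event (Event, FdKey, evtRead, getSystemEventManager)
import System.Posix.Types (Fd)

watch :: Fd -> IO ()
watch fd = do
  Just mgr <- getSystemEventManager   -- Nothing on the non-threaded RTS
  let cb :: FdKey -> Event -> IO ()
      cb _key ev = print ev
  -- Default, file-I/O-friendly behaviour: one event is delivered, after
  -- which the fd must be registered again.
  _one <- registerFd' mgr cb fd evtRead OneShot
  -- What usb needs: events keep being delivered until unregisterFd is
  -- called, so no hardware events are lost between deliveries.
  _many <- registerFd' mgr cb fd evtRead MultiShot
  return ()
```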
participants (7)

- Andreas Voellmy
- Austin Seipp
- Bas van Dijk
- Ben Gamari
- Herbert Valerio Riedel
- Kazu Yamamoto
- Mikhail Glushenkov