Thread behavior in 7.8.3

Michael Jones

29 Oct 2014

11:48 a.m.

I have a general question about thread behavior in 7.8.3 vs 7.6.X.

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads: an application thread that plots data with wxhaskell or sends it over a network (depending on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases, with 7.8 and with 7.6, there is no space leak.

I tried to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.

One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.

What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?

I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed it myself, after removing 7.6 with apt-get.

Any hints appreciated.

Mike
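For readers who want something concrete to experiment with, the thread layout described above might look roughly like the sketch below. It is not the poster's code: `reader`, `consumer`, and `runPipeline` are hypothetical names, a list of Ints stands in for the USB traffic, and it assumes the `stm` package is available.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_)

-- Hypothetical stand-in for the USB read thread: pushes samples into a
-- channel, then raises the shared "finished" flag.
reader :: TChan Int -> TVar Bool -> [Int] -> IO ()
reader chan done samples = do
  forM_ samples $ \x -> atomically (writeTChan chan x)
  atomically (writeTVar done True)

-- Hypothetical stand-in for the application thread: drains the channel,
-- blocking (via retry) while it is empty and the flag is not yet set.
consumer :: TChan Int -> TVar Bool -> IO Int
consumer chan done = go 0
  where
    go acc = do
      next <- atomically $ do
        m <- tryReadTChan chan
        case m of
          Just x  -> return (Just x)
          Nothing -> do
            finished <- readTVar done
            if finished then return Nothing else retry
      case next of
        Just x  -> go $! acc + x
        Nothing -> return acc

-- Wire the two threads together and return the sum of everything received.
runPipeline :: [Int] -> IO Int
runPipeline samples = do
  chan <- newTChanIO
  done <- newTVarIO False
  _ <- forkIO (reader chan done samples)
  consumer chan done

main :: IO ()
main = runPipeline [1 .. 100] >>= print
```

The channel carries the data and the TVar only signals completion, which mirrors the "TChan for data, TVar for coordination" split in the description.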


Ben Gamari

29 Oct

9:54 p.m.

Michael Jones writes:

...

I have a general question about thread behavior in 7.8.3 vs 7.6.X

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

Are you using Bas van Dijk's `usb` library by any chance? If so, you should be aware of this [1] issue.

...

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.

Have you looked at the RTS's output when run with `+RTS -sstderr`? Is productivity any different between the two tests?

...

I tried to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop.

Oh dear, this doesn't sound good at all. Have you tried getting a backtrace out of gdb? Usually this isn't terribly useful, but in this case, since the event log is involved, it might be getting stuck in the RTS, which should give a useful backtrace. If not, perhaps strace will give some clues as to what is happening (you'll probably want to hide SIGVTALRM to improve signal/noise)?

Cheers,

- Ben

[1] https://github.com/basvandijk/usb/issues/7

Michael Jones

10:08 p.m.

Ben,

I am using Bas van Dijk’s usb, and I am past the -threading issue by using the latest commit.

I don’t have any easy way of making comparisons between 7.6 and 7.8 productivity, but from oscilloscope activity, I can’t see any difference. The only difference I see is the thread scheduling on 7.8 for -N1 vs -N2/4.

If -sstderr gives some notion of productivity, I’ll have to do an experiment between -N1 and -N2/4. Uncharted territory for me. I’ll set up an experiment tonight.

I am not familiar with strace. I’ll fix that soon.

Mike

Ben Gamari

30 Oct

3:27 a.m.

Michael Jones writes:

...

Ben,

I am using Bas van Dijk’s usb, and I am past the -threading issue by using the latest commit.

Excellent; I hadn't noticed the "proclivis" in your email address

...

I don’t have any easy way of making comparisons between 7.6 and 7.8 productivity, but from oscilloscope activity, I can’t see any difference. The only difference I see is the thread scheduling on 7.8 for -N1 vs -N2/4.

If -sstderr gives some notion of productivity, I’ll have to do an experiment between -N1 and -N2/4. Uncharted territory for me. I’ll set up an experiment tonight.

Indeed it does; "productivity" in this context refers to the fraction of runtime spent in evaluation (as opposed to in the garbage collector, for instance).

...

I am not familiar with strace. I’ll fix that soon.

It's often an invaluable tool; that being said, it remains to be seen whether it yields anything useful in this particular case.

Cheers,

- Ben
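To make the notion of "productivity" above concrete, here is a rough sketch that computes a similar ratio from inside a program. It assumes a newer GHC than the ones discussed in this thread (the `GHC.Stats.getRTSStats` interface is not available in 7.6 or 7.8) and a run with `+RTS -T`. The helper `productivity` is a hypothetical name, and the figure it reports only approximates what `-sstderr` prints, since init and exit time are ignored.

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of CPU time spent running Haskell code (the mutator)
-- rather than in the garbage collector. Inputs are nanoseconds.
productivity :: Int64 -> Int64 -> Double
productivity mutatorNs gcNs
  | total <= 0 = 0
  | otherwise  = fromIntegral mutatorNs / fromIntegral total
  where
    total = mutatorNs + gcNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Run with +RTS -T to enable RTS statistics."
    else do
      -- Some throwaway work so that there is something to measure.
      print (sum [1 .. 1000000 :: Int])
      stats <- getRTSStats
      putStrLn ("approximate productivity: "
                ++ show (productivity (mutator_cpu_ns stats) (gc_cpu_ns stats)))
```

Comparing this number between an -N1 run and an -N2/4 run is one way to check whether the delays come from garbage collection or from scheduling.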

John Lato

4:42 a.m.

By any chance do the delays get shorter if you run your program with `+RTS -C0.005`? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Michael Jones

5:32 a.m.

John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike
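A small probe along the following lines may help when comparing `-C` settings such as the ones reported above. It is only a sketch with hypothetical names (`busy`, `oversleep`, `worstOversleep`), it assumes a GHC recent enough to provide `GHC.Clock`, and it must be linked with `-rtsopts` so that `+RTS -C0.005` is accepted. A busy, allocating thread competes with a thread that repeatedly sleeps for 1ms, and the probe reports how late the sleeper woke up in the worst case.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (foldM, unless)
import Data.IORef
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- Busy worker standing in for a long-running thread. Writing a fresh
-- boxed Int to the IORef allocates, so the loop stays preemptible.
busy :: IORef Bool -> IORef Int -> IO ()
busy stop counter = do
  done <- readIORef stop
  unless done $ modifyIORef' counter (+ 1) >> busy stop counter

-- Sleep for the requested number of microseconds and return how late
-- the wake-up was, in nanoseconds.
oversleep :: Int -> IO Word64
oversleep micros = do
  t0 <- getMonotonicTimeNSec
  threadDelay micros
  t1 <- getMonotonicTimeNSec
  let requested = fromIntegral micros * 1000
      elapsed   = t1 - t0
  return (if elapsed > requested then elapsed - requested else 0)

-- Worst lateness observed over a number of rounds.
worstOversleep :: Int -> Int -> IO Word64
worstOversleep rounds micros = foldM step 0 [1 .. rounds]
  where
    step worst _ = max worst <$> oversleep micros

main :: IO ()
main = do
  stop    <- newIORef False
  counter <- newIORef 0
  _ <- forkIO (busy stop counter)
  worst <- worstOversleep 100 1000
  writeIORef stop True
  putStrLn ("worst oversleep: "
            ++ show (fromIntegral worst / 1.0e6 :: Double) ++ " ms")
```

Running the same binary with and without `+RTS -C0.005` at -N1 gives a rough number to set against the oscilloscope traces.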

John Lato

5:49 a.m.

I guess I should explain what that flag does...

The GHC RTS maintains capabilities; the number of capabilities is specified by the `+RTS -N` option. Each capability is a virtual machine that executes Haskell code, and maintains its own runqueue of threads to process.

A capability will perform a context switch at the next heap block allocation (every 4k of allocation) after the timer expires. The timer defaults to 20ms, and can be set by the -C flag. Capabilities perform context switches in other circumstances as well, such as when a thread yields or blocks.

My guess is that either the context switching logic changed in ghc-7.8, or possibly your code used to trigger a switch via some other mechanism (stack overflow or something maybe?), but is optimized differently now, so instead it needs to wait for the timer to expire.

The problem we had was that a time-sensitive thread was getting scheduled on the same capability as a long-running non-yielding thread, so the time-sensitive thread had to wait for a context switch timeout (even though there were free cores available!). I expect even with -N4 you'll still see occasional delays (perhaps <5% of calls).

We've solved our problem with judicious use of `forkOn`, but that won't help at -N1.

We did see this behavior in 7.6, but it's definitely worse in 7.8.

Incidentally, has there been any interest in a work-stealing scheduler? There was a discussion from about 2 years ago, in which Simon Marlow noted it might be tricky, but it would definitely help in situations like this.

John L.
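As an illustration of the `forkOn` approach mentioned above, the sketch below pins a thread to a chosen capability and reports where it ended up. The helper names `runPinned` and `whereAmI` are hypothetical, and using the last capability for the time-sensitive work is just an assumption made for the example, not the setup John describes.

```haskell
import Control.Concurrent
  (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run an action in a thread pinned to the given capability and wait
-- for its result. (If the action throws, the caller is left blocked;
-- a real program would want exception handling here.)
runPinned :: Int -> IO a -> IO a
runPinned cap action = do
  result <- newEmptyMVar
  _ <- forkOn cap (action >>= putMVar result)
  takeMVar result

-- Capability number of the calling thread, and whether it is pinned.
whereAmI :: IO (Int, Bool)
whereAmI = myThreadId >>= threadCapability

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)
  -- Hypothetical policy: keep the time-sensitive work on the last
  -- capability and leave long-running work on the others.
  (cap, pinned) <- runPinned (n - 1) whereAmI
  putStrLn ("time-sensitive thread on capability " ++ show cap
            ++ ", pinned: " ++ show pinned)
```

With -N1 there is only capability 0, so pinning cannot separate the threads, which matches the remark that `forkOn` won't help there.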

Edward Z. Yang

5:54 a.m.

I don't think this is directly related to the problem, but if you have a thread that isn't yielding, you can force it to yield by using -fno-omit-yields on your code. It won't help if the non-yielding code is in a library, and it won't help if the problem was that you just weren't setting timeouts finely enough (which sounds like what was happening). FYI.

Edward
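For anyone wanting to try -fno-omit-yields, a per-module pragma is one way to enable it. The sketch below is an assumption-laden illustration rather than a recipe: `spin` is a hypothetical tight loop that, when compiled with optimisation, may perform no heap allocation, and `timeout` is used to see whether another thread gets a chance to interrupt it. Whether the loop is actually interrupted depends on the GHC version and optimisation level.

```haskell
{-# OPTIONS_GHC -fno-omit-yields #-}

import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Sum the integers from 1 to n with a strict accumulator. With -O this
-- kind of loop may compile to code that never allocates.
spin :: Int -> Int
spin n = go n 0
  where
    go k acc
      | k <= 0    = acc
      | otherwise = acc `seq` go (k - 1) (acc + k)

main :: IO ()
main = do
  -- Give the loop 100ms. If it can be preempted, the timeout thread
  -- should be able to interrupt it before it finishes.
  r <- timeout 100000 (evaluate (spin 1000000000))
  putStrLn (maybe "interrupted by timeout" (\v -> "finished: " ++ show v) r)
```

Building once with the pragma and once without (both with -O2) is a quick way to see whether a given loop is the kind this flag addresses.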

John Lato

6:01 a.m.

My understanding is that -fno-omit-yields is subtly different. I think that's for the case when a function loops without performing any heap allocations, and thus would never yield even after the context switch timeout. In my case the looping function does perform heap allocations and does eventually yield, just not until after the timeout.

Is that understanding correct?

(Technically, doesn't it change to yielding after stack checks or something like that?)

Edward Z. Yang

6:11 a.m.

Yes, that's right.

I brought it up because you mentioned that there might still be occasional delays, and those might be caused by a thread not being preemptible for a while.

Edward

Simon Peyton Jones

1:49 p.m.

I wonder if the knowledge embodied in this thread might usefully be summarised in the user manual? Or on the GHC section of the Haskell wiki, https://www.haskell.org/haskellwiki/GHC?

Simon

Michael Jones

9:20 p.m.

My hope is that if my threads are doing IO, the scheduler acts when there is an IO action with delay, or when STM blocks, etc.

So at the end of my pipe out, I have:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) ()
sendTransactions dev dts = repeatedly $ do
  dts' <- liftIO $ atomically $ readTVar dts
  when (dts' == True) (do
    (_, transactions) <- await
    liftIO $ sendOut dev transactions)

And my pipe in:

returnTransactionResults :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> SourceT m (Spec, Char)
returnTransactionResults dev dts = repeatedly $ do
  (status, spec) <- liftIO $ readIn2 dev
  oldDts <- liftIO $ atomically $ readTVar dts
  let dts' = (ord $ status!!1) .&. 0x20
  let newDts = dts' /= 0
  when (oldDts /= newDts) (
    liftIO $ atomically $ writeTVar dts newDts)
  when (length spec /= 0) (mapM_ (\ch -> yield (executeSpec, ch)) spec)

sendOut will do a usb bulk write, and readIn2 will do a usb bulk read. Hopefully, somewhere in the usb code IO blocks for an interrupt (probably in libusb), and that allows the scheduler to switch threads. Given the behavior, I assume this is not the case, and it requires time slicing to switch threads.

I also send data between the in/out pipes via TChan. Remembering that each pipe is in a thread, hopefully if a readTChan blocks, the scheduler reschedules and the other thread runs.

For context, I do a lot of RTOS work, so my worldview of the expected behavior comes from that perspective.

Mike

On Oct 29, 2014, at 6:41 PM, Edward Z. Yang wrote:

...

Yes, that's right.

I brought it up because you mentioned that there might still be occasional delays, and those might be caused by a thread not being preemptible for a while.

Edward

Excerpts from John Lato's message of 2014-10-29 17:31:45 -0700:

...
My understanding is that -fno-omit-yields is subtly different. I think that's for the case when a function loops without performing any heap allocations, and thus would never yield even after the context switch timeout. In my case the looping function does perform heap allocations and does eventually yield, just not until after the timeout.

Is that understanding correct?

(technically, doesn't it change to yielding after stack checks or something like that?)

On Thu, Oct 30, 2014 at 8:24 AM, Edward Z. Yang wrote:

...
I don't think this is directly related to the problem, but if you have a thread that isn't yielding, you can force it to yield by using -fno-omit-yields on your code. It won't help if the non-yielding code is in a library, and it won't help if the problem was that you just weren't setting timeouts finely enough (which sounds like what was happening). FYI.
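To make the -fno-omit-yields point concrete, here is a minimal sketch (not code from this thread; `countTo` and the iteration count are made up for illustration). It assumes the module is compiled with -O so that the loop runs without heap allocation. The expectation, which I have not measured here, is that with the pragma the timeout can interrupt the loop, and without it the loop may run to completion first.

```haskell
{-# OPTIONS_GHC -fno-omit-yields #-}
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- A tight counting loop. With optimisation it is expected to run without
-- heap allocation, so it offers the scheduler no allocation-based yield point.
countTo :: Int -> Int
countTo limit = go 0
  where
    go acc
      | acc >= limit = acc
      | otherwise    = go (acc + 1)

main :: IO ()
main = do
  -- Ask for the loop to be abandoned after 100 ms. Whether that happens
  -- promptly depends on the loop containing a yield point.
  r <- timeout 100000 (evaluate (countTo 2000000000))
  putStrLn (maybe "interrupted by timeout" (const "loop finished first") r)
```

As noted above, the flag only affects the module it is applied to, so it does nothing for a non-yielding loop inside a library.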

Edward

Excerpts from John Lato's message of 2014-10-29 17:19:46 -0700:

...
I guess I should explain what that flag does...

The GHC RTS maintains capabilities, the number of capabilities is specified by the `+RTS -N` option. Each capability is a virtual machine that executes Haskell code, and maintains its own runqueue of threads to process.

...
A capability will perform a context switch at the next heap block allocation (every 4k of allocation) after the timer expires. The timer defaults to 20ms, and can be set by the -C flag. Capabilities perform context switches in other circumstances as well, such as when a thread yields or blocks.
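The effect of the context-switch timer can be observed with a small probe. This is only a sketch, not code from this thread: `busyLoop` and `worstSleepMicros` are made-up names, and `GHC.Clock` assumes a base library newer than the one shipped with 7.8. One thread allocates in a loop without ever blocking, while the main thread repeatedly asks for a 1 ms sleep and records how long it actually took to get control back.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, replicateM)
import Data.IORef (modifyIORef', newIORef)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- A loop that never blocks. Each update stores a fresh boxed Int, so the
-- loop is intended to allocate and therefore yields only once the
-- context-switch timer has expired.
busyLoop :: IO ()
busyLoop = do
  counter <- newIORef (0 :: Int)
  forever (modifyIORef' counter (+ 1))

-- Longest observed wall-clock time, in microseconds, for n requested
-- 1 ms sleeps.
worstSleepMicros :: Int -> IO Word64
worstSleepMicros n = do
  samples <- replicateM n $ do
    t0 <- getMonotonicTimeNSec
    threadDelay 1000
    t1 <- getMonotonicTimeNSec
    pure ((t1 - t0) `div` 1000)
  pure (maximum (0 : samples))

main :: IO ()
main = do
  _ <- forkIO busyLoop
  worst <- worstSleepMicros 200
  putStrLn ("worst wake-up for a 1 ms sleep: " ++ show worst ++ " us")
```

Linking with -rtsopts and comparing a plain run against `+RTS -C0.001` on a single capability should show whether the wake-up time tracks the timer setting. The numbers will depend on the machine and GHC version.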

My guess is that either the context switching logic changed in ghc-7.8, or possibly your code used to trigger a switch via some other mechanism (stack overflow or something maybe?), but is optimized differently now so instead it needs to wait for the timer to expire.

The problem we had was that a time-sensitive thread was getting scheduled on the same capability as a long-running non-yielding thread, so the time-sensitive thread had to wait for a context switch timeout (even though there were free cores available!). I expect even with -N4 you'll still see occasional delays (perhaps <5% of calls).

We've solved our problem with judicious use of `forkOn`, but that won't help at N1.
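For readers who have not used `forkOn`, a minimal sketch follows (the helper name `runPinned` is made up; this is not the code referred to above). It forks a thread on a chosen capability and asks the RTS where that thread is actually running. The capability number is taken modulo the number of capabilities, which is why pinning cannot separate threads at -N1.

```haskell
import Control.Concurrent (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Fork a thread on the requested capability and report the capability it
-- runs on, together with whether the RTS considers it pinned.
runPinned :: Int -> IO (Int, Bool)
runPinned cap = do
  result <- newEmptyMVar
  _ <- forkOn cap (myThreadId >>= threadCapability >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  n <- getNumCapabilities
  timeSensitive <- runPinned 0   -- e.g. the thread that must respond quickly
  longRunning   <- runPinned 1   -- e.g. the thread that rarely yields
  putStrLn ("capabilities: " ++ show n)
  putStrLn ("time-sensitive thread on: " ++ show timeSensitive)
  putStrLn ("long-running thread on:   " ++ show longRunning)
```

Built with -threaded and run with `+RTS -N2` or higher, the two threads should report different capabilities. With a single capability both land on capability 0.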

We did see this behavior in 7.6, but it's definitely worse in 7.8.

Incidentally, has there been any interest in a work-stealing scheduler? There was a discussion from about 2 years ago, in which Simon Marlow noted it might be tricky, but it would definitely help in situations like this.

John L.

On Thu, Oct 30, 2014 at 8:02 AM, Michael Jones wrote:

...
John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato wrote:

By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones wrote:

...
I have a general question about thread behavior in 7.8.3 vs 7.6.X

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were a 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without with -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.

I tired to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.

One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.

What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?

I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.

Any hints appreciated.

Mike

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

John Lato

10:06 p.m.

Hmm, I think maybe part of the problem is in your STM blocks.

On Thu, Oct 30, 2014 at 8:50 AM, Michael Jones wrote:

...

My hope is that if my threads are doing IO, the scheduler acts when there is an IO action with delay, or when STM blocks, etc.

So at the end of my pipe out, I have:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) ()
sendTransactions dev dts = repeatedly $ do
  dts' <- liftIO $ atomically $ readTVar dts
  when (dts' == True) (do
    (_, transactions) <- await
    liftIO $ sendOut dev transactions)

When the dts TVar is false, this will just spin, causing the thread to keep running. Can you change it to something like:

_okToProceed <- liftIO $ atomically $ do
  dts' <- readTVar dts
  when (not dts') retry
(_,transactions) <- await
liftIO $ sendOut dev transactions

I think this should be better, because now, if dts is False, the STM transaction will block, allowing the thread to be descheduled. Furthermore it won't be rescheduled until another thread has updated dts.

It looks like returnTransactionResults may suffer from the same issue, you should be able to fix it in a similar manner.

John L.
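The same idea can be tried outside the machines pipeline. The following is a self-contained sketch using plain IO threads: `waitForFlag` and `gatedSend` are hypothetical names, and `gatedSend` merely stands in for the real sendOut. The sender blocks inside the STM transaction until another thread sets the flag, rather than re-reading the TVar in a loop.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (TVar, atomically, newTVarIO, readTVar, retry, writeTVar)
import Control.Monad (unless)

-- Block (without spinning) until the flag becomes True. 'retry' suspends
-- the thread until a TVar read by the transaction has been written.
waitForFlag :: TVar Bool -> IO ()
waitForFlag flag = atomically $ do
  ok <- readTVar flag
  unless ok retry

-- Hypothetical stand-in for the real sendOut: waits for the flag, then
-- returns a description of what it would have sent.
gatedSend :: TVar Bool -> String -> IO String
gatedSend flag payload = do
  waitForFlag flag
  pure ("sent " ++ payload)

main :: IO ()
main = do
  flag <- newTVarIO False
  done <- newEmptyMVar
  _ <- forkIO (gatedSend flag "transactions" >>= putMVar done)
  threadDelay 100000                 -- the sender is blocked during this time
  atomically (writeTVar flag True)   -- wakes the sender
  takeMVar done >>= putStrLn
```

The `check` function from Control.Concurrent.STM expresses the same `unless ok retry` pattern in one call.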

...

And my pipe in:

returnTransactionResults :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> SourceT m (Spec, Char)
returnTransactionResults dev dts = repeatedly $ do
  (status, spec) <- liftIO $ readIn2 dev
  oldDts <- liftIO $ atomically $ readTVar dts
  let dts' = (ord $ status!!1) .&. 0x20
  let newDts = dts' /= 0
  when (oldDts /= newDts) (
    liftIO $ atomically $ writeTVar dts newDts)
  when (length spec /= 0) (mapM_ (\ch -> yield (executeSpec, ch)) spec)

sendOut will do a usb bulk write, and readIn2 will do a usb bulk read. Hopefully, somewhere in the usb code IO blocks for an interrupt (probably in libusb), and that allows the scheduler to switch threads. Given the behavior, I assume this is not the case, and it requires time slicing to switch threads.

I also send data between the in/out pipes via TChan. Remembering that each pipe is in a thread, hopefully if a readTChan blocks, the scheduler reschedules and the other thread runs.
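A blocked readTChan can be exercised in isolation. This is a minimal sketch with a made-up helper `relay`, not the application code: the consumer thread is forked first, so it finds the channel empty and blocks in readTChan until the producer writes.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)
import Control.Monad (forM_, replicateM)

-- Pass every element of the list through a TChan from the calling thread
-- to a consumer thread, and return what the consumer received.
relay :: [Int] -> IO [Int]
relay xs = do
  chan <- newTChanIO
  out  <- newEmptyMVar
  -- Consumer: blocks in readTChan whenever the channel is empty.
  _ <- forkIO (replicateM (length xs) (atomically (readTChan chan)) >>= putMVar out)
  -- Producer: each write wakes the blocked consumer.
  forM_ xs (\x -> atomically (writeTChan chan x))
  takeMVar out

main :: IO ()
main = relay [1, 2, 3] >>= print
```

Because readTChan retries internally when the channel is empty, the consumer does not consume CPU while it waits.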

For context, I do a lot of RTOS work, so my worldview of the expected behavior comes from that perspective.

Mike

On Oct 29, 2014, at 6:41 PM, Edward Z. Yang wrote:

...
Yes, that's right.

I brought it up because you mentioned that there might still be occasional delays, and those might be caused by a thread not being preemptible for a while.

Edward

...
My understanding is that -fno-omit-yields is subtly different. I think that's for the case when a function loops without performing any heap allocations, and thus would never yield even after the context switch timeout. In my case the looping function does perform heap allocations and does eventually yield, just not until after the timeout.

Is that understanding correct?

(technically, doesn't it change to yielding after stack checks or something like that?)

On Thu, Oct 30, 2014 at 8:24 AM, Edward Z. Yang wrote:

...
I don't think this is directly related to the problem, but if you have a thread that isn't yielding, you can force it to yield by using -fno-omit-yields on your code. It won't help if the non-yielding code is in a library, and it won't help if the problem was that you just weren't setting timeouts finely enough (which sounds like what was happening). FYI.

Edward

Excerpts from John Lato's message of 2014-10-29 17:19:46 -0700:

...
I guess I should explain what that flag does...

The GHC RTS maintains capabilities, the number of capabilities is specified by the `+RTS -N` option. Each capability is a virtual machine that executes Haskell code, and maintains its own runqueue of threads to process.

...
A capability will perform a context switch at the next heap block allocation (every 4k of allocation) after the timer expires. The timer defaults to 20ms, and can be set by the -C flag. Capabilities perform context switches in other circumstances as well, such as when a thread yields or blocks.

My guess is that either the context switching logic changed in ghc-7.8, or possibly your code used to trigger a switch via some other mechanism (stack overflow or something maybe?), but is optimized differently now so instead it needs to wait for the timer to expire.

The problem we had was that a time-sensitive thread was getting scheduled on the same capability as a long-running non-yielding thread, so the time-sensitive thread had to wait for a context switch timeout (even though there were free cores available!). I expect even with -N4 you'll still see occasional delays (perhaps <5% of calls).

We've solved our problem with judicious use of `forkOn`, but that won't help at N1.

We did see this behavior in 7.6, but it's definitely worse in 7.8.

Incidentally, has there been any interest in a work-stealing scheduler? There was a discussion from about 2 years ago, in which Simon Marlow noted it might be tricky, but it would definitely help in situations like this.

John L.

On Thu, Oct 30, 2014 at 8:02 AM, Michael Jones wrote:

...
...
John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato wrote:

By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones wrote:

I have a general question about thread behavior in 7.8.3 vs 7.6.X

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were a 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without with -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.

I tired to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.

One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.

What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?

I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.

Any hints appreciated.

Mike

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Michael Jones

13 Jan 13 Jan

11:32 a.m.

Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves differently than another.

Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.

This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.

When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.

In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.

If I run with the strace command, it always runs with -C0.00N.

All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.

In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.

Questions:

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
4) Are there other options besides -V and -C for the runtime that might apply?
5) What does -V0 do that makes a problem program run?

Mike

On Oct 29, 2014, at 6:02 PM, Michael Jones wrote:

...

John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato wrote:

...
By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones wrote:

I have a general question about thread behavior in 7.8.3 vs 7.6.X

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were a 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without with -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.

I tired to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.

One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.

What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?

I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.

Any hints appreciated.

Mike

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Ben Gamari

14 Jan 14 Jan

1:32 a.m.

Michael Jones writes:

...

Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.

[snip]

...

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?

Do you know about [1]? This is a regression due to an interface change that arose from the new event manager. `usb` 1.3.0.0 has a workaround, GHC 7.10 will have a fixed event manager [2].

Given that it sounds like your program works some of the time this may not be relevant but I thought it would be negligent not to mention it.

Cheers,

- Ben

[1] https://github.com/basvandijk/usb/issues/7
[2] https://phabricator.haskell.org/D347

Michael Jones

4:39 a.m.

Ben,

Interesting. In this case, I can duplicate the problem when not using USB (USB to i2c dongle) by using /dev/i2c_n, and when I do use USB, in some cases the USB is working (can see i2c on scope), but the GUI is hung. So I believe this is not causing the problem.

Thanks,

Mike

On Jan 13, 2015, at 1:02 PM, Ben Gamari wrote:

...

Michael Jones writes:

...
Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.

[snip]

...
1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?

Do you know about [1]? This is a regression due to an interface change that arose from the new event manager. `usb` 1.3.0.0 has a workaround, GHC 7.10 will have a fixed event manager [2].

Given that it sounds like your program works some of the time this may not be relevant but I thought it would be negligent not to mention it.

Cheers,

- Ben

[1] https://github.com/basvandijk/usb/issues/7
[2] https://phabricator.haskell.org/D347

Michael Jones

18 Jan 18 Jan

6:45 a.m.

I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Given an event log, it may take from 1000-10000 entries sometimes.

When I look at a good start vs. slow start, I see that in both cases things start up and there is some thread activity for thread 2 and 3, then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving the ability for other threads to start, and perhaps on a dual core machine it is more of a problem than single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that a non-GUI version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.

On Jan 12, 2015, at 11:02 PM, Michael Jones wrote:

...

Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.

Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.

This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.

When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.

In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.

If I run with the strace command, it always runs with -C0.00N.

All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.

In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.

Questions:

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
4) Are there other options besides -V and -C for the runtime that might apply?
5) What does -V0 do that makes a problem program run?

Mike

On Oct 29, 2014, at 6:02 PM, Michael Jones wrote:

...
John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato wrote:

...
By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones wrote:

I have a general question about thread behavior in 7.8.3 vs 7.6.X

I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.

When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were a 1ms or so.

When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without with -N2/4.

The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.

I tired to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.

One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.

What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?

I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.

Any hints appreciated.

Mike

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Donn Cave

11:30 a.m.

Quoth Michael Jones , ...

...

...
5) What does -V0 do that makes a problem program run?

Well, there's that SIGVTALRM barrage, you may remember we went over that mid-August. I expect there are other effects. Donn

Michael Jones

19 Jan 19 Jan

4:06 a.m.

Donn,

True, but in that case I was using a driver for the Aardvark, and my current two test cases use:

A) DC1613A from LTC
B) /dev/i2c driver with FFI wrapper I wrote

Case A uses the haskell usb package and libusb. I suppose SIGVTALRM could be in use in the libusb driver. I know for sure it is not used by my I2C stuff, unless it is behind the /dev/i2c user mode calls.

But interesting. Obviously the scheduler is using timers from the OS. Is it really an advantage not to use OS threads all around? Is there any way to enable such behavior to see if things are better?

Mike

On Jan 17, 2015, at 11:00 PM, Donn Cave wrote:

...

Quoth Michael Jones , ...

...
...
5) What does -V0 do that makes a problem program run?

Well, there's that SIGVTALRM barrage, you may remember we went over that mid-August. I expect there are other effects.

Donn

Simon Marlow

4:07 p.m.

Hi Michael,

Previously in this thread it was pointed out that your code was doing busy waiting, and so the problem can be fixed by modifying your code to not do busy waiting. Did you do this? The -C flag is just a workaround which will make the RTS reschedule more often, it won't fix the underlying problem.

The code you showed us was:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) ()
sendTransactions dev dts = repeatedly $ do
  dts' <- liftIO $ atomically $ readTVar dts
  when (dts' == True) (do
    (_, transactions) <- await
    liftIO $ sendOut dev transactions)

This loops when the contents of the TVar is False.

Cheers,

Simon

On 18/01/2015 01:15, Michael Jones wrote:

...

I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Given an event log, it may take from 1000-10000 entries sometimes.

When I look at a good start vs. slow start, I see that in both cases things startup and there is some thread activity for thread 2 and 3, then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving the ability for other threads to start, and perhaps on a dual core machine it is more of a problem than on single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that a non-GUI version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.
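On the question of telling threads apart in an event log, one approach that may help is to label threads explicitly and to check whether a thread is bound to an OS thread. The sketch below is illustrative only: the thread names and the `report` helper are hypothetical, and it assumes that labels set with `labelThread` are recorded in the event log, which should hold on reasonably recent GHC versions. Threads created with `forkIO` are lightweight Haskell threads; `forkOS` creates a bound thread and needs the threaded runtime.

```haskell
import Control.Concurrent
  ( forkIO, forkOS, isCurrentThreadBound, myThreadId
  , rtsSupportsBoundThreads )
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (labelThread)

-- Label the current thread and report whether it is bound to an OS thread.
report :: String -> MVar (String, Bool) -> IO ()
report name out = do
  tid <- myThreadId
  labelThread tid name
  bound <- isCurrentThreadBound
  putMVar out (name, bound)

main :: IO ()
main = do
  out <- newEmptyMVar
  -- Lightweight thread: scheduled by the RTS, not tied to one OS thread.
  _ <- forkIO (report "light-worker" out)
  takeMVar out >>= print
  -- forkOS needs the threaded runtime (link with -threaded).
  if rtsSupportsBoundThreads
    then forkOS (report "bound-worker" out) >> takeMVar out >>= print
    else putStrLn "non-threaded RTS: forkOS unavailable"
```

With labels in place, the threads that monopolise the startup phase can be matched to the code that forked them rather than guessed from thread numbers.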

On Jan 12, 2015, at 11:02 PM, Michael Jones <mike@proclivis.com> wrote:

...
Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves differently from another.

Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.

This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.

When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.

In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.

If I run with the strace command, it always runs with -C0.00N.

All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.

In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.

Questions:

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
4) Are there other options besides -V and -C for the runtime that might apply?
5) What does -V0 do that makes a problem program run?

Mike

On Oct 29, 2014, at 6:02 PM, Michael Jones <mike@proclivis.com> wrote:

...
John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato <jwlato@gmail.com> wrote:

...
By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.

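Simon Marlow's point about busy waiting can be illustrated with a small STM sketch. It is not the code from the application: the helper name `waitUntilTrue` and the producer thread are hypothetical, and it assumes the `stm` package. Instead of reading the TVar in a loop, the waiting thread uses `check`, which calls `retry` when the flag is False, so the runtime suspends the thread until the TVar is written.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
  (TVar, atomically, check, newTVarIO, readTVar, writeTVar)

-- Block until the flag is True. 'check' retries the transaction when the
-- flag is False, so the thread sleeps until the TVar changes instead of
-- spinning on readTVar.
waitUntilTrue :: TVar Bool -> IO ()
waitUntilTrue flag = atomically (readTVar flag >>= check)

main :: IO ()
main = do
  flag <- newTVarIO False
  -- Hypothetical producer: enables sending after 100 ms.
  _ <- forkIO $ do
    threadDelay 100000
    atomically (writeTVar flag True)
  waitUntilTrue flag
  putStrLn "flag set, safe to send"
```

In a loop shaped like `sendTransactions`, the `readTVar`/`when` pair could plausibly be replaced by a single `liftIO (waitUntilTrue dts)` before `await`, so the loop body only runs when there is something to send.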

Michael Jones

20 Jan 20 Jan

9:19 p.m.

Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow wrote:

...


Simon Marlow

9:30 p.m.

My guess would be that either

- a thread is in a non-allocating loop
- a long-running foreign call is marked unsafe

Either of these would block the other threads. ThreadScope together with some traceEventIO calls might help you identify the culprit.

Cheers, Simon

On 20/01/2015 15:49, Michael Jones wrote:

...

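The suggestion to combine ThreadScope with traceEventIO calls could look roughly like the sketch below. The `traced` helper and the marker strings are hypothetical, and `threadDelay` stands in for the real GUI setup and USB work. `traceEventIO` from `Debug.Trace` writes a user event into the event log when the program is linked with event logging enabled and run with `+RTS -l`; otherwise it does nothing.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Debug.Trace (traceEventIO)

-- Wrap an action with start/stop markers in the event log, so the gap
-- between them is visible on the ThreadScope timeline.
traced :: String -> IO a -> IO a
traced name act = do
  traceEventIO ("START " ++ name)
  r <- act
  traceEventIO ("STOP " ++ name)
  return r

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    traced "bulk write" (threadDelay 50000)  -- stand-in for real I/O
    putMVar done ()
  traced "gui setup" (threadDelay 10000)     -- stand-in for frame creation
  takeMVar done
```

A START marker with no matching STOP in the log narrows the hang to the wrapped action, which is more precise than print statements around the fork.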

Michael Jones

21 Jan 21 Jan

9:13 a.m.

Simon,

The code below hangs on the frameEx function.

But, if I change it to:

    f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool" rectZero (frameDefaultStyle .|. wxMAXIMIZE)

it will progress, but no frame pops up, except once in many tries. Still hangs, but progresses through all the setup code.

However, I did make past statements that a non-GUI version was hanging. So I am not blaming wxHaskell. Just noting that in this case it is where things go wrong.

Anyone,

Are there any wxHaskell experts around that might have some insight?

(Remember, works on single core 32 bit, works on quad core 64 bit, fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to fix problems like this?)

-------- CODE SAMPLE --------

gui :: IO ()
gui = do
  values <- varCreate []     -- Values to be painted
  timeLine <- varCreate 0    -- Line time
  sample <- varCreate 0      -- Sample Number
  running <- varCreate True  -- True when telemetry is active

  -- <<HANG HERE>>
  f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope Tool"] objectNull

  -- Setup GUI components code was here

  return ()

go :: IO ()
go = do
  putStrLn "Start GUI"
  start $ gui

exeMain :: IO ()
exeMain = do
  hSetBuffering stdout NoBuffering
  getArgs >>= parse
  where
    parse ["-h"] = usage >> exit
    parse ["-v"] = version >> exit
    parse [] = go
    parse [url, port, session, target] = goServer url port (read session) (read target)

    usage = putStrLn "Usage: linti-scope [url, port, session, target]"
    version = putStrLn "Haskell linti-scope 0.1.0.0"
    exit = System.Exit.exitWith System.Exit.ExitSuccess
    die = System.Exit.exitWith (System.Exit.ExitFailure 1)

#ifndef MAIN_FUNCTION
#define MAIN_FUNCTION exeMain
#endif
main = MAIN_FUNCTION

On Jan 20, 2015, at 9:00 AM, Simon Marlow wrote:

...

My guess would be that either - a thread is in a non-allocating loop - a long-running foreign call is marked unsafe

Either of these would block the other threads. ThreadScope together with some traceEventIO calls might help you identify the culprit.

Cheers, Simon

On 20/01/2015 15:49, Michael Jones wrote:

...
Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow wrote:

...
Hi Michael,

Previously in this thread it was pointed out that your code was doing busy waiting, and so the problem can be fixed by modifying your code to not do busy waiting. Did you do this? The -C flag is just a workaround which will make the RTS reschedule more often, it won't fix the underlying problem.

The code you showed us was:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) () sendTransactions dev dts = repeatedly $ do dts' <- liftIO $ atomically $ readTVar dts when (dts' == True) (do (_, transactions) <- await liftIO $ sendOut dev transactions)

This loops when the contents of the TVar is False.

Cheers, Simon

On 18/01/2015 01:15, Michael Jones wrote:

...
I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Given an event log, it may take from 1000-10000 entries sometimes.

When I look at a good start vs. slow start, I see that in both cases things startup and there is some thread activity for thread 2 and 3, then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving the ability for other threads to start, and perhaps on a dual core machine it is more of a problem than single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that an non-gui version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.

On Jan 12, 2015, at 11:02 PM, Michael Jones mailto:mike@proclivis.com> wrote:

...
Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.

Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.

This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.

When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.

In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.

If I run with the strace command, it always runs with -C0.00N.

All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.

In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.

Questions:

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems? 2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have? 3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior? 4) Are there other options besides -V and -C for the runtime that might apply? 5) What does -V0 do that makes a problem program run?

Mike

On Oct 29, 2014, at 6:02 PM, Michael Jones mailto:mike@proclivis.com> wrote:

...
John,

Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.

Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.

Mike

On Oct 29, 2014, at 5:12 PM, John Lato <jwlato@gmail.com> wrote:

> By any chance do the delays get shorter if you run your program with
> `+RTS -C0.005`? If so, I suspect you're having a problem very
> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
> ghc-7.8 for some reason), involving possible misbehavior of the
> thread scheduler.
>
> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mike@proclivis.com> wrote:
>
> I have a general question about thread behavior in 7.8.3 vs 7.6.X
>
> I moved from 7.6 to 7.8 and my application behaves very
> differently. I have three threads, an application thread that
> plots data with wxhaskell or sends it over a network (depends on
> settings), a thread doing usb bulk writes, and a thread doing
> usb bulk reads. Data is moved around with TChan, and TVar is
> used for coordination.
>
> When the application was compiled with 7.6, my stream of usb
> traffic was smooth. With 7.8, there are lots of delays where
> nothing seems to be running. These delays are up to 40ms,
> whereas with 7.6 delays were a 1ms or so.
>
> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
> runs fine without with -N2/4.
>
> The program is compiled -O2 with profiling. The -N2/4 version
> uses more memory, but in both cases with 7.8 and with 7.6 there
> is no space leak.
>
> I tired to compile and use -ls so I could take a look with
> threadscope, but the application hangs and writes no data to the
> file. The CPU fans run wild like it is in an infinite loop. It
> at least pops an unpainted wxhaskell window, so it got partially
> running.
>
> One of my libraries uses option -fsimpl-tick-factor=200 to get
> around the compiler.
>
> What do I need to know about changes to threading and event
> logging between 7.6 and 7.8? Is there some general documentation
> somewhere that might help?
>
> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar
> ball and installed myself, after removing 7.6 with apt-get.
>
> Any hints appreciated.
>
> Mike

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org mailto:Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Carter Schonwald

11:04 a.m.

I think Ben Gamari hit similar/related issues with the libusb bindings in 7.8, and I believe some or all of them are fixed in 7.10 (I could be mixing things up, though).

On Tue, Jan 20, 2015 at 10:43 PM, Michael Jones wrote:

...

Simon,

The code below hangs on the frameEx function.

But, if I change it to:

f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool" rectZero (frameDefaultStyle .|. wxMAXIMIZE)

it will progress, but no frame pops up, except once in many tries. Still hangs, but progresses through all the setup code.

However, I did make past statements that a non-GUI version was hanging. So I am not blaming wxHaskell. Just noting that in this case it is where things go wrong.

Anyone,

Are there any wxHaskell experts around that might have some insight?

(Remember, works on single core 32 bit, works on quad core 64 bit, fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to fix problems like this?)

-------- CODE SAMPLE --------

gui :: IO ()
gui = do
    values <- varCreate []     -- Values to be painted
    timeLine <- varCreate 0    -- Line time
    sample <- varCreate 0      -- Sample Number
    running <- varCreate True  -- True when telemetry is active

    <<HANG HERE>>

    f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope Tool"] objectNull

    Setup GUI components code was here

    return ()

go :: IO ()
go = do
    putStrLn "Start GUI"
    start $ gui

exeMain :: IO ()
exeMain = do
    hSetBuffering stdout NoBuffering
    getArgs >>= parse
  where
    parse ["-h"] = usage >> exit
    parse ["-v"] = version >> exit
    parse [] = go
    parse [url, port, session, target] = goServer url port (read session) (read target)

usage = putStrLn "Usage: linti-scope [url, port, session, target]"
version = putStrLn "Haskell linti-scope 0.1.0.0"
exit = System.Exit.exitWith System.Exit.ExitSuccess
die = System.Exit.exitWith (System.Exit.ExitFailure 1)

#ifndef MAIN_FUNCTION
#define MAIN_FUNCTION exeMain
#endif
main = MAIN_FUNCTION

On Jan 20, 2015, at 9:00 AM, Simon Marlow wrote:

...
My guess would be that either

- a thread is in a non-allocating loop
- a long-running foreign call is marked unsafe

Either of these would block the other threads. ThreadScope together with some traceEventIO calls might help you identify the culprit.
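As a rough illustration of the traceEventIO suggestion, here is a minimal sketch that brackets a suspect action with user events, so the time spent inside it can be located in the event log. The helper name `traced`, the label text, and `suspectStep` are invented for this example and do not come from the original program. On GHC 7.8 the program would need to be linked with -eventlog and run with +RTS -l for the events to be recorded; otherwise the calls do nothing.

```haskell
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)

-- Hypothetical helper: emit a user event before and after an action,
-- even if the action throws.
traced :: String -> IO a -> IO a
traced label =
    bracket_ (traceEventIO ("START " ++ label))
             (traceEventIO ("STOP " ++ label))

-- Stand-in for the real work, e.g. frame creation or a USB write.
suspectStep :: IO Int
suspectStep = return (sum [1 .. 1000])
```

Wrapping each setup step, for example `traced "frameEx" (...)`, should narrow down which call the program is sitting in when it appears to hang.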

Cheers, Simon

On 20/01/2015 15:49, Michael Jones wrote:

...
Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow wrote:

...
Hi Michael,

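Previously in this thread it was pointed out that your code was doing busy waiting, and so the problem can be fixed by modifying your code to not do busy waiting. Did you do this? The -C flag is just a workaround which will make the RTS reschedule more often, it won't fix the underlying problem.

The code you showed us was:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool
                 -> ProcessT m (Spec, String) ()
sendTransactions dev dts = repeatedly $ do
    dts' <- liftIO $ atomically $ readTVar dts
    when (dts' == True) (do
        (_, transactions) <- await
        liftIO $ sendOut dev transactions)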

This loops when the contents of the TVar is False.
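One way to avoid the busy loop described above is to let STM block until the flag changes. The following is a minimal sketch, independent of the machines/ProcessT code in the original; the names `waitUntilTrue` and `demo` are made up for this example. `retry` suspends the thread until a TVar it has read is written again, so no CPU is spent while the flag is False.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (unless)

-- Hypothetical helper: block (without spinning) until the flag is True.
waitUntilTrue :: TVar Bool -> IO ()
waitUntilTrue flag = atomically $ do
    ok <- readTVar flag
    unless ok retry   -- suspends until 'flag' is written again

-- Small demonstration: a worker waits on the flag, the caller sets it.
demo :: IO String
demo = do
    flag <- newTVarIO False
    done <- newEmptyTMVarIO
    _ <- forkIO $ do
        waitUntilTrue flag
        atomically (putTMVar done "sent")
    threadDelay 10000                 -- worker is parked, not looping
    atomically (writeTVar flag True)  -- wakes the worker
    atomically (takeTMVar done)
```

In sendTransactions, something like `liftIO (waitUntilTrue dts)` before the `await` might replace the `readTVar`/`when` pair, though that has not been tested against the original code.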

Cheers, Simon

On 18/01/2015 01:15, Michael Jones wrote:

...
I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Given an event log, it may take from 1000-10000 entries sometimes.

When I look at a good start vs. slow start, I see that in both cases things startup and there is some thread activity for thread 2 and 3, then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving the ability for other threads to start, and perhaps on a dual core machine it is more of a problem than single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that an non-gui version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?
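One hedged way to make threads easier to tell apart in an event log is to label them. This is a minimal sketch using `labelThread` from GHC.Conc; the label should show up next to the thread in tools that display thread labels, but that depends on the RTS and tool versions. The helper `forkNamed` and the label "usb-writer" are invented for this example. `isCurrentThreadBound` reports whether the running Haskell thread is bound to an OS thread (as with forkOS) rather than being a lightweight forkIO thread.

```haskell
import Control.Concurrent
    (ThreadId, forkIO, isCurrentThreadBound, myThreadId,
     newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (labelThread)

-- Hypothetical helper: fork a lightweight thread and give it a name.
forkNamed :: String -> IO () -> IO ThreadId
forkNamed name action = forkIO $ do
    tid <- myThreadId
    labelThread tid name
    action

-- Ask a forkIO thread whether it is bound to an OS thread.
forkedIsBound :: IO Bool
forkedIsBound = do
    result <- newEmptyMVar
    _ <- forkNamed "usb-writer" (isCurrentThreadBound >>= putMVar result)
    takeMVar result
```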

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.

On Jan 12, 2015, at 11:02 PM, Michael Jones <mike@proclivis.com> wrote:

...
Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.

Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.

This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.

When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.

In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.

If I run with the strace command, it always runs with -C0.00N.

All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.

In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.

Questions:

1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
4) Are there other options besides -V and -C for the runtime that might apply?
5) What does -V0 do that makes a problem program run?
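When comparing runs with different -C and -V settings, it may help to confirm which timer values the RTS actually picked up. The sketch below assumes the GHC.RTS.Flags module, which ships with base 4.8 (GHC 7.10) and later, so it would not compile on the 7.8.3 toolchain discussed here. The function name `describeTimers` is made up, and the values are printed raw without assuming a unit.

```haskell
import GHC.RTS.Flags
    (ConcFlags (..), MiscFlags (..), getConcFlags, getMiscFlags)

-- Describe the timer-related RTS settings the program is running with.
-- Values are shown exactly as the RTS reports them.
describeTimers :: IO String
describeTimers = do
    conc <- getConcFlags
    misc <- getMiscFlags
    return $ "context switch time (-C): " ++ show (ctxtSwitchTime conc)
          ++ ", tick interval (-V): " ++ show (tickInterval misc)
```

Printing this once at startup, for instance `describeTimers >>= putStrLn`, gives a record of the settings behind each good or bad run.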

Mike


Simon Marlow

3:48 p.m.

On 21/01/2015 03:43, Michael Jones wrote:

...

Simon,

The code below hangs on the frameEx function.

But, if I change it to:

f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool" rectZero (frameDefaultStyle .|. wxMAXIMIZE)

it will progress, but no frame pops up, except once in many tries. Still hangs, but progresses through all the setup code.

However, I did make past statements that a non-GUI version was hanging. So I am not blaming wxHaskell. Just noting that in this case it is where things go wrong.

Anyone,

Are there any wxHaskell experts around that might have some insight?

(Remember, works on single core 32 bit, works on quad core 64 bit, fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to fix problems like this?)

No, there are no recently fixed or outstanding bugs in this area that I'm aware of.

From the symptoms I strongly suspect there's an unsafe foreign call somewhere causing problems, or another busy-wait loop.

Cheers, Simon
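To illustrate the safe/unsafe distinction mentioned here, the following sketch imports the same C function both ways. It uses POSIX `usleep` purely as a stand-in for a slow foreign call; whether the wxHaskell or USB bindings in the original program use unsafe imports is not known from this thread. The Haskell names `usleepUnsafe` and `usleepSafe` are invented for the example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt (..), CUInt (..))

-- 'unsafe': cheaper to call, but the calling capability cannot run
-- other Haskell threads until the C function returns.
foreign import ccall unsafe "unistd.h usleep"
    usleepUnsafe :: CUInt -> IO CInt

-- 'safe': the RTS releases the capability for the duration of the call,
-- so other Haskell threads can be scheduled (on the threaded RTS).
foreign import ccall safe "unistd.h usleep"
    usleepSafe :: CUInt -> IO CInt
```

A long-running call imported like `usleepUnsafe` would stall every other Haskell thread on that capability, which matches the kind of symptom described above; a quick audit is to search the bindings in use for `unsafe` imports of blocking functions.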


Carter Schonwald

6:27 p.m.

Whoops, forgot to attach the relevant links (I shouldn't email late at night :) ).

https://github.com/basvandijk/usb/issues/7 is the libusb matter
https://phabricator.haskell.org/D347

Point being: on GHC 7.8, certain hanging behavior from libusb (at least as of a few months ago) was due to one-shot-ness.

On Wed, Jan 21, 2015 at 5:18 AM, Simon Marlow wrote:

...

On 21/01/2015 03:43, Michael Jones wrote:

...
Simon,

The code below hangs on the frameEx function.

But, if I change it to:

f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool" rectZero (frameDefaultStyle .|. wxMAXIMIZE)

it will progress, but no frame pops up, except once in many tries. Still hangs, but progresses through all the setup code.

However, I did make past statements that a non-GUI version was hanging. So I am not blaming wxHaskell. Just noting that in this case it is where things go wrong.

Anyone,

Are there any wxHaskell experts around that might have some insight?

(Remember, works on single core 32 bit, works on quad core 64 bit, fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to fix problems like this?)

No, there are no recently fixed or outstanding bugs in this area that I'm aware of.

From the symptoms I strongly suspect there's an unsafe foreign call somewhere causing problems, or another busy-wait loop.

Cheers, Simon

...
— CODE SAMPLE --------

gui :: IO () gui = do values <- varCreate [] -- Values to be painted timeLine <- varCreate 0 -- Line time sample <- varCreate 0 -- Sample Number running <- varCreate True -- True when telemetry is active

<<HANG HERE>>

f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope Tool"] objectNull

Setup GUI components code was here

return ()

go :: IO () go = do putStrLn "Start GUI" start $ gui

exeMain :: IO () exeMain = do hSetBuffering stdout NoBuffering getArgs >>= parse where parse ["-h"] = usage >> exit parse ["-v"] = version >> exit parse [] = go parse [url, port, session, target] = goServer url port (read session) (read target)

usage = putStrLn "Usage: linti-scope [url, port, session, target]" version = putStrLn "Haskell linti-scope 0.1.0.0" exit = System.Exit.exitWith System.Exit.ExitSuccess die = System.Exit.exitWith (System.Exit.ExitFailure 1)

#ifndef MAIN_FUNCTION #define MAIN_FUNCTION exeMain #endif main = MAIN_FUNCTION

On Jan 20, 2015, at 9:00 AM, Simon Marlow wrote:

My guess would be that either

...
- a thread is in a non-allocating loop - a long-running foreign call is marked unsafe

Either of these would block the other threads. ThreadScope together with some traceEventIO calls might help you identify the culprit.

Cheers, Simon

On 20/01/2015 15:49, Michael Jones wrote:

...
Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow wrote:

Hi Michael,

...
Previously in this thread it was pointed out that your code was doing busy waiting, and so the problem can be fixed by modifying your code to not do busy waiting. Did you do this? The -C flag is just a workaround which will make the RTS reschedule more often, it won't fix the underlying problem.

The code you showed us was:

sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) () sendTransactions dev dts = repeatedly $ do dts' <- liftIO $ atomically $ readTVar dts when (dts' == True) (do (_, transactions) <- await liftIO $ sendOut dev transactions)

This loops when the contents of the TVar is False.

Cheers, Simon

On 18/01/2015 01:15, Michael Jones wrote:

...
I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Given an event log, it may take from 1000-10000 entries sometimes.

When I look at a good start vs. slow start, I see that in both cases things startup and there is some thread activity for thread 2 and 3, then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving the ability for other threads to start, and perhaps on a dual core machine it is more of a problem than single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that an non-gui version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.

On Jan 12, 2015, at 11:02 PM, Michael Jones mailto:mike@proclivis.com> wrote:

> Sorry I am reviving an old problem, but it has resurfaced, such that one system behaves different than another.
>
> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax, something bad is going on. In summary, the same code that runs on two machines does not run on a third machine. So this indicates I have not made any breaking changes to the code or cabal files. Compiling with GHC 7.8.3.
>
> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1 kernel. It is a dual core 64 bit I86 Atom processor. The application hangs at startup. If I remove the -C0.00N option and instead use -V0, the application runs. It has bad timing properties, but it does at least run. Note that a hang hangs an IO thread talking USB, and the GUI thread.
>
> When testing with the -C0.00N option, it did run 2 times out of 20 tries, so fail means fail most but not all of the time. When it did run, it continued to run properly. This perhaps indicates some kind of internal race condition.
>
> In the fail to run case, it does some printing up to the point where it tries to create a wxHaskell frame. In another non-UI version of the program it also fails to run. Logging to a file gives a similar indication. It is clear that the program starts up, then fails during the run in some form of lockup, well after the initial startup code.
>
> If I run with the strace command, it always runs with -C0.00N.
>
> All the above was done with profiling enabled, so I removed that and instead enabled eventlog to look for clues.
>
> In this case it lies between good and bad, in that IO to my USB is working, but the GUI comes up blank and never paints. Running this case without -v0 (event log) the gui partially paints and stops, but USB continues.
>
> Questions:
>
> 1) Does ghc 7.8.4 have any improvements that might pertain to these kinds of scheduling/thread problems?
> 2) Is there anything about the nature of a thread using USB, I2C, or wxHaskell IO that leads to problems that a pure calculation app would not have?
> 3) Any ideas how to track down the problem when changing conditions (compiler or runtime options) affects behavior?
> 4) Are there other options besides -V and -C for the runtime that might apply?
> 5) What does -V0 do that makes a problem program run?
>
> Mike
>
> On Oct 29, 2014, at 6:02 PM, Michael Jones <mailto:mike@proclivis.com> wrote:
>
>> John,
>>
>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave more like -N4.
>>
>> Thanks. This saves my project, as I need to deploy on a single core Atom and was stuck.
>>
>> Mike
>>
>> On Oct 29, 2014, at 5:12 PM, John Lato <mailto:jwlato@gmail.com> wrote:
>>
>>> By any chance do the delays get shorter if you run your program with `+RTS -C0.005` ? If so, I suspect you're having a problem very similar to one that we had with ghc-7.8 (7.6 too, but it's worse on ghc-7.8 for some reason), involving possible misbehavior of the thread scheduler.
>>>
>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <mailto:mike@proclivis.com> wrote:
>>>
>>> I have a general question about thread behavior in 7.8.3 vs 7.6.X
>>>
>>> I moved from 7.6 to 7.8 and my application behaves very differently. I have three threads, an application thread that plots data with wxhaskell or sends it over a network (depends on settings), a thread doing usb bulk writes, and a thread doing usb bulk reads. Data is moved around with TChan, and TVar is used for coordination.
>>>
>>> When the application was compiled with 7.6, my stream of usb traffic was smooth. With 7.8, there are lots of delays where nothing seems to be running. These delays are up to 40ms, whereas with 7.6 delays were a 1ms or so.
>>>
>>> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it runs fine without with -N2/4.
>>>
>>> The program is compiled -O2 with profiling. The -N2/4 version uses more memory, but in both cases with 7.8 and with 7.6 there is no space leak.
>>>
>>> I tired to compile and use -ls so I could take a look with threadscope, but the application hangs and writes no data to the file. The CPU fans run wild like it is in an infinite loop. It at least pops an unpainted wxhaskell window, so it got partially running.
>>>
>>> One of my libraries uses option -fsimpl-tick-factor=200 to get around the compiler.
>>>
>>> What do I need to know about changes to threading and event logging between 7.6 and 7.8? Is there some general documentation somewhere that might help?
>>>
>>> I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar ball and installed myself, after removing 7.6 with apt-get.
>>>
>>> Any hints appreciated.
>>>
>>> Mike
>>>
>>> _______________________________________________
>>> Glasgow-haskell-users mailing list
>>> Glasgow-haskell-users@haskell.org
>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Michael Jones

8:26 p.m.

Simon,

I went back and retested my non-GUI version and it seems to work fine. But here is what is strange: the non-GUI version is really just a client-server version of what I have problems with. I have a non-GUI app running the USB and streaming data to a server. The client app (the one that has the lockup) works fine when in client-server mode. In this mode, it executes the very same code I listed below that locked up. The main difference is where the code below says “Setup GUI components code was here”. The client-server version just connects to the server rather than starting up the USB IO.

Strange that the behavior is so sensitive.

Are there any plans to make the scheduling more pre-emptive so that rogue threads can’t derail an application? This seems to open up lots of difficulties when you are reusing lots of libraries you are not familiar with to build a large application. The more libraries you use, the more unknown risk you are taking that your project is killed because you can’t meet a deadline.

I think I’ll let 7.10 settle down with one maintenance release and then give it a try just to see if it is any different. If that fails, I’ll scratch my head some more. What I don’t want to do is dig into wxHaskell’s FFI. I have a Python GUI started and I can just use that. The motivation for that was the inability to get wxHaskell to work on all three platforms (Windows, Linux, Mac), and getting Python GUIs to work on all three was not too hard. Granted, I would prefer Haskell, but it is an enormous task to make a GUI work on all platforms. Unlike non-GUI libraries, it does not “just work”; at least it didn’t for me.

Mike

On Jan 21, 2015, at 3:18 AM, Simon Marlow wrote:

...

On 21/01/2015 03:43, Michael Jones wrote:

...
Simon,

The code below hangs on the frameEx function.

But, if I change it to:

f <- frameCreate objectNull idAny "linti-scope PMBus Scope Tool" rectZero (frameDefaultStyle .|. wxMAXIMIZE)

it will progress, but no frame pops up, except once in many tries. Still hangs, but progresses through all the setup code.

However, I did make past statements that a non-GUI version was hanging. So I am not blaming wxHaskell. Just noting that in this case it is where things go wrong.

Anyone,

Are there any wxHaskell experts around that might have some insight?

(Remember, works on single core 32 bit, works on quad core 64 bit, fails on 2 core 64 bit. Using GHC 7.8.3. Any recent updates to the code base to fix problems like this?)

No, there are no recently fixed or outstanding bugs in this area that I'm aware of.

From the symptoms I strongly suspect there's an unsafe foreign call somewhere causing problems, or another busy-wait loop.

Cheers, Simon
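
For readers unfamiliar with the distinction Simon mentions, here is a minimal sketch of the two kinds of foreign import. It is not taken from wxHaskell or from Michael's code; the names `c_usleepSafe` and `c_usleepUnsafe` are made up for illustration, and it assumes a POSIX system where `usleep` is available. As I understand the GHC runtime, an `unsafe` call holds its capability for the whole call, so other Haskell threads on that capability cannot run until it returns, while a `safe` call releases the capability so that (with the threaded runtime) other Haskell threads can keep going.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt (..), CUInt (..))

-- 'safe' import: the RTS releases the capability around the call, so a
-- long-running call does not stall other Haskell threads (threaded RTS).
foreign import ccall safe "unistd.h usleep"
  c_usleepSafe :: CUInt -> IO CInt

-- 'unsafe' import: cheaper per call, but the calling capability is held
-- until the C function returns. Only appropriate for short calls.
foreign import ccall unsafe "unistd.h usleep"
  c_usleepUnsafe :: CUInt -> IO CInt

main :: IO ()
main = do
  r1 <- c_usleepSafe 1000    -- sleep 1 ms via the safe import
  r2 <- c_usleepUnsafe 1000  -- same call via the unsafe import
  print (r1, r2)             -- usleep returns 0 on success
```

When hunting for a problem like the one in this thread, the thing to look for in a binding library would be `unsafe` imports of functions that can block or run for a long time.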

...
-------- CODE SAMPLE --------

gui :: IO ()
gui = do
    values   <- varCreate []    -- Values to be painted
    timeLine <- varCreate 0     -- Line time
    sample   <- varCreate 0     -- Sample Number
    running  <- varCreate True  -- True when telemetry is active

    <<HANG HERE>>

    f <- frameEx frameDefaultStyle [ text := "linti-scope PMBus Scope Tool"] objectNull

    Setup GUI components code was here

    return ()

go :: IO ()
go = do
    putStrLn "Start GUI"
    start $ gui

exeMain :: IO ()
exeMain = do
    hSetBuffering stdout NoBuffering
    getArgs >>= parse
  where
    parse ["-h"] = usage >> exit
    parse ["-v"] = version >> exit
    parse []     = go
    parse [url, port, session, target] = goServer url port (read session) (read target)

    usage   = putStrLn "Usage: linti-scope [url, port, session, target]"
    version = putStrLn "Haskell linti-scope 0.1.0.0"
    exit    = System.Exit.exitWith System.Exit.ExitSuccess
    die     = System.Exit.exitWith (System.Exit.ExitFailure 1)

#ifndef MAIN_FUNCTION
#define MAIN_FUNCTION exeMain
#endif
main = MAIN_FUNCTION

On Jan 20, 2015, at 9:00 AM, Simon Marlow wrote:

...
My guess would be that either
- a thread is in a non-allocating loop
- a long-running foreign call is marked unsafe

Either of these would block the other threads. ThreadScope together with some traceEventIO calls might help you identify the culprit.

Cheers, Simon
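
As a rough illustration of the `traceEventIO` suggestion above, the sketch below brackets an action with user events so that its start and end show up in the eventlog. The helper name `traced` and the "START"/"STOP" message format are my own choices, not something from the thread. It assumes the program is linked with `-eventlog` and run with `+RTS -l`; without that the trace calls should have no visible effect.

```haskell
import Control.Exception (bracket_)
import Debug.Trace (traceEventIO)

-- | Emit a user event before and after an action, so the gap between
-- the two markers can be located in the eventlog.
traced :: String -> IO a -> IO a
traced label =
  bracket_ (traceEventIO ("START " ++ label))
           (traceEventIO ("STOP " ++ label))

main :: IO ()
main = do
  -- Hypothetical phases; in the application these might wrap frame
  -- creation, the first fork, and each I/O loop iteration.
  n <- traced "setup" (return (length [1 :: Int .. 1000]))
  traced "report" (print n)
```

If a "START" marker appears in the log with no matching "STOP", the wrapped action is a good candidate for the thread that never yields.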

On 20/01/2015 15:49, Michael Jones wrote:

...
Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow wrote:

...
Hi Michael,

Previously in this thread it was pointed out that your code was doing busy waiting, and so the problem can be fixed by modifying your code to not do busy waiting. Did you do this? The -C flag is just a workaround which will make the RTS reschedule more often, it won't fix the underlying problem.

The code you showed us was:

    sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> ProcessT m (Spec, String) ()
    sendTransactions dev dts = repeatedly $ do
        dts' <- liftIO $ atomically $ readTVar dts
        when (dts' == True) (do
            (_, transactions) <- await
            liftIO $ sendOut dev transactions)

This loops when the contents of the TVar is False.

Cheers, Simon
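
One way to avoid the busy wait described above is to let STM block until the flag changes, instead of reading the `TVar` and looping. The following is only a sketch of that idea, not the fix Michael actually applied (the thread does not show it); `waitUntilEnabled` is a hypothetical helper. It relies on `check` from the stm package, which calls `retry` when its argument is False, so the thread is suspended until some other thread writes to the `TVar`.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (TVar, atomically, check, newTVarIO, readTVar, writeTVar)

-- | Block until the flag is True. While it is False the transaction
-- retries, which suspends the thread rather than spinning.
waitUntilEnabled :: TVar Bool -> IO ()
waitUntilEnabled flag = atomically $ readTVar flag >>= check

main :: IO ()
main = do
  flag <- newTVarIO False
  done <- newEmptyMVar
  _ <- forkIO $ do
    waitUntilEnabled flag           -- sleeps here, no polling loop
    putStrLn "sender enabled"
    putMVar done ()
  threadDelay 100000                -- simulate 0.1 s of setup work
  atomically $ writeTVar flag True  -- wakes the waiting thread
  takeMVar done
```

In `sendTransactions`, something like `liftIO (waitUntilEnabled dts)` placed before the `await` would presumably take the place of the `readTVar`/`when` pair, though I have not tested that against the machines library.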

On 18/01/2015 01:15, Michael Jones wrote:

...
I have narrowed down the problem a bit. It turns out that many times if I run the program and wait long enough, it will start. Judging by the event log, it sometimes takes 1000 to 10000 entries before it does.

When I look at a good start vs. a slow start, I see that in both cases things start up and there is some thread activity for threads 2 and 3; then the application starts creating other threads, which is when the wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case, it just gets stuck on thread 2/3 activity for a very long time.

If I switch from -C0.001 to -C0.010, the startup is more reliable, in that most starts result in an immediate GUI and i2c IO.

The behavior suggests to me that some initial threads are starving other threads of the chance to start, and perhaps on a dual core machine it is more of a problem than on single or quad core machines. For certain, due to some printing, I know that the main thread is starting, and that a print just before the first fork is not printing. Code between them is evaluating wxhaskell functions, but the main frame is not yet asked to become visible. From last week, I know that a non-GUI version of the app is getting stuck, but I do not know if it eventually runs like this case.

Is there some convention that when I look at an event log you can tell which threads are OS threads vs threads from fork?

Perhaps someone that knows the scheduler might have some advice. It seems odd that a scheduler could behave this way. The scheduler should have some built in notion of fairness.
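
On the question of telling threads apart in an event log: as far as I know, the thread numbers in the log are Haskell thread IDs, and one way to make them recognizable is to label each thread with `labelThread` from `GHC.Conc`. The sketch below shows the idea; `forkLabelled` and the label strings are hypothetical, and whether the labels are displayed depends on the viewer used to read the log.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (labelThread)

-- | Fork a lightweight thread that labels itself before running the
-- given action, so it can be identified by name later.
forkLabelled :: String -> IO () -> IO ThreadId
forkLabelled name action = forkIO $ do
  tid <- myThreadId
  labelThread tid name
  action

main :: IO ()
main = do
  myThreadId >>= \tid -> labelThread tid "main"
  done <- newEmptyMVar
  -- "usb-reader" is an illustrative name, not one from the application.
  _ <- forkLabelled "usb-reader" (putMVar done ())
  takeMVar done
```

With labels in place, a run that gets stuck on "thread 2/3 activity" can at least be tied back to a specific `forkIO` call site.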


Bas van Dijk

22 Jan 22 Jan

1:22 a.m.

Hi Michael,

Are you already using usb-1.3.0.0? If not, could you upgrade and test again? That release fixed the deadlock that Ben and Carter were talking about.

Good luck,

Bas

Michael Jones

2:32 a.m.

Bas,

I have not upgraded, mainly because my problems manifest without enabling USB. However, I think I can upgrade in a few days and move forward. Are you using ghc 7.8.10 these days or something older?

Mike

On Jan 21, 2015, at 12:52 PM, Bas van Dijk wrote:

...

Hi Michael,

Are you already using usb-1.3.0.0? If not, could you upgrade and test again? That release fixed the deadlock that Ben and Carter were talking about.

Good luck,

Bas

Michael Jones

7 a.m.

Bas,

I checked my cabal file and I was already using 1.3.0.0.

Mike

On Jan 21, 2015, at 12:52 PM, Bas van Dijk wrote:

...

Hi Michael,

Are you already using usb-1.3.0.0? If not, could you upgrade and test again? That release fixed the deadlock that Ben and Carter were talking about.

Good luck,

Bas

29 comments

9 participants

participants (9)

Bas van Dijk
Ben Gamari
Carter Schonwald
Donn Cave
Edward Z. Yang
John Lato
Michael Jones
Simon Marlow
Simon Peyton Jones