Re: [Haskell-cafe] External system connections


Richard Wallace

10 Jul 2011

11:16 p.m.

On Sun, Jul 10, 2011 at 2:54 PM, Brandon Allbery wrote:

...

On Sun, Jul 10, 2011 at 17:34, Richard Wallace wrote:

...
Rather than a single separate thread that makes requests, I was hoping to make several SOAP requests concurrently instead of serially. I would only want to block other threads making SOAP requests if one of them returns a response indicating that the token they were using is no longer valid.

Is there one shared token, or one per requesting thread, or some other arrangement?

There should be one shared token.

...

In either case, separation of concerns makes me think the token handling belongs on the other end: a thread writes a request down a Chan, a queue dispatcher thread reads it if there are available workers and the token is available (with callers blocking otherwise), dispatcher sends request to available worker. If the worker finds the token is invalid, it calls back in to the dispatcher with a token renewal request, which causes anything else that needs that token to block as above, and itself blocks until a valid token is returned.

Ahhh... I see. That sounds like a much better plan. It also keeps the fact that the token is "mutable" confined to a single thread, the dispatcher thread. One thing I'd have to be careful to handle is the case where multiple worker threads find that the token is now invalid. In that case they would all send a token renewal request to the dispatcher, and each request would trigger its own re-auth. To avoid that, part of the token renewal request could be the used token. When the dispatcher handles a token renewal request, it checks whether the used token is the same as the current token. If so, it does the token renewal. If not, the token has already been renewed, so it dispatches to the SOAP worker thread with the new token.
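The used-token check described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the `Token` representation, the `Renewal` message type, the `reauth` callback, and the `demo` helper are all assumptions made for the example. The current token is ordinary loop state inside the dispatcher, so no other thread ever changes it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical token type; the thread does not say what a token looks like.
newtype Token = Token Int deriving (Eq, Show)

-- A renewal request carries the token the worker used and a reply slot.
data Renewal = Renewal Token (MVar Token)

-- Dispatcher-side handling. Re-authenticate only when the used token is
-- still the current one; otherwise hand back the already-renewed token.
renewalLoop :: (Token -> IO Token) -> Chan Renewal -> Token -> IO ()
renewalLoop reauth requests = go
  where
    go current = do
      Renewal used reply <- readChan requests
      next <- if used == current then reauth current else return current
      putMVar reply next
      go next

-- Two workers report the same stale token; only one re-auth should happen.
demo :: IO (Token, Token, Int)
demo = do
  calls <- newIORef (0 :: Int)
  requests <- newChan
  let reauth (Token n) = modifyIORef' calls (+ 1) >> return (Token (n + 1))
  _ <- forkIO (renewalLoop reauth requests (Token 0))
  r1 <- newEmptyMVar
  r2 <- newEmptyMVar
  writeChan requests (Renewal (Token 0) r1)
  writeChan requests (Renewal (Token 0) r2)
  t1 <- takeMVar r1
  t2 <- takeMVar r2
  n <- readIORef calls
  return (t1, t2, n)
```

In this sketch the second request finds that its used token no longer matches the current one, so it receives the fresh token without a second re-auth.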

...

Another advantage of this is that adding more workers is done in an obvious place (the queuer thread).

That said, I still don't know enough details to say if that token management implementation actually makes sense. I'd still go with the worker/queuer thread setup, which I think is what you were pointed to (the pool package on Hackage).

Ok, I still don't see how the Pool package helps. I had assumed the pooled resource would be the token, so the size of the pool would be 1. Now that I rethink it, it seems like you are suggesting the resource is the worker threads. Is that right? When I read what you were talking about above with the dispatcher thread I had thought it would use forkIO to start the worker threads. The worker threads would read from a Chan that the dispatcher wrote to. In that way, the dispatcher doesn't have to worry about worker threads being available. Am I misunderstanding something?

Thanks,
Rich

...

-- brandon s allbery allbery.b@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms


Brandon Allbery

10 Jul 2011

11:38 p.m.


On Sun, Jul 10, 2011 at 19:16, Richard Wallace wrote:

...

One thing I'd have to be careful to handle is the case where multiple worker threads find that the token is now invalid. In that case they would all send a token renewal request to the dispatcher, and each request would trigger its own re-auth. To avoid that, part of the token renewal ...

One of the reasons to send the request back to the dispatcher instead of doing it inline is so that the dispatcher can note that a renewal request is already in flight (which it needs to know anyway, so it can block other requests) and wake all threads waiting on it when it's done, instead of having multiple renewals in flight.

...

Now that I rethink it, it seems like you are suggesting the resource is the worker threads. Is that right? When I read what you were talking about above with the dispatcher thread I had thought it would use forkIO to start the worker threads. The worker threads would read from a Chan that the dispatcher wrote to. In that way, the dispatcher doesn't have to worry about worker threads being available. Am I misunderstanding something?

The point of a pool is (a) so you can throttle it in special cases, such as when you need to renew the token, and (b) so you don't find yourself juggling a couple thousand threads if things get unexpectedly busy (or buggy). You can limit the pool to something sensible (probably something like 4 during development and debugging so things can't get too out of hand) and increase it later; plus, the pool manager will provide the primitives to deal with managing shared resources (such as your token) within the pool.
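To make the throttling point concrete, here is a rough sketch of a fixed-size pool built only from `forkIO`, `Chan`, and `MVar` in base. It is not the API of any pool package on Hackage; `Job`, `startPool`, `submit`, and `demo` are names invented for this example. However many jobs are queued, at most `size` of them run at once.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar
  (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forever, replicateM_)

-- A job pairs an action with a slot for its result.
data Job a = Job (IO a) (MVar a)

-- Fork exactly `size` workers that all read from one shared queue.
-- Note: an exception in a job would kill its worker; a real pool
-- would need to catch and report it.
startPool :: Int -> IO (Chan (Job a))
startPool size = do
  jobs <- newChan
  replicateM_ size . forkIO . forever $ do
    Job act reply <- readChan jobs
    act >>= putMVar reply
  return jobs

-- Queue a job and return the slot its result will appear in.
submit :: Chan (Job a) -> IO a -> IO (MVar a)
submit jobs act = do
  reply <- newEmptyMVar
  writeChan jobs (Job act reply)
  return reply

-- Ten jobs on four workers; also records the peak number running at once.
demo :: IO ([Int], Int)
demo = do
  jobs <- startPool 4
  running <- newMVar (0 :: Int, 0 :: Int) -- (currently running, peak)
  let job n = do
        modifyMVar_ running $ \(r, p) -> return (r + 1, max p (r + 1))
        threadDelay 1000
        modifyMVar_ running $ \(r, p) -> return (r - 1, p)
        return (n * n)
  replies <- mapM (submit jobs . job) [1 .. 10]
  results <- mapM takeMVar replies
  (_, peak) <- readMVar running
  return (results, peak)
```

Changing the argument to `startPool` is the single place where the concurrency limit is set, which matches the "something like 4 during development" suggestion.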

Richard Wallace

11 Jul 2011

12:19 a.m.


On Sun, Jul 10, 2011 at 4:38 PM, Brandon Allbery wrote:

...

One of the reasons to send the request back to the dispatcher instead of doing it inline is so that the dispatcher can note that a renewal request is already in flight (which it needs to know anyway, so it can block other requests) and wake all threads waiting on it when it's done, instead of having multiple renewals in flight.

Ok, I can see that. Though I was thinking that the worker threads would send the request back to the dispatcher by way of the same Chan the dispatcher reads requests from. Obviously I was thinking the dispatcher would use the original token to filter requests in the Chan. If I understand what you are talking about, the dispatcher would do the token renewal in a separate thread, continuing to process its incoming Chan while that is going on and accumulating incoming requests somewhere, perhaps in another Chan. Then, when the token renewal is complete, the dispatcher stops forwarding incoming requests to the secondary Chan, processes the accumulated requests using the new token, and, when done with those, switches back to processing the incoming Chan. Is that something like what you had in mind, or did I make it more complicated than necessary?

...

The point of a pool is so (a) you can throttle it in special cases, such as when you need to renew the token, and (b) so you don't find yourself juggling a couple thousand threads if things get unexpectedly busy (or buggy). You can limit the pool to something sensible (probably something like 4 during development and debugging so things can't get too out of hand) and increased later; plus, the pool manager will provide the primitives to deal with managing shared resources (such as your token) within the pool.

Hmm. I'm still not entirely seeing it and I think the problem is just my lack of knowledge of Haskell concurrency. If the pool is of threads, how do you start the threads? How do you submit work to the threads? The only way I know of in Haskell of creating threads to do work is forkIO. That takes an IO action and runs it to completion. Would a worker thread just be one that loops forever and checks an MVar for something to do? So the pool would really consist of (MVar, ThreadId) pairs. When we speak of getting a thread out of a pool and giving it work to do, are we really talking about sticking a value in an MVar that the thread is blocking on, waiting for data to be available to do something with, so that it can go and do some work? And when we are done with the thread we return it to the pool?

Rich

P.S. Thanks for talking me through this! I'm learning a ton about concurrency in Haskell.

...


Brandon Allbery

1:46 a.m.


On Sun, Jul 10, 2011 at 20:19, Richard Wallace wrote:

...

Ok, I can see that. Though I was thinking that the worker threads would send the request back to the dispatcher by way of the same Chan the dispatcher reads requests from. Obviously I was thinking the dispatcher would use the original token to filter requests in the Chan. If I understand what you are talking about, the dispatcher would do the token renewal in a separate thread, continuing to process ...

The same Chan is used to send a request; the thread processing token renewal might or might not be otherwise a normal worker thread, but it's separated out as a Maybe ThreadId instead of being in a pool, because (a) there can only be zero or one of them, and (b) if it's not Nothing then the dispatcher thread accepts only token renewals. (This actually requires either multiple Chans or something more complex than a normal Chan, since you can't filter a Chan based on the type of message.) You also need some way to block the sender, which suggests that a message written down a Chan must include an MVar which will be signaled when the operation is complete. This suggests to me something along the lines of

...

    data WorkRequest
      = SOAPData ... (MVar Bool)
      | TokenRequest Token (MVar Token)
      -- | ReadyForWork (MVar WorkRequest)

where the requestor allocates an MVar, writes it as part of the WorkRequest, and then does a takeMVar to wait for the response. The dispatcher reads WorkRequests, dispatches any it can to available workers, and queues the rest internally; if it's a TokenRequest then it's queued separately and all SOAPData requests get queued regardless of whether there are free workers. When the single token processor returns, all entries in the TokenRequest queue get awakened (putMVar threadMVar newToken) and normal processing of the standard request queue resumes.

Or you can see if the pool package on Hackage handles the ugly details here automatically; I haven't looked.
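A rough sketch of this request/reply protocol, under several stated assumptions, might look like the following. The `String` payload stands in for the elided SOAPData fields, `RenewalDone` is an extra constructor invented here so the renewal thread can report back on the same Chan, and `reauth`, `send`, `callSoap`, `renewToken`, and `demo` are hypothetical names. For brevity each SOAP call runs in its own forkIO thread rather than in a bounded pool.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, void)
import Data.IORef (modifyIORef', newIORef, readIORef)

newtype Token = Token Int deriving (Eq, Show)

data WorkRequest
  = SOAPData String (MVar Bool)      -- String is a placeholder payload
  | TokenRequest Token (MVar Token)  -- stale token plus a reply slot
  | RenewalDone Token                -- assumption: renewer reports back

-- Requester side: allocate a reply MVar, send it along, block on it.
callSoap :: Chan WorkRequest -> String -> IO Bool
callSoap inbox payload = do
  reply <- newEmptyMVar
  writeChan inbox (SOAPData payload reply)
  takeMVar reply

renewToken :: Chan WorkRequest -> Token -> IO Token
renewToken inbox stale = do
  reply <- newEmptyMVar
  writeChan inbox (TokenRequest stale reply)
  takeMVar reply

-- `send` returns Nothing when the service rejects the token.
dispatcher :: IO Token -> (Token -> String -> IO (Maybe Bool))
           -> Chan WorkRequest -> Token -> IO ()
dispatcher reauth send inbox = normal
  where
    -- Worker: try the call; on a rejected token, ask for a new one and retry.
    run tok payload reply = void (forkIO (attempt tok))
      where
        attempt t = send t payload >>= maybe (renewToken inbox t >>= attempt)
                                             (putMVar reply)
    normal tok = do
      req <- readChan inbox
      case req of
        SOAPData payload reply -> run tok payload reply >> normal tok
        TokenRequest stale reply
          | stale /= tok -> putMVar reply tok >> normal tok
          | otherwise -> do
              _ <- forkIO (reauth >>= writeChan inbox . RenewalDone)
              renewing [reply] []
        RenewalDone _ -> normal tok
    -- While a renewal is in flight, hold SOAP requests and collect waiters.
    renewing waiters held = do
      req <- readChan inbox
      case req of
        SOAPData payload reply -> renewing waiters ((payload, reply) : held)
        TokenRequest _ reply -> renewing (reply : waiters) held
        RenewalDone tok -> do
          mapM_ (`putMVar` tok) waiters
          mapM_ (uncurry (run tok)) (reverse held)
          normal tok

-- Three concurrent calls start with an expired token (Token 0).
demo :: IO ([Bool], Int)
demo = do
  calls <- newIORef (0 :: Int)
  inbox <- newChan
  let reauth = modifyIORef' calls (+ 1) >> return (Token 1)
      send tok _ = return (if tok == Token 1 then Just True else Nothing)
  _ <- forkIO (dispatcher reauth send inbox (Token 0))
  slots <- forM ["a", "b", "c"] $ \p -> do
    slot <- newEmptyMVar
    _ <- forkIO (callSoap inbox p >>= putMVar slot)
    return slot
  oks <- mapM takeMVar slots
  n <- readIORef calls
  return (oks, n)
```

Because every reply travels through an MVar supplied by the requester, callers block exactly as long as their own request is outstanding, and the dispatcher never needs to know who is waiting.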

...

If the pool is of threads, how do you start the threads? How do you submit work to the threads? The only way I know of in Haskell of creating threads to do work is forkIO. That takes a function and runs to completion. Would a worker thread just be one that loops forever

Yes; the dispatcher keeps a list of workers, which are forkIO-d threads that are waiting on an MVar or Chan for work to do. When they receive something, they go off and do it, write the result into another MVar or Chan which was specified in the request, and go back to waiting on the initial MVar/Chan for something to do. If the list is shorter than the maximum, more workers are forkIO-d to fill it as needed; if longer, idle workers are sent "shut down" requests. (The latter is "polite" handling of program shutdown, and also allows for the pool size to be modified dynamically if needed.) I think doing this right also requires that a worker that's ready for more work explicitly check in, so the dispatcher knows it's available; that could be handled by an additional WorkRequest type (see commented-out line above, where a worker that's ready to handle another request passes its input MVar to the dispatcher)... but there may be better ways; I have some grasp of concurrency, but my Haskell library fu is still somewhat weak. Hopefully someone else will jump in if appropriate.

(You can see how quickly this becomes complex, though; if the canned solution does what you need, you might want to avoid reinventing this particular wheel unless you're doing it for educational purposes.)
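The check-in and shut-down behaviour described above could be sketched like this. It is a simplified illustration with invented names (`Task`, `worker`, `assign`, `demo`), and it uses a separate `ready` Chan for check-ins instead of a ReadyForWork constructor on the main request Chan, which is an assumption made to keep the example short.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM, replicateM_)

-- What the dispatcher can hand to a worker.
data Task
  = Run (IO ())  -- do this, then check in again
  | ShutDown     -- leave the loop politely

-- A worker checks in by writing its own mailbox to `ready`, then
-- blocks until the dispatcher fills that mailbox with a task.
worker :: Chan (MVar Task) -> IO ()
worker ready = do
  mailbox <- newEmptyMVar
  let loop = do
        writeChan ready mailbox
        task <- takeMVar mailbox
        case task of
          Run act -> act >> loop
          ShutDown -> return ()
  loop

-- Dispatcher side: wait for any idle worker and give it the task.
assign :: Chan (MVar Task) -> Task -> IO ()
assign ready task = readChan ready >>= \mailbox -> putMVar mailbox task

-- Two workers handle five tasks, then both are shut down.
demo :: IO [Int]
demo = do
  ready <- newChan
  dones <- replicateM 2 $ do
    done <- newEmptyMVar
    _ <- forkIO (worker ready >> putMVar done ())
    return done
  replies <- forM [1 .. 5] $ \n -> do
    reply <- newEmptyMVar
    assign ready (Run (putMVar reply (n * 2)))
    return reply
  results <- mapM takeMVar replies
  replicateM_ 2 (assign ready ShutDown)
  mapM_ takeMVar dones  -- wait until both workers have exited
  return results
```

Since a worker that has been shut down never checks in again, each ShutDown reaches a different worker, and `assign` blocks naturally whenever no worker is idle.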

Richard Wallace

2:42 a.m.


Alright, I'll have to think on this some more but I think we're speaking the same language now - and what's more I even understand it!

Thanks again for all your help,
Rich

On Sun, Jul 10, 2011 at 6:46 PM, Brandon Allbery wrote:

...

