Re: [jetty-dev] New Scheduling idea for 9.3




On 31 July 2014 04:39, Simone Bordet <sbordet@xxxxxxxxxxx> wrote:


It's unclear to me what you mean by "parse until just the next request".

I mean parse until the next dispatch point.  So that could involve parsing over data frames, which will be added to the appropriate queue.

In reality I expect to see either batches of requests (all HEADERS frames) or singular uploads (HEADERS + DATA*).

 
What if that request is the only one and it uploads a large body?
We typically dispatch to handle the request as soon as the parser has
parsed a HEADERS frame.
However, the buffer may contain another HEADERS, or half of it, or a
DATA frame, or half of it, etc.
So if it is all DATA frames, do we not handle until we have them all?
Or do you mean we parse until the end of HEADERS? And if the buffer has more bytes?

yes and yes
 

Let me try to rewrite this scheme with my understanding, to see if it
matches yours.

Hmm, hard to tell - it's close, but the different level of detail makes it difficult to tell whether it is exactly the same idea.  How about the code sample I sent... does that make sense to you?
 
Every major number change is a dispatch.

1.0. selector I/O, dispatch
2.0. read into buffer
2.1. CAS guard: if a thread is executing 2.1-2.6 steps, return.
2.2. parse buffer.
2.3. find HEADERS.
2.4. if buffer empty, dispatch to 2.0; else buffer not empty, wrap the
buffer into Runnable and dispatch to 2.1
2.5. handle request, call app.
2.6. app returns, goto 2.1
3.0 (from 2.4 dispatching to 2.0) read into buffer
3.1. CAS guard
3.2. (from 2.4 dispatching to 2.1) parse buffer
3.3. find DATA
3.4. queue data
3.5. if buffer empty goto 3.0. else goto 3.1
4.0 (from 2.4 dispatching to 2.0) read into buffer
4.1. CAS guard
4.2. (from 2.4 dispatching to 2.1) parse buffer
4.3. find HEADERS
4.4, 4.5, 4.6 like 2.4, 2.5, 2.6.
5.0. (from 2.4 dispatching to 2.0) read into buffer
5.1. CAS guard
5.2. (from 2.4 dispatching to 2.1) parse buffer
5.3. find not enough bytes for a frame, goto 5.0
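The CAS guard at steps 2.1/3.1/4.1/5.1 can be modelled roughly as follows. This is a hypothetical standalone sketch (class and method names are invented, and frames are stand-in strings), not the code sample referenced above; the point is only that a thread which loses the compare-and-set simply returns, while the winning thread parses everything currently buffered:

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the CAS-guarded parse section (steps 2.1-2.6):
// only one thread at a time may parse/handle; a thread arriving while
// another holds the guard returns immediately.
class ParseGuard
{
    private final AtomicBoolean parsing = new AtomicBoolean(false);
    int framesHandled = 0;

    // Returns true if this thread won the CAS and processed the buffer;
    // false if another thread was already inside 2.1-2.6.
    boolean tryParse(Queue<String> buffer)
    {
        if (!parsing.compareAndSet(false, true))
            return false; // lost the race: somebody else is parsing
        try
        {
            String frame;
            while ((frame = buffer.poll()) != null)
                framesHandled++; // stand-in for parse + dispatch/handle
            return true;
        }
        finally
        {
            parsing.set(false); // release the guard
        }
    }
}
```

Note this deliberately omits the re-dispatch at 2.4; it only illustrates the mutual exclusion.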

Now, as you say, there is a race between thread 2 arriving at 2.1 from
2.6, and thread 3 arriving at 2.1 from 2.4.
That dispatch happens anyway though, even if it is a single HEADERS
with no DATA (or the buffer is empty).
So the pressure on the thread pool and on the OS scheduling is the
same, even if the thread will end up just failing a CAS and returning.
We do have locality though.

Indeed - although we can probably optimise to only ever have a single task outstanding at 2.4.  I.e. if we have previously dispatched at 2.4, but a thread has not yet arrived at 2.1, then don't dispatch another.
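That "at most one outstanding task" optimisation could be sketched with a second flag that is cleared when the dispatched task finally arrives at 2.1. Again purely illustrative (names invented, and the flag stands in for the real bookkeeping):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: avoid piling up tasks at step 2.4 by tracking
// whether a previously dispatched task is still in flight.
class SingleDispatch
{
    private final AtomicBoolean pending = new AtomicBoolean(false);
    private final Executor executor;
    int dispatches = 0;

    SingleDispatch(Executor executor)
    {
        this.executor = executor;
    }

    // Called at 2.4: dispatch only if no earlier task is still outstanding.
    void maybeDispatch(Runnable parseTask)
    {
        if (pending.compareAndSet(false, true))
        {
            dispatches++;
            executor.execute(() ->
            {
                pending.set(false); // the task has "arrived at 2.1"
                parseTask.run();
            });
        }
    }
}
```

With a queueing executor, a second call to maybeDispatch() before the first task has run is simply suppressed, which is exactly the < N dispatch count mentioned below.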
 

Note that we cannot "delay" the dispatch at 2.4 hoping that 2.6
happens: we need to dispatch anyway.

Correct.  You can't call 2.5 without making arrangements for the rest of the buffer to be processed.

If we have N HEADERS frames one after the other, we will have N-1
dispatches for Ys (my 2.4).

In the worst case, yes.  I think we can sometimes get < N.
 

With this scheme, if I followed you right, do we still need to
dispatch from the selector ?

yes - even more so as any thread dispatched to 2.1 may end up in application code.
 

Overall, I think we will lose some locality in the parsing, which
means the new thread will have to load the parser, the connection, the
buffer, etc. references, but we gain locality in the request handling.
 
Exactly.  I think there is a lot less state in the parser, especially between headers.  It should simply be a state variable and a pointer into the buffer.

Number of dispatches is the same.

Or fewer.

 
We'll need to measure carefully, but I think it's worthwhile to have
the flexibility even if there is no gain, in case something changes,
hardware-wise, in the future.

And that is the hard part.  I've started writing a test harness, but it is near impossible to make the load realistic.
It may simply be best to structure the code with an explicitly pluggable strategy (along the lines of the code I posted... but that is not yet exactly right in either form or implementation), so we can then test real servers with real loads.  So perhaps the next step is to refactor the code into the task-producing model I've hinted at, and then we can try multiple strategies over time.
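A task-producing model with a pluggable strategy might be shaped roughly like this. This is a hypothetical interface sketch, not the posted code and not a committed API; the producer parses until the next dispatch point and the strategy decides which thread consumes:

```java
// Hypothetical sketch of a pluggable execution strategy: the connection
// acts as a producer of tasks (one per dispatch point), and the strategy
// decides how produced tasks are executed.
interface TaskProducer
{
    // Parse until the next dispatch point; null when nothing is available.
    Runnable produce();
}

interface ExecutionStrategy
{
    void dispatch(); // invoked when the channel becomes readable
}

// Simplest possible strategy: produce and consume on the calling thread.
// Other strategies (e.g. produce here, execute elsewhere) would plug in
// behind the same interfaces.
class ProduceConsume implements ExecutionStrategy
{
    private final TaskProducer producer;

    ProduceConsume(TaskProducer producer)
    {
        this.producer = producer;
    }

    @Override
    public void dispatch()
    {
        Runnable task;
        while ((task = producer.produce()) != null)
            task.run();
    }
}
```

Swapping strategies behind these two interfaces is what would let real servers under real loads be measured without restructuring the parser each time.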

It would be great if we had jetty-users with significant traffic that were interested in collaborating to refine this http/2 approach.

cheers
