NIO multiplexing + thread pooling

 
 
markspace
 
      09-28-2011
On 9/28/2011 9:56 AM, Robert Klemme wrote:

> It all depends of course how it's done. But imagine 10,000 open channels
> with traffic, 1,000 reader threads and a single thread doing selecting
> and dispatching to reader threads.



How much traffic? One single 100 gigabit port running all out?

I think this scenario would depend on the number of packets received,
not on the actual volume of traffic. One hundred 1-byte packets will
require a lot more attention than a single 100-byte packet. Packet
reception should be where the bottleneck is.

I think you'd have to profile an actual application to determine where
the time is really spent. I'd like to see what someone with 10k open
ports is actually doing, and what sort of hardware they're running on.


> If you do not partition (and this is
> the case I was talking about) you probably put something into a single
> queue from which those 1,000 reader threads read and the selector thread
> is hammering it all the time from the other end.



There's only one physical network controller. Running flat out with an
average packet length of 100 bytes (something like a typical HTTP GET
request), I estimated that the network controller would have to write
out one packet to memory every 30 nanoseconds.

I'm not sure that's feasible.

Wikipedia indicates that DDR2 memory has a bandwidth of around 12 GB/s.
That's on the order of 100 Gb Ethernet. But "on the order of" means
there's no bandwidth left for any other device, including the CPU or
hard disk.
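
A quick back-of-the-envelope check of the figures above, as a sketch only. It assumes a fully saturated 100 Gb/s link, packets of exactly 100 bytes, and no Ethernet framing overhead (preamble, inter-frame gap); the class and method names are made up for illustration. Under those assumptions the gap between packets comes out nearer 8 ns than 30 ns, so the 30 ns estimate presumably rests on other assumptions, but either way the per-packet budget is tiny. The same arithmetic puts 12 GB/s at 96 Gb/s.

```java
final class LinkBudget {

    /** Nanoseconds between back-to-back packets on a saturated link. */
    static double nanosPerPacket(double linkBitsPerSecond, int packetBytes) {
        double bitsPerPacket = packetBytes * 8.0;
        return bitsPerPacket / linkBitsPerSecond * 1e9;
    }

    /** Converts a bandwidth in gigabytes per second to gigabits per second. */
    static double gigabytesToGigabits(double gigabytesPerSecond) {
        return gigabytesPerSecond * 8.0;
    }

    public static void main(String[] args) {
        double link = 100e9; // 100 Gb/s, the figure used in the post
        System.out.println("ns per 100-byte packet: " + nanosPerPacket(link, 100));
        System.out.println("12 GB/s in Gb/s: " + gigabytesToGigabits(12));
    }
}
```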

>
> Suggest a synchronization mechanism for the use case I described above
> (1 selector, 10,000 channels, 1,000 readers) and then we can talk about
> properties of that.



Show me an actual app first; I'm not sure what you've proposed is
even slightly realistic.

1 selector = 1 physical network device. I'm not sure banging on one
device with 10 threads is somehow better than just letting one thread
interface with the device.
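
For reference, the design being argued about (one selector thread handing ready channels to a pool of reader threads through a single shared queue) might look roughly like the sketch below. This is not anyone's actual application: the class name, the loopback test clients, the 10-second deadline and the buffer size are all assumptions made so the example can run on its own, and it assumes JDK 11 or later, where `interestOps` can be changed safely from another thread. The executor's internal work queue stands in for the single queue.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

final class SelectorDispatchSketch {

    /** Serves `clients` loopback connections and returns the total bytes read. */
    static long run(int clients, int readerThreads, byte[] payload) {
        // The pool's internal work queue plays the role of the single shared queue.
        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        AtomicLong bytesRead = new AtomicLong();
        CountDownLatch closed = new CountDownLatch(clients);
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            // Hypothetical test clients: each sends the payload once, then closes.
            Thread clientThread = new Thread(() -> {
                for (int i = 0; i < clients; i++) {
                    try (SocketChannel c = SocketChannel.open(address)) {
                        c.write(ByteBuffer.wrap(payload));
                    } catch (IOException e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
            clientThread.start();

            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            while (closed.getCount() > 0 && System.nanoTime() < deadline) {
                selector.select(100);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        key.interestOps(0); // a reader owns the channel until it hands it back
                        readers.execute(() -> drain(key, selector, bytesRead, closed));
                    }
                }
            }
            clientThread.join();
            readers.shutdown();
            readers.awaitTermination(5, TimeUnit.SECONDS); // let readers finish before the selector closes
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            readers.shutdownNow();
        }
        return bytesRead.get();
    }

    /** Runs on a reader thread: reads what is available, then closes or re-arms the key. */
    private static void drain(SelectionKey key, Selector selector,
                              AtomicLong bytesRead, CountDownLatch closed) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            int n;
            while ((n = ch.read(buf)) > 0) {
                bytesRead.addAndGet(n);
                buf.clear();
            }
            if (n < 0) {                 // peer closed the connection
                ch.close();
                closed.countDown();
            } else {                     // nothing more for now: give the channel back
                key.interestOps(SelectionKey.OP_READ);
                selector.wakeup();
            }
        } catch (IOException e) {
            try { ch.close(); } catch (IOException ignored) { }
            closed.countDown();
        }
    }
}
```

Calling `SelectorDispatchSketch.run(5, 3, payload)` with a 5-byte payload should report 25 bytes read. Every hand-off crosses the executor queue and every re-arm needs a `wakeup()`, which is where the contention described in the quoted text would show up at scale.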

 
markspace
 
      09-29-2011
On 9/28/2011 5:39 PM, Peter Duniho wrote:
> I don't have specific numbers. But I can tell you that synchronization
> overhead, and in particular the cost of a thread context switch, is one
> of the reasons that i/o completion ports on Windows is such a critical
> technique for scalable networking processes.



Interesting....

<http://stackoverflow.com/questions/2794535/linux-and-i-o-completion-ports>


 
Lew
 
      09-30-2011
markspace wrote:
> Peter Duniho wrote:
>> I don't have specific numbers. But I can tell you that synchronization
>> overhead, and in particular the cost of a thread context switch, is one
>> of the reasons that i/o completion ports on Windows is such a critical
>> technique for scalable networking processes.

>
> Interesting....
>
> <http://stackoverflow.com/questions/2794535/linux-and-i-o-completion-ports>


Context switching is a different issue from synchronization. Context switches happen even in non-critical sections, where synchronization does not apply.

This does not invalidate the rest of Pete's points since they emanate from the cost of context switching, which is actually more frequent and a more impactful phenomenon than synchronization.

So in the simple-minded case where a separate single thread handles each connection there is possibly no synchronization at all between the threads, but context switching still militates against this technique for too many concurrent connections.

OTOH, choice of operating system and hardware can influence this equation. For example, the QNX operating system on Intel/AMD is famously fast at context switches.

--
Lew
 
 
markspace
 
      09-30-2011
On 9/30/2011 7:46 AM, Lew wrote:
> Context switching is a different issue from synchronization. Context
> switches happen even in non-critical sections, where synchronization
> does not apply.



Thanks for pointing that out, since it is indeed true. Not all
synchronization requires a context switch, or even an OS call. Not all
multi-threading requires context switching either.
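
As a small illustration of synchronization without an OS call, here is a sketch of a lock-free counter built on a compare-and-set retry loop. The class and method names are made up for the example. On common JVMs `AtomicLong.compareAndSet` is expected to map to a hardware atomic instruction rather than a system call, though that is an implementation detail and not something the API guarantees.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LockFreeCounter {
    private final AtomicLong value = new AtomicLong();

    /** Compare-and-set retry loop: no lock is taken and no thread is parked. */
    long increment() {
        long current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }

    long get() {
        return value.get();
    }

    /** Starts `threads` threads, each incrementing `perThread` times; returns the final count. */
    static long demo(int threads, int perThread) {
        LockFreeCounter counter = new LockFreeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

The threads still coordinate through shared state, so this is synchronization in the broad sense, but a thread that loses the race simply retries instead of blocking.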


> This does not invalidate the rest of Pete's points since they emanate
> from the cost of context switching, which is actually more frequent
> and a more impactful phenomenon than synchronization.



What I thought most interesting about that stackoverflow page was the
assertion that different methods work optimally on different systems.

That is, IOCP works well on Windows because the underlying system has
been optimized for IOCP, whereas Linux "notification" (selectors, I
think) works well on *nix because that path has been optimized for *nix
systems.

I guess my point was "test what you have, not what the other guy says
works." Just because Java doesn't use IOCP, or does use selectors,
doesn't mean it will or won't run well on your target system. There are
lots of ways of doing things "optimally," but you always gotta test to
verify it's working the way you think it is.

 
 
Lew
 
      09-30-2011
markspace wrote:
> Not all multi-threading requires context switching either.


Huh?

--
Lew
 
 
markspace
 
      09-30-2011
On 9/30/2011 8:30 AM, Lew wrote:
> markspace wrote:
>> Not all multi-threading requires context switching either.

>
> Huh?
>



Imagine a situation where the number of threads is matched to the number
of CPUs. No need to switch out a context there (though it might happen
anyway, there isn't a *need*). Just pass data back and forth. Memory
barriers (happens-before in Java) are all you need in this case.
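
A minimal sketch of what that could look like, assuming exactly one producer thread, one consumer thread, and at least two cores (check `Runtime.getRuntime().availableProcessors()`). The class is hypothetical and uses `Thread.onSpinWait()`, so it needs Java 9 or later. The only coordination is a `volatile` flag, whose write and read give the happens-before edge; neither thread ever blocks in the OS, although the scheduler can of course still preempt them.

```java
final class SpinHandoff {
    private volatile boolean full; // volatile write/read pair provides happens-before
    private long slot;             // plain field, published through `full`

    /** Producer side: spins until the slot is free, then publishes a value. */
    void put(long v) {
        while (full) {
            Thread.onSpinWait();   // busy-wait, the thread is never parked
        }
        slot = v;
        full = true;               // makes the write to `slot` visible to the consumer
    }

    /** Consumer side: spins until a value is published, then frees the slot. */
    long take() {
        while (!full) {
            Thread.onSpinWait();
        }
        long v = slot;
        full = false;              // hands the slot back to the producer
        return v;
    }

    /** Passes 1..count from a producer thread to the caller and returns the sum received. */
    static long demo(int count) {
        SpinHandoff box = new SpinHandoff();
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= count; i++) {
                box.put(i);
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += box.take();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

Spinning burns a core while waiting, so this only makes sense when the threads really do have a CPU each.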


 
 
Robert Klemme
 
      09-30-2011
On 09/30/2011 05:49 PM, markspace wrote:
> On 9/30/2011 8:30 AM, Lew wrote:
>> markspace wrote:
>>> Not all multi-threading requires context switching either.

>>
>> Huh?

>
> Imagine a situation where the number of threads is matched to the number
> of CPUs. No need to switch out a context there (though it might happen
> anyway, there isn't a *need*). Just pass data back and forth. Memory
> barriers (happens-before in Java) is all you need in this case.


That sounds fairly theoretical. On one hand a system typically has more
processes (let alone threads) than cores. On the other hand even if the
number of cores is large enough you will have management overhead
through the OS's scheduler.

Kind regards

robert
 