Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Java (http://www.velocityreviews.com/forums/f30-java.html)
-   -   Re: multithreaded cache? (http://www.velocityreviews.com/forums/t946585-re-multithreaded-cache.html)

Kevin McMurtrie 05-25-2012 06:22 AM

Re: multithreaded cache?
 
In article <zsWdnVufD87Egy_SnZ2dnUVZ7oednZ2d@brightview.co.uk>,
bugbear <bugbear@trim_papermule.co.uk_trim> wrote:

> I'm using various Apache Commons Maps as a
> multithread cache, protected using
> ReentrantReadWriteLock, so that getting() uses a read lock,
> and putting() uses a write lock.
>
> But I've got an issue; in the
> case of a cache miss (protected by a read lock),
> the required value is acquired using the "underlying function"
> that the cache is over; this value is then put() into
> the cache (protected by a write lock)
>
> This is all perfectly thread safe, and gives
> correct results.
>
> However, if the underlying function is slow
> and/or resource hungry (consider caching
> a ray-traced image!) many threads can
> end up calling the real function (the second
> and subsequent threads get a miss
> during the first thread's call to the underlying function).
>
> "obviously..." what I want is that only
> the thread that FIRST has a cache miss
> calls the underlying function, whilst other
> threads (for the same key) wait.
>
> This seems "obvious" enough that I'm guessing
> there's a standard solution.
>
> Googling led me to several "heavy" libraries.
>
> This appears to be more of a locking/caching
> issue than a Java issue, although my
> implementation is in Java.
>
> Can anyone (please) point me at a
> canonical solution/implementation?
>
> BugBear


Ha, this is an interview question that I use.

What you need is row level locking for the cache load.

Step 1)
Use synchronized operations to map your key to a value; creating an
uninitialized value in the map if needed. Use whatever tech you want.
A synchronized block on a HashMap is simplest and performs the fastest
on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
with 4+ core systems.

Step 2)
Synchronize on the value. Initialize it if needed.


Step 1 blocks all cache access for only a very short moment to make
sure that a key always has a value. Step 2 blocks access independently
for each cache value to make sure that it is loaded. It will perform
well for continuous use by several CPU cores. Google has some high
concurrency Maps that aren't too bad either.
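
In modern Java the two steps might look something like this (a rough
sketch only; the Slot holder and all names here are illustrative, not a
definitive implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class RowLockCache<K, V> {
    // Mutable per-key holder: published empty under the map lock,
    // then filled under its own monitor.
    private static final class Slot<V> {
        V value;          // guarded by the Slot's monitor
        boolean loaded;   // guarded by the Slot's monitor
    }

    private final Map<K, Slot<V>> map = new HashMap<>();
    private final Function<K, V> loader;

    RowLockCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        Slot<V> slot;
        // Step 1: brief global lock, only to map the key to a slot.
        synchronized (map) {
            slot = map.computeIfAbsent(key, k -> new Slot<>());
        }
        // Step 2: per-slot lock; only the first thread runs the loader,
        // later threads for the same key block here until it is loaded.
        synchronized (slot) {
            if (!slot.loaded) {
                slot.value = loader.apply(key);
                slot.loaded = true;
            }
            return slot.value;
        }
    }
}
```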

In the 16 core range you'll find that any kind of exclusive lock causes
stalls where threads suspend while holding locks, causing a backlog that
reinforces itself. A concurrency expert can fix that using complex
Compare-And-Swap designs. In the 32+ core range you'll sometimes find
that harmless race conditions are better than multiple threads
frequently writing to the same memory page.
--
I will not see posts from Google because I must filter them as spam

Robert Klemme 05-25-2012 03:36 PM

Re: multithreaded cache?
 
On 25.05.2012 08:22, Kevin McMurtrie wrote:
> Ha, this is an interview question that I use.
>
> What you need is row level locking for the cache load.


But not all the time. See https://gist.github.com/2717818
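
The idea, roughly (a hedged sketch in the spirit of the gist and of the
well-known Memoizer from Java Concurrency in Practice; the names here
are invented, not the gist's own): a cheap placeholder goes into the map
via putIfAbsent, and only the thread that wins that race runs the
expensive computation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class Memoizer<K, V> {
    private final ConcurrentHashMap<K, FutureTask<V>> cache =
            new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    Memoizer(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        FutureTask<V> task = cache.get(key);
        if (task == null) {
            // Losing this race only wastes a throwaway FutureTask,
            // never a duplicate call to the expensive loader.
            FutureTask<V> created = new FutureTask<>(() -> loader.apply(key));
            task = cache.putIfAbsent(key, created);
            if (task == null) {   // we won: run the loader exactly once
                task = created;
                task.run();
            }
        }
        try {
            return task.get();    // losers block here until loaded
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```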

> Step 1)
> Use synchronized operations to map your key to a value; creating an
> uninitialized value in the map if needed. Use whatever tech you want.
> A synchronized block on a HashMap is simplest and performs the fastest
> on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
> with 4+ core systems.


In my experience a CHM performs better even on a 1 or 2 core system.

> Step 2)
> Synchronize on the value. Initialize it if needed.
>
>
> Step 1 blocks all cache access for only a very short moment to make
> sure that a key always has a value. Step 2 blocks access independently
> for each cache value to make sure that it is loaded. It will perform
> well for continuous use by several CPU cores. Google has some high
> concurrency Maps that aren't too bad either.


Actually, once the value is in the cache you do not need any step 2
synchronization any more.

> In the 16 core range you'll find that any kind of exclusive lock causes
> stalls where threads suspend while holding locks, causing a backlog that
> reinforces itself. A concurrency expert can fix that using complex
> Compare-And-Swap designs.


Basically this is what CHM does with putIfAbsent() internally.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Kevin McMurtrie 06-02-2012 07:27 PM

Re: multithreaded cache?
 
In article <a29n7cFr61U1@mid.individual.net>,
Robert Klemme <shortcutter@googlemail.com> wrote:

> On 25.05.2012 08:22, Kevin McMurtrie wrote:
> > Ha, this is an interview question that I use.
> >
> > What you need is row level locking for the cache load.

>
> But not all the time. See https://gist.github.com/2717818


The original post stated that duplicate element creation was too
expensive. That code may create duplicate elements for a key and
discard the extras.


>
> > Step 1)
> > Use synchronized operations to map your key to a value; creating an
> > uninitialized value in the map if needed. Use whatever tech you want.
> > A synchronized block on a HashMap is simplest and performs the fastest
> > on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
> > with 4+ core systems.

>
> In my experience a CHM performs better even on a 1 or 2 core system.


This depends on usage patterns. The Java 1.5 Concurrency locks used in
some parts of CHM are very slow compared to synchronization. The
Concurrency locks only have a chance of performing better when there are
a lot of concurrent shared read locks.


> > Step 2)
> > Synchronize on the value. Initialize it if needed.
> >
> >
> > Step 1 blocks all cache access for only a very short moment to make
> > sure that a key always has a value. Step 2 blocks access independently
> > for each cache value to make sure that it is loaded. It will perform
> > well for continuous use by several CPU cores. Google has some high
> > concurrency Maps that aren't too bad either.

>
> Actually once the value is in the cache you do not need any step 2
> synchronization any more.


Step 2 is initializing the value. It's critical to synchronize there or
the second thread to request the same key may see an element that is not
yet fully initialized. (Again, the original post stated that elements
were expensive to create.)

>
> > In the 16 core range you'll find that any kind of exclusive lock causes
> > stalls where threads suspend while holding locks, causing a backlog that
> > reinforces itself. A concurrency expert can fix that using complex
> > Compare-And-Swap designs.

>
> Basically this is what CHM does with putIfAbsent() internally.


Yes, but you don't have access to its internal container class, so CHM
is not much use for caching elements that are very expensive to create.
The Googlely version does provide a way to initialize the container with
a factory callback, combining steps 1 and 2 here.

>
> Kind regards
>
> robert

--
I will not see posts from Google because I must filter them as spam

Robert Klemme 06-02-2012 07:54 PM

Re: multithreaded cache?
 
On 02.06.2012 21:27, Kevin McMurtrie wrote:
> In article<a29n7cFr61U1@mid.individual.net>,
> Robert Klemme<shortcutter@googlemail.com> wrote:
>
>> On 25.05.2012 08:22, Kevin McMurtrie wrote:
>>> Ha, this is an interview question that I use.
>>>
>>> What you need is row level locking for the cache load.

>>
>> But not all the time. See https://gist.github.com/2717818

>
> The original post stated that duplicate element creation was too
> expensive. That code may create duplicate elements for a key and
> discard the extras.


No, it does not create duplicate elements and hence does not discard
them. The only thing which might be duplicated is the proxy instance
which calls the factory method (created in line 98). But that is cheap.
The design ensures that a value is calculated only once per key, so no
CPU time is wasted and no duplicate value objects are created. Please
see Lew's excellent explanation of what happens, or look at the code
again.

>>> Step 1)
>>> Use synchronized operations to map your key to a value; creating an
>>> uninitialized value in the map if needed. Use whatever tech you want.
>>> A synchronized block on a HashMap is simplest and performs the fastest
>>> on 1 or 2 core systems. A ConcurrentHashMap sometimes performs better
>>> with 4+ core systems.

>>
>> In my experience a CHM performs better even on a 1 or 2 core system.

>
> This depends on usage patterns. The Java 1.5 Concurrency locks used in
> some parts of CHM are very slow compared to synchronization. The
> Concurrency locks only have a chance of performing better when there are
> a lot of concurrent shared read locks.


Even if they are slower than synchronization, they perform better than
a single lock on the whole Map (which is what you suggested above) with
multiple threads, even on a machine with only a few cores, simply
because there are more locks and the whole Map is partitioned. I
haven't strictly measured the synchronization mechanisms used inside
CHM, but I did not notice bad timing either (and frankly, I cannot
believe Doug Lea would have used inefficient mechanisms). Bottom line:
when in doubt, measure.
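
A throwaway harness is often enough for a first impression (an
illustrative sketch with invented names; for real numbers you would
want JIT warm-up, varied thread counts, and a proper tool such as JMH):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapContentionBench {
    // Hammer a map from several threads; return elapsed nanoseconds.
    static long run(Map<Integer, Integer> map, int threads, int opsPerThread)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    map.put(i % 1024, i);   // contended writes
                    map.get(i % 1024);      // contended reads
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> locked =
                Collections.synchronizedMap(new HashMap<>());
        Map<Integer, Integer> chm = new ConcurrentHashMap<>();
        System.out.printf("synchronized HashMap: %d ms%n",
                run(locked, 4, 200_000) / 1_000_000);
        System.out.printf("ConcurrentHashMap:    %d ms%n",
                run(chm, 4, 200_000) / 1_000_000);
    }
}
```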

>>> Step 2)
>>> Synchronize on the value. Initialize it if needed.
>>>
>>>
>>> Step 1 blocks all cache access for only a very short moment to make
>>> sure that a key always has a value. Step 2 blocks access independently
>>> for each cache value to make sure that it is loaded. It will perform
>>> well for continuous use by several CPU cores. Google has some high
>>> concurrency Maps that aren't too bad either.

>>
>> Actually once the value is in the cache you do not need any step 2
>> synchronization any more.

>
> Step 2 is initializing the value. It's critical to synchronize there or
> the second thread to request the same key may see an element that is not
> yet fully initialized. (Again, the original post stated that elements
> were expensive to create.)


Maybe my wording was not clear enough. Once the value is in the Map,
no synchronization is needed any more for this key; only the Map-level
locking remains. Again, please look at the code.

>>> In the 16 core range you'll find that any kind of exclusive lock causes
>>> stalls where threads suspend while holding locks, causing a backlog that
>>> reinforces itself. A concurrency expert can fix that using complex
>>> Compare-And-Swap designs.

>>
>> Basically this is what CHM does with putIfAbsent() internally.

>
> Yes, but you don't have access to its internal container class so CHM is
> not much use for caching elements that are very expensive to create.


I beg to differ (see code).

> The Googlely version does provide a way to initialize the container with
> a factory callback; combining steps 1 and 2 here.


Which version? Please share a reference. Thank you!

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

