Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Java (http://www.velocityreviews.com/forums/f30-java.html)
-   -   Data structure to keep things in memory (http://www.velocityreviews.com/forums/t738981-data-structure-to-keep-things-in-memory.html)

Mike Schilling 11-28-2010 08:09 AM

Data structure to keep things in memory
 
Without going into all the details, I have the following situation:

There are a set of objects RTS, all of the same type, that gather run-time
statistics about the running of a program. Various related objects with
different lifecycles pass them around and use them. I'm interested in when
each member of RTS is no longer being updated, because I want to collect its
final statistics , but there's no good way to tell when it's no longer in
use except via its collection, so when it's created I attach a weak
reference to it and associate the weak reference with a queue. When the
reference is delivered, I record the statistics. [1]

So far so good. The question is, what's the best way to keep the references
in memory until they can be delivered to the queue? Obviously, I could put
them into a synchronized HashSet, but I'd prefer to optimize for concurrency
by minimizing locking. At the moment, I'm using a ConcurrentHashMap, since
it seems to give the best concurrency for updates. (The usage pattern is a
bit unusual, since there are effectively no lookups. There's a put() when
the reference is created, and a remove() when it's pulled out of the
reference queue.) This is an area where I have very little experience, so
I'd welcome input from anyone who's worked with these classes.

1. Obviously, a member of RTS points to a separate statistics object which
is also pointed to by the weak reference.


ClassCastException 11-28-2010 09:07 AM

Re: Data structure to keep things in memory
 
On Sun, 28 Nov 2010 00:09:40 -0800, Mike Schilling wrote:

> Without going into all the details, I have the following situation:
>
> There are a set of objects RTS, all of the same type, that gather
> run-time statistics about the running of a program. Various related
> objects with different lifecycles pass them around and use them. I'm
> interested in when each member of RTS is no longer being updated,
> because I want to collect its final statistics , but there's no good way
> to tell when it's no longer in use except via its collection, so when
> it's created I attach a weak reference to it and associate the weak
> reference with a queue. When the reference is delivered, I record the
> statistics. [1]
>
> So far so good. The question is, what's the best way to keep the
> references in memory until they can be delivered to the queue?
> Obviously, I could put them into a synchronized HashSet, but I'd prefer
> to optimize for concurrency by minimizing locking. At the moment, I'm
> using a ConcurrentHashMap, since it seems to give the best concurrency
> for updates. (The usage pattern is a bit unusual, since there are
> effectively no lookups. There's a put() when the reference is created,
> and a remove() when it's pulled out of the reference queue.) This is an
> area where I have very little experience, so I'd welcome input from
> anyone who's worked with these classes.
>
> 1. Obviously, a member of RTS points to a separate statistics object
> which is also pointed to by the weak reference.


The RTS object won't be collected, and so won't end up on the reference
queue, if it's held in a hashmap.

You need to do something a little bit more complicated: have an RTS
object with the actual statistics, plus an RTSHandle object for each RTS
object that holds the RTS object with an ordinary (strong) reference,
plus a Map<WeakReference<RTSHandle>,RTS> from WeakReferences to the RTS
objects. When a new RTS is created, the factory method creates an
RTSHandle as well, and a WeakReference on that handle, and puts a mapping
from the WeakReference to the RTS into the map as well as returning the
handle. The rest of your program uses the RTSHandle to access the RTS and
add statistics, eventually losing all strong references to the RTSHandle.

When this happens, the RTSHandle becomes weakly reachable as the only
remaining reference to it is the WeakReference in the map. The
WeakReference is enqueued. The queue polling thread finds it and uses it
as a key in the map. Since WeakReference doesn't override Object's
identity-based equals and hashCode this works even with the referent GCd.
It accesses the RTS via the map, does its thing, and then removes the map
entry for that RTS, allowing the RTS and the WeakReference to be GCd.

Mike Schilling 11-28-2010 09:56 AM

Re: Data structure to keep things in memory
 


"ClassCastException" <zjkg3d9gj56@gmail.invalid> wrote in message
news:ict651$av9$1@news.eternal-september.org...
> On Sun, 28 Nov 2010 00:09:40 -0800, Mike Schilling wrote:
>
>> Without going into all the details, I have the following situation:
>>
>> There are a set of objects RTS, all of the same type, that gather
>> run-time statistics about the running of a program. Various related
>> objects with different lifecycles pass them around and use them. I'm
>> interested in when each member of RTS is no longer being updated,
>> because I want to collect its final statistics , but there's no good way
>> to tell when it's no longer in use except via its collection, so when
>> it's created I attach a weak reference to it and associate the weak
>> reference with a queue. When the reference is delivered, I record the
>> statistics. [1]
>>
>> So far so good. The question is, what's the best way to keep the
>> references in memory until they can be delivered to the queue?
>> Obviously, I could put them into a synchronized HashSet, but I'd prefer
>> to optimize for concurrency by minimizing locking. At the moment, I'm
>> using a ConcurrentHashMap, since it seems to give the best concurrency
>> for updates. (The usage pattern is a bit unusual, since there are
>> effectively no lookups. There's a put() when the reference is created,
>> and a remove() when it's pulled out of the reference queue.) This is an
>> area where I have very little experience, so I'd welcome input from
>> anyone who's worked with these classes.
>>
>> 1. Obviously, a member of RTS points to a separate statistics object
>> which is also pointed to by the weak reference.

>
> The RTS object won't be collected, and so won't end up on the reference
> queue, if it's held in a hashmap.


Isn’t it clear that it's the *references* that are in the Map? I had
thought so. The question is what sort of Map to use to optimize
concurrency.


ClassCastException 11-28-2010 10:16 AM

Re: Data structure to keep things in memory
 
On Sun, 28 Nov 2010 01:56:40 -0800, Mike Schilling wrote:

> "ClassCastException" <zjkg3d9gj56@gmail.invalid> wrote in message
> news:ict651$av9$1@news.eternal-september.org...
>> On Sun, 28 Nov 2010 00:09:40 -0800, Mike Schilling wrote:
>>
>>> Without going into all the details, I have the following situation:
>>>
>>> There are a set of objects RTS, all of the same type, that gather
>>> run-time statistics about the running of a program. Various related
>>> objects with different lifecycles pass them around and use them. I'm
>>> interested in when each member of RTS is no longer being updated,
>>> because I want to collect its final statistics , but there's no good
>>> way to tell when it's no longer in use except via its collection, so
>>> when it's created I attach a weak reference to it and associate the
>>> weak reference with a queue. When the reference is delivered, I
>>> record the statistics. [1]
>>>
>>> So far so good. The question is, what's the best way to keep the
>>> references in memory until they can be delivered to the queue?
>>> Obviously, I could put them into a synchronized HashSet, but I'd
>>> prefer to optimize for concurrency by minimizing locking. At the
>>> moment, I'm using a ConcurrentHashMap, since it seems to give the best
>>> concurrency for updates. (The usage pattern is a bit unusual, since
>>> there are effectively no lookups. There's a put() when the reference
>>> is created, and a remove() when it's pulled out of the reference
>>> queue.) This is an area where I have very little experience, so I'd
>>> welcome input from anyone who's worked with these classes.
>>>
>>> 1. Obviously, a member of RTS points to a separate statistics object
>>> which is also pointed to by the weak reference.

>>
>> The RTS object won't be collected, and so won't end up on the reference
>> queue, if it's held in a hashmap.

>
> Isn’t it clear that it's the *references* that are in the Map?


Not from his post, it wasn't. He only mentioned RTSs and WeakReferences
in particular. If he just had RTSs and WeakReference<RTS>s, he had a
problem either way: if the RTSs are in the map the WeakReferences never
get enqueued, and if they're not, by the time one's enqueued the
associated RTS object is gone irretrievably. Using the proposed RTSHandle
allows to detect when the RTS's *other* references are gone before losing
the RTS.

> The question is what sort of Map to use to optimize concurrency.


ConcurrentHashMap seems like a no-brainer there to me, if there will be
many calls to the RTS factory from many threads. If not, a normal
(synchronized) HashMap will suffice. (The RTS factory does all the
putting in the scheme I outlined; whereas the lookups and removals are
all done on the single thread that consumes from the ReferenceQueue. The
critical section in the latter, with a plain synchronized map, would be
very brief, something like synchronized (map) { x = map.get(wr);
map.remove(wr); }. So it's the concurrency of access to the factory
that's the crux here.)

Steven Simpson 11-28-2010 11:34 AM

Re: Data structure to keep things in memory
 
On 28/11/10 09:07, ClassCastException wrote:
> You need to do something a little bit more complicated: have an RTS
> object with the actual statistics, plus an RTSHandle object for each RTS
> object that holds the RTS object with an ordinary (strong) reference,
> plus a Map<WeakReference<RTSHandle>,RTS> from WeakReferences to the RTS
> objects. When a new RTS is created, the factory method creates an
> RTSHandle as well, and a WeakReference on that handle, and puts a mapping
> from the WeakReference to the RTS into the map as well as returning the
> handle. The rest of your program uses the RTSHandle to access the RTS and
> add statistics, eventually losing all strong references to the RTSHandle.
>
> When this happens, the RTSHandle becomes weakly reachable as the only
> remaining reference to it is the WeakReference in the map. The
> WeakReference is enqueued. The queue polling thread finds it and uses it
> as a key in the map. Since WeakReference doesn't override Object's
> identity-based equals and hashCode this works even with the referent GCd.
> It accesses the RTS via the map, does its thing, and then removes the map
> entry for that RTS, allowing the RTS and the WeakReference to be GCd.


I've done something similar, except that I extended WeakReference so
that it contained a strong reference to the base object (RTS in this
case), rather than having the Map<Ref,RTS>. The WeakReference class
also implemented Runnable, so all the polling thread has to do is cast
the returned reference to Runnable and invoke it. In the OP's case, it
would look something like this:

// I'd make RTS an interface...
public interface RTS { ... }

// ...then make a (private) class implement it.
class RTSImpl implements RTS { ... }

// RTSHandle is just a proxy onto another RTS.
class RTSHandle implements RTS {
final RTS base;
RTSHandle(RTS base) { this.base = base; }
// all methods just call onto base
}

class RTSUser extends WeakReference<RTSHandle> implements Runnable {
final RTSImpl impl;

RTSUser(RTSHandle handle,
ReferenceQueue<? super RTSHandle> queue,
RTSImpl impl) {
super(handle, queue);
this.impl = impl;
}

void run() {
finishWith(impl);
}
}

ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

// The factory method
public RTS getRTS() {
RTSImpl impl = new RTSImpl();
RTSHandle handle = new RTSHandle(impl);
RTSUser user = new RTSUser(handle, queue, impl);
return handle;
}

void finishWith(RTSImpl impl) {
...
}

// Call this in a loop.
void expireOneRef() throws InterruptedException {
Runnable action = (Runnable) queue.remove();
action.run(); // Calls finishWith, if (action instanceof RTSUser).
}



Lew 11-28-2010 01:55 PM

Re: Data structure to keep things in memory
 
ClassCastException wrote:
> ConcurrentHashMap seems like a no-brainer there to me, if there will be
> many calls to the RTS factory from many threads. If not, a normal
> (synchronized) HashMap will suffice. (The RTS factory does all the
> putting in the scheme I outlined; whereas the lookups and removals are
> all done on the single thread that consumes from the ReferenceQueue. The
> critical section in the latter, with a plain synchronized map, would be
> very brief, something like synchronized (map) { x = map.get(wr);
> map.remove(wr); }.


Since 'remove(wr)' returns the mapped value, you don't need the 'get()' call.

> So it's the concurrency of access to the factory that's the crux here.)


--
Lew

Mike Schilling 11-28-2010 06:12 PM

Re: Data structure to keep things in memory
 


"ClassCastException" <zjkg3d9gj56@gmail.invalid> wrote in message
news:icta6n$av9$2@news.eternal-september.org...
> On Sun, 28 Nov 2010 01:56:40 -0800, Mike Schilling wrote:
>
>> "ClassCastException" <zjkg3d9gj56@gmail.invalid> wrote in message
>> news:ict651$av9$1@news.eternal-september.org...
>>> On Sun, 28 Nov 2010 00:09:40 -0800, Mike Schilling wrote:
>>>
>>>> Without going into all the details, I have the following situation:
>>>>
>>>> There are a set of objects RTS, all of the same type, that gather
>>>> run-time statistics about the running of a program. Various related
>>>> objects with different lifecycles pass them around and use them. I'm
>>>> interested in when each member of RTS is no longer being updated,
>>>> because I want to collect its final statistics , but there's no good
>>>> way to tell when it's no longer in use except via its collection, so
>>>> when it's created I attach a weak reference to it and associate the
>>>> weak reference with a queue. When the reference is delivered, I
>>>> record the statistics. [1]
>>>>
>>>> So far so good. The question is, what's the best way to keep the
>>>> references in memory until they can be delivered to the queue?
>>>> Obviously, I could put them into a synchronized HashSet, but I'd
>>>> prefer to optimize for concurrency by minimizing locking. At the
>>>> moment, I'm using a ConcurrentHashMap, since it seems to give the best
>>>> concurrency for updates. (The usage pattern is a bit unusual, since
>>>> there are effectively no lookups. There's a put() when the reference
>>>> is created, and a remove() when it's pulled out of the reference
>>>> queue.) This is an area where I have very little experience, so I'd
>>>> welcome input from anyone who's worked with these classes.
>>>>
>>>> 1. Obviously, a member of RTS points to a separate statistics object
>>>> which is also pointed to by the weak reference.
>>>
>>> The RTS object won't be collected, and so won't end up on the reference
>>> queue, if it's held in a hashmap.

>>
>> Isn’t it clear that it's the *references* that are in the Map?

>
> Not from his post, it wasn't.


>>>> The question is, what's the best way to keep the
>>>> ***references*** in memory until they can be delivered to the queue?
>>>> Obviously, I could put them into a synchronized HashSet




Mike Schilling 11-28-2010 06:18 PM

Re: Data structure to keep things in memory
 


"Steven Simpson" <ss@domain.invalid> wrote in message
news:8br9s7-er7.ln1@news.simpsonst.f2s.com...
>
> I've done something similar, except that I extended WeakReference so
> that it contained a strong reference to the base object (RTS in this
> case), rather than having the Map<Ref,RTS>. The WeakReference class
> also implemented Runnable, so all the polling thread has to do is cast
> the returned reference to Runnable and invoke it. In the OP's case, it
> would look something like this:
>
> // I'd make RTS an interface...
> public interface RTS { ... }
>
> // ...then make a (private) class implement it.
> class RTSImpl implements RTS { ... }
>
> // RTSHandle is just a proxy onto another RTS.
> class RTSHandle implements RTS {
> final RTS base;
> RTSHandle(RTS base) { this.base = base; }
> // all methods just call onto base
> }
>
> class RTSUser extends WeakReference<RTSHandle> implements Runnable {
> final RTSImpl impl;
>
> RTSUser(RTSHandle handle,
> ReferenceQueue<? super RTSHandle> queue,
> RTSImpl impl) {
> super(handle, queue);
> this.impl = impl;
> }
>
> void run() {
> finishWith(impl);
> }
> }
>
> ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
>
> // The factory method
> public RTS getRTS() {
> RTSImpl impl = new RTSImpl();
> RTSHandle handle = new RTSHandle(impl);
> RTSUser user = new RTSUser(handle, queue, impl);
> return handle;
> }
>
> void finishWith(RTSImpl impl) {
> ...
> }
>
> // Call this in a loop.
> void expireOneRef() throws InterruptedException {
> Runnable action = (Runnable) queue.remove();
> action.run(); // Calls finishWith, if (action instanceof RTSUser).
> }
>


How do you keep the RTSUser's from being GC'd before they are delivered to
the queue?


Steven Simpson 11-28-2010 08:11 PM

Re: Data structure to keep things in memory
 
On 28/11/10 18:18, Mike Schilling wrote:
> "Steven Simpson" <ss@domain.invalid> wrote in message
> news:8br9s7-er7.ln1@news.simpsonst.f2s.com...
>> class RTSUser extends WeakReference<RTSHandle> implements Runnable {
>> final RTSImpl impl;
>>
>> RTSUser(RTSHandle handle,
>> ReferenceQueue<? super RTSHandle> queue,
>> RTSImpl impl) {
>> super(handle, queue);
>> this.impl = impl;
>> }
>> RTSUser user = new RTSUser(handle, queue, impl);

>
> How do you keep the RTSUser's from being GC'd before they are
> delivered to the queue?


Hmm, I was under the impression that the ReferenceQueue kept track of
References, but I see it's the other way around (looking at the source).

So you need a container of RTSUsers. That was served by the
Map<Reference<RTSHandle>, RTSImpl> in ClassCastException's original
suggestion, and a Collection<RTSUser> will suffice here.

If you have to keep all your RTSImpls anyway, you could make each one
point to its RTSUser, so the container of RTSImpls indirectly hangs on
to the RTSUsers. I was originally thinking of a case where getRTS()
took some sort of key, and mapped it to an RTSImpl, which referenced its
(multiple) RTSUsers, so that map did the job implicitly.

Maybe you can get away with a ConcurrentMap<Key, RTSUser>, if you want
to avoid synchronization:

ConcurrentMap<Key, RTSUser> map;

RTS getRTS(Key key) {
// Create an implementation for 'key', even if we don't
// ultimately use it.
RTSImpl impl = createFor(key);

// Create a proxy for it.
RTSHandle proxy = proxify(impl);

// Create a weak reference to the proxy, so we can monitor its
// discard.
RTSUser user = new RTSUser(proxy, queue, impl, key);

for ( ; ; ) {
// Stick it in the map if there isn't one there already.
RTSUser oldUser = map.putIfAbsent(key, user);

// See if we can still get a strong reference to the proxy
// that the map does hold.
RTSHandle oldProxy = oldUser.get();

if (oldProxy == null) {
// No, we'll have to put ours in anyway, if the old
// one is still in there.
if (map.replace(key, oldUser, user))
// It was still there.
return proxy;
// It had gone (perhaps replaced, perhaps removed), so
// try again.
continue;
} else {
// Yes, we'll just use what's in the map, and discard
// our impl.
return oldProxy;
}
}
}

The RTSUser additionally should take the Key as well, and use
map.remove(key, this) to remove itself.

Mike Schilling 11-28-2010 08:43 PM

Re: Data structure to keep things in memory
 


"Steven Simpson" <ss@domain.invalid> wrote in message
news:5kpas7-paa.ln1@news.simpsonst.f2s.com...
> On 28/11/10 18:18, Mike Schilling wrote:
>> "Steven Simpson" <ss@domain.invalid> wrote in message
>> news:8br9s7-er7.ln1@news.simpsonst.f2s.com...
>>> class RTSUser extends WeakReference<RTSHandle> implements Runnable {
>>> final RTSImpl impl;
>>>
>>> RTSUser(RTSHandle handle,
>>> ReferenceQueue<? super RTSHandle> queue,
>>> RTSImpl impl) {
>>> super(handle, queue);
>>> this.impl = impl;
>>> }
>>> RTSUser user = new RTSUser(handle, queue, impl);

>>
>> How do you keep the RTSUser's from being GC'd before they are
>> delivered to the queue?

>
> Hmm, I was under the impression that the ReferenceQueue kept track of
> References, but I see it's the other way around (looking at the source).
>
> So you need a container of RTSUsers. That was served by the
> Map<Reference<RTSHandle>, RTSImpl> in ClassCastException's original
> suggestion, and a Collection<RTSUser> will suffice here.
>
> If you have to keep all your RTSImpls anyway, you could make each one
> point to its RTSUser, so the container of RTSImpls indirectly hangs on
> to the RTSUsers. I was originally thinking of a case where getRTS()
> took some sort of key, and mapped it to an RTSImpl, which referenced its
> (multiple) RTSUsers, so that map did the job implicitly.
>
> Maybe you can get away with a ConcurrentMap<Key, RTSUser>, if you want
> to avoid synchronization:


ConcurrentMap is what I'm using. The consensus seems to be that it's the
best choice.



All times are GMT. The time now is 09:54 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.