Mark M wrote:
> We have large in-memory objects that cannot be recreated from the
> original source. They can, however, be persisted to disk and
> reconstructed, but it is very expensive to do so.
>
> We need to have the JVM collect these objects when memory runs low,
> but we don't want to pay the cost of putting them to disk unless it is
> necessary.
You are right that Java's weak/etc references are not of direct help in solving
this problem.
I think you need to approach it from a different direction. Here's how I see
it:
I'm assuming that the objects you talk about really are (more or less) single
objects from the POV of the users of those objects -- rather than being
presented as complex webs of objects. If that's not true then I'm pretty sure
you'll have to redesign so that it *is* true.
So, you have a class of objects called BigObject which contains within it some
large amount of data which can be persisted to file. So, for simplicity, we
have a BigObject containing fields m_data and m_filename -- m_filename is set
to the name of the file where the data has been persisted (if it has been) and
m_data is either set to a large data object or to null.
You can tell a BigObject to purge() itself which will write the data to file
(if it hasn't been already) and set the m_data to null. You can also tell it
to restore() itself, which will read the data from file and assign it to m_data
(if it isn't restored already). The question is, when to do these things.
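To make that concrete, here is a minimal sketch of such a BigObject. It is only an illustration under assumptions of mine: a byte[] stands in for the real data, the data is treated as immutable once created (so it only ever needs writing once), the temp-file naming is arbitrary, and isResident() and data() are hypothetical helpers that are not part of the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;

// Sketch only: byte[] stands in for the real "large data object".
class BigObject {
    private byte[] m_data;      // null while purged
    private Path m_filename;    // null until the data has been persisted

    BigObject(byte[] data) {
        m_data = Objects.requireNonNull(data);
    }

    // Write the data to file (if it hasn't been already) and drop the
    // in-memory copy. Assumes the data never changes after construction.
    synchronized void purge() {
        if (m_data == null) {
            return; // already purged
        }
        try {
            if (m_filename == null) {
                Path file = Files.createTempFile("bigobject", ".dat");
                Files.write(file, m_data);
                m_filename = file;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        m_data = null;
    }

    // Read the data back from file if it is not already memory resident.
    synchronized void restore() {
        if (m_data != null) {
            return; // already restored
        }
        try {
            m_data = Files.readAllBytes(m_filename);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper for monitoring.
    synchronized boolean isResident() {
        return m_data != null;
    }

    // Hypothetical accessor that restores on demand.
    synchronized byte[] data() {
        restore();
        return m_data;
    }
}
```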
[BTW, I'm sorry if I'm going into more detail than you want or need, but the
replies I've seen so far seem to me to miss the point, so I thought it might be
better to take this in small steps.]
Soft references won't give you the features that you need -- that isn't what
they were designed to do. So you need to use a different technique.
That technique is necessarily going to be a heuristic. Ideally it'll be one
that you can easily monitor and control (and tweak).
You will need to keep track of all the referenced BigObjects. Using a WeakSet
(or similar, e.g. a set backed by a WeakHashMap) as a static member of the
class will do that. BigObjects are added to that set as part of their
constructor, and are removed from it by the system once nothing else refers to
them.
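One way to get such a weak set out of the standard library is sketched below. The class name TrackedBigObject, the field name LIVE and the liveCount() helper are mine, purely for illustration. It relies on the class not overriding equals()/hashCode(), so that membership is by identity.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Sketch only: the tracking part of a BigObject, nothing else.
class TrackedBigObject {
    // A weak set: an entry disappears some time after the garbage collector
    // finds that nothing else references the object.
    private static final Set<TrackedBigObject> LIVE =
        Collections.synchronizedSet(
            Collections.newSetFromMap(
                new WeakHashMap<TrackedBigObject, Boolean>()));

    TrackedBigObject() {
        // Registering "this" in the constructor is fine for a sketch; a
        // factory method would avoid publishing a half-built object.
        LIVE.add(this);
    }

    // Hypothetical monitoring helper: an upper bound on the number of
    // BigObjects still referenced (dead entries are removed lazily).
    static int liveCount() {
        return LIVE.size();
    }
}
```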
When a new BigObject is created, the class checks the list of already existing
BigObjects and if the number exceeds a certain threshold (or you could use
their total size for a better estimate) it will tell some of them to purge()
themselves. You will have to set the threshold by experimentation or analysis
or luck.
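A rough sketch of that check, using total size rather than a simple count, might look like this. MAX_RESIDENT_BYTES is a made-up number, the class name is hypothetical, and purge() here only drops the data (the write to file is left out to keep the example short).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;

// Sketch only: purge pre-existing objects when a new one would exceed a limit.
class BudgetedBigObject {
    // Hypothetical limit on resident bytes; set by experiment or analysis.
    static final long MAX_RESIDENT_BYTES = 1_000;

    private static final Set<BudgetedBigObject> LIVE =
        Collections.newSetFromMap(new WeakHashMap<BudgetedBigObject, Boolean>());

    private byte[] m_data; // null while purged

    BudgetedBigObject(byte[] data) {
        synchronized (LIVE) {
            makeRoomFor(data.length);
            m_data = data;
            LIVE.add(this);
        }
    }

    // Purge existing objects (in no particular order) until the newcomer
    // fits under the limit, or there is nothing left to purge.
    private static void makeRoomFor(long incoming) {
        List<BudgetedBigObject> snapshot = new ArrayList<>(LIVE);
        long resident = 0;
        for (BudgetedBigObject b : snapshot) {
            resident += b.residentBytes();
        }
        for (BudgetedBigObject b : snapshot) {
            if (resident + incoming <= MAX_RESIDENT_BYTES) {
                break;
            }
            resident -= b.residentBytes();
            b.purge();
        }
    }

    long residentBytes() {
        synchronized (LIVE) {
            return m_data == null ? 0 : m_data.length;
        }
    }

    boolean isResident() {
        return residentBytes() > 0;
    }

    void purge() {
        synchronized (LIVE) {
            // A real purge() would write m_data to file before dropping it.
            m_data = null;
        }
    }
}
```

With the limit above, creating two 600-byte objects in a row causes the first to be purged when the second is constructed.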
You might also (depending on the structure of your application) be able to trap
out-of-memory events and use them to cause some BigObjects to purge()
themselves. Possibly you could try to detect such events "early" by
temporarily creating a large byte[] array. That could improve the accuracy of
the heuristic, but I don't think that it could be made reliable enough, or
convenient enough, to replace the use of a limit on the space taken by existing
BigObjects (not unless your app has a very specific structure, anyway).
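For what it's worth, the two ideas in that paragraph could be sketched roughly as follows. Both helper names are hypothetical, and neither is reliable: the probe says nothing about what other threads are about to allocate, the JIT may treat the unused allocation however it likes, and catching OutOfMemoryError only helps if the failing allocation happens inside the guarded block.

```java
import java.util.function.Supplier;

// Sketch only: best-effort helpers, not a dependable low-memory detector.
class LowMemoryGuard {
    // "Early" detection: try to allocate a temporary byte[] of the given
    // size and report whether the allocation failed.
    static boolean memoryLooksTight(int probeBytes) {
        try {
            byte[] probe = new byte[probeBytes];
            return probe.length != probeBytes; // always false if we got here
        } catch (OutOfMemoryError e) {
            return true;
        }
    }

    // Trap an out-of-memory event, ask the caller-supplied action to
    // purge() some BigObjects, then retry the work once.
    static <T> T withPurgeOnOom(Supplier<T> work, Runnable purgeSome) {
        try {
            return work.get();
        } catch (OutOfMemoryError e) {
            purgeSome.run();
            return work.get();
        }
    }
}
```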
How to choose which BigObjects to purge() ? If these cases are rare (as I'd
guess is likely) then you could probably get away with just unconditionally
purge()-ing all the pre-existing BigObjects. Or you could choose some to
sacrifice randomly. OTOH, you might maintain a least recently used list (LRU)
of BigObjects, and purge() the ones that have gone "idle". That is, of course,
quite a bit more work since (a) you'd have to make each operation on a
BigObject update the LRU list, and (b) you'd have to implement the LRU list as
a weak collection of some kind. Perhaps it would be easier just to keep a
timestamp in each BigObject and purge() the oldest ones.
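The timestamp variant is small enough to sketch. Assumptions of mine: a logical counter is used instead of wall-clock time, a boolean stands in for "m_data is non-null", and the caller passes in the candidates (in practice, the contents of the weak set). The sort assumes nobody calls touch() concurrently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: choose purge victims by "last used" timestamp.
class StampedBigObject {
    // Logical clock; System.nanoTime() would serve equally well.
    private static final AtomicLong CLOCK = new AtomicLong();

    private volatile long m_lastUsed = CLOCK.incrementAndGet();
    private volatile boolean m_resident = true; // stands in for m_data != null

    // Every operation on the object would call this.
    void touch() {
        m_lastUsed = CLOCK.incrementAndGet();
    }

    boolean isResident() {
        return m_resident;
    }

    void purge() {
        m_resident = false; // a real purge() would write the data to file
    }

    // Purge the 'count' resident objects that were used longest ago.
    static void purgeOldest(Iterable<StampedBigObject> candidates, int count) {
        List<StampedBigObject> resident = new ArrayList<>();
        for (StampedBigObject b : candidates) {
            if (b.isResident()) {
                resident.add(b);
            }
        }
        resident.sort(
            Comparator.comparingLong((StampedBigObject b) -> b.m_lastUsed));
        int n = Math.min(count, resident.size());
        for (StampedBigObject b : resident.subList(0, n)) {
            b.purge();
        }
    }
}
```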
How to get BigObjects to restore() themselves ? Two patterns suggest
themselves. One is for each BigObject to restore() itself automatically before
each use (a no-op if it hasn't been purge()ed). That runs the risk that the
system will "thrash" with BigObjects being purge()ed only to restore()
themselves almost instantly afterwards (that's an inherent risk in any such
system -- it would apply even if you could use soft references -- but at least
you are in control of the run-time parameters and can easily monitor/modify
what's going on.) The other pattern would be to have explicit
lock()/unlock() calls that the client code is required to call before/after
each block of operations. The lock() call would lock the BigObject into memory
(make it un-purge()-able), the unlock() call would make it eligible for
purge()ing. The problem with that is that there's a maintenance headache in
ensuring that lock() and unlock() are always called correctly (you could use
finalisation as a debugging aid to catch BigObjects that died while
still locked). Note that using an explicit lock()/unlock() protocol would
simplify the LRU implementation (if you used one) because it would only be the
unlock() call that updated the LRU list.
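The lock()/unlock() pattern might be sketched like this. The lock count (so that nested lock() calls work) and the boolean return from purge() are my own additions, and a boolean again stands in for the data itself.

```java
// Sketch only: a BigObject that cannot be purged while it is locked.
class LockableBigObject {
    private int m_lockCount;           // > 0 means locked into memory
    private boolean m_resident = true; // stands in for m_data != null

    // Lock the object into memory, restoring it first if necessary.
    synchronized void lock() {
        m_lockCount++;
        m_resident = true; // a real version would call restore() here
    }

    // Make the object eligible for purging again.
    synchronized void unlock() {
        if (m_lockCount == 0) {
            throw new IllegalStateException("unlock() without lock()");
        }
        m_lockCount--;
        // This is the one place an LRU list or timestamp would be updated.
    }

    // Returns false, and does nothing, if the object is currently locked.
    synchronized boolean purge() {
        if (m_lockCount > 0) {
            return false;
        }
        m_resident = false; // a real version would write to file first
        return true;
    }

    synchronized boolean isResident() {
        return m_resident;
    }
}
```

Client code would then wrap each block of operations as `big.lock(); try { ... } finally { big.unlock(); }`, which at least keeps the pairing visible in one place.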
Of course, there are many ways of setting up such a system. I'd be inclined to
use small Handle objects as the public face of the BigObjects, each of which
would have a reference to either a real BigObject (memory resident) or a
PersistedBigObject (basically just the name of the file where the data was
written). I think that would better separate the logic of the BigObject's real
role(s) from the independent logic of controlling their resource usage.
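As a rough illustration of that split (one of many possible layouts), the sketch below uses the name ResidentBigObject for the "real", memory-resident BigObject, and again uses a byte[] for the data. The Handle method names and the choice to delete the file on restore are assumptions of mine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// The "real" object: owns the data and would carry the domain logic.
final class ResidentBigObject {
    final byte[] data;
    ResidentBigObject(byte[] data) { this.data = data; }
}

// The persisted form: basically just the name of the file.
final class PersistedBigObject {
    final Path file;
    PersistedBigObject(Path file) { this.file = file; }
}

// The small public face; exactly one of its two fields is non-null.
final class Handle {
    private ResidentBigObject m_resident;
    private PersistedBigObject m_persisted;

    Handle(byte[] data) {
        m_resident = new ResidentBigObject(data);
    }

    // Swap the resident object for its persisted form.
    synchronized void purge() {
        if (m_resident == null) {
            return;
        }
        try {
            Path file = Files.createTempFile("bigobject", ".dat");
            Files.write(file, m_resident.data);
            m_persisted = new PersistedBigObject(file);
            m_resident = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the resident object, reading it back from file if needed.
    synchronized ResidentBigObject get() {
        if (m_resident == null) {
            try {
                byte[] data = Files.readAllBytes(m_persisted.file);
                Files.delete(m_persisted.file); // simple-minded cleanup
                m_resident = new ResidentBigObject(data);
                m_persisted = null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return m_resident;
    }

    synchronized boolean isResident() {
        return m_resident != null;
    }
}
```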
There'd also be issues with thread-safety, ensuring that persisted files got
cleaned up, tools for monitoring the behaviour of the system, etc. But that's
just code...
-- chris