Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > efficient use of hashtable with string like keys

Reply
Thread Tools

efficient use of hashtable with string like keys

 
 
Harald Kirsch
Guest
Posts: n/a
 
      08-15-2003
My applications does a lot of text crunching, filtering
information out of the text. For reasons of efficiency
all the text handling is done in StringBuffer objects.

When the application finds something interesting, it wants
to store it in a Hashtable with the text to be used as a
key still in the StringBuffer.

In order to test whether the key is already known to the Hashtable
I currently create a String object from the StringBuffer. But
most of the time the key happens to be already known to the
Hashtable and the String created is immediately thrown away again.

How can I use a reusable object, e.g. a StringBuffer, as a key to
the Hashtable such that hashing is performed according .equals()
instead of according to object identity.

I know that it will be a disaster if I change a stored object later,
but maybe someone knows a clean way to get around the superflous
creation of a String object.

Thanks,
Harald.
 
Reply With Quote
 
 
 
 
VisionSet
Guest
Posts: n/a
 
      08-15-2003
"Harald Kirsch" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) om...
>
> In order to test whether the key is already known to the Hashtable
> I currently create a String object from the StringBuffer. But
> most of the time the key happens to be already known to the
> Hashtable and the String created is immediately thrown away again.


Well if most occasions the Hashtable already holds it then don't worry, you
never made a new String to check it with. Strings are interned by default,
that is only one copy of an identical string is made, even when you do new
String("xyz"), if xyz already exists in the JVM, your reference just points
to that.

--
Mike W


 
Reply With Quote
 
 
 
 
Marco Schmidt
Guest
Posts: n/a
 
      08-15-2003
Harald Kirsch:

[...]

>I know that it will be a disaster if I change a stored object later,
>but maybe someone knows a clean way to get around the superflous
>creation of a String object.


I don't know about a perfect solution, but if you do create the String
and replace it with what intern() returns on that new String, you
avoid having a String with the same content twice:

String newString = ...;
newString = newString.intern();

But I'd do some measurements if you really have a bottleneck. There
may be no need for optimization.

Regards,
Marco
--
Please reply in the newsgroup, not by email!
Java programming tips: http://jiu.sourceforge.net/javatips.html
Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
 
Reply With Quote
 
John C. Bollinger
Guest
Posts: n/a
 
      08-15-2003
VisionSet wrote:
> "Harald Kirsch" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed) om...
>
>>In order to test whether the key is already known to the Hashtable
>>I currently create a String object from the StringBuffer. But
>>most of the time the key happens to be already known to the
>>Hashtable and the String created is immediately thrown away again.

>
>
> Well if most occasions the Hashtable already holds it then don't worry, you
> never made a new String to check it with. Strings are interned by default,
> that is only one copy of an identical string is made, even when you do new
> String("xyz"), if xyz already exists in the JVM, your reference just points
> to that.


I'm sorry, but that's correct only for compile-time computed Strings.
Strings created at runtime may indeed be equal to but distinct from
other Strings in the same VM. It is always possible to get a reference
to a globally shared String representing the same sequence of characters
(via the intern() method) but you still end up creating and then
discarding a distinct String object. See the JLS discussion at

http://java.sun.com/docs/books/jls/s...doc.html#19369


John Bollinger
http://www.velocityreviews.com/forums/(E-Mail Removed)

 
Reply With Quote
 
John C. Bollinger
Guest
Posts: n/a
 
      08-15-2003
Harald Kirsch wrote:
> My applications does a lot of text crunching, filtering
> information out of the text. For reasons of efficiency
> all the text handling is done in StringBuffer objects.
>
> When the application finds something interesting, it wants
> to store it in a Hashtable with the text to be used as a
> key still in the StringBuffer.
>
> In order to test whether the key is already known to the Hashtable
> I currently create a String object from the StringBuffer. But
> most of the time the key happens to be already known to the
> Hashtable and the String created is immediately thrown away again.
>
> How can I use a reusable object, e.g. a StringBuffer, as a key to
> the Hashtable such that hashing is performed according .equals()
> instead of according to object identity.


You have a misconception. Hashtables use Objects' (keys') hashCode
method to perform hashing and their equals() methods to compare keys
that are in the same hash bucket. The Map interface explains the
applicable contract in some detail.

It is possible that you are confused because StringBuffers inherit their
equals() and hashCode() methods from Object, which implements equals()
in terms of object identity.

> I know that it will be a disaster if I change a stored object later,
> but maybe someone knows a clean way to get around the superflous
> creation of a String object.


Changing an object that is in use as a Hashtable key in a way that
affects its hashCode or equals() method results in unspecified behavior
of the Hashtable. You don't want that.

There is no solution that provides the behavior you want within the
constraints you have specified (source key data from a StringBuffer, key
object to implement equals differently than StringBuffer.equals() does,
and new object creation impermissable). The only way to reduce the
number of transient object instantiations is to rework your algorithm so
that you get fewer duplicate hits in the first place.


John Bollinger
(E-Mail Removed)

 
Reply With Quote
 
Adam Maass
Guest
Posts: n/a
 
      08-15-2003

"Harald Kirsch" <(E-Mail Removed)> wrote:
> My applications does a lot of text crunching, filtering
> information out of the text. For reasons of efficiency
> all the text handling is done in StringBuffer objects.
>
> When the application finds something interesting, it wants
> to store it in a Hashtable with the text to be used as a
> key still in the StringBuffer.
>
> In order to test whether the key is already known to the Hashtable
> I currently create a String object from the StringBuffer. But
> most of the time the key happens to be already known to the
> Hashtable and the String created is immediately thrown away again.
>
> How can I use a reusable object, e.g. a StringBuffer, as a key to
> the Hashtable such that hashing is performed according .equals()
> instead of according to object identity.
>
> I know that it will be a disaster if I change a stored object later,
> but maybe someone knows a clean way to get around the superflous
> creation of a String object.
>


First off, creating a String object from a StringBuffer is relatively
inexpensive... the String and the StringBuffer initially share internal
storage. Only when the StringBuffer is susequently modified does the
StringBuffer class make a copy of the internal storage.


Alternatively, you need to write your own class that has the mutability of
StringBuffer but the equals and hashCode semantics of String. Not hard,
really.


-- Adam Maass


 
Reply With Quote
 
Chris Uppal
Guest
Posts: n/a
 
      08-16-2003
Adam Maass wrote:

> Alternatively, you need to write your own class that has the
> mutability of StringBuffer but the equals and hashCode semantics of
> String. Not hard, really.


Or switch the HashTable to a java.util.TreeMap with a custom Comparator.

-- chris


 
Reply With Quote
 
Harald Kirsch
Guest
Posts: n/a
 
      08-16-2003
"John C. Bollinger" <(E-Mail Removed)> wrote:
> Harald Kirsch wrote:
> > My applications does a lot of text crunching, filtering
> > information out of the text. For reasons of efficiency
> > all the text handling is done in StringBuffer objects.
> >
> > When the application finds something interesting, it wants
> > to store it in a Hashtable with the text to be used as a
> > key still in the StringBuffer.
> >
> > In order to test whether the key is already known to the Hashtable
> > I currently create a String object from the StringBuffer. But
> > most of the time the key happens to be already known to the
> > Hashtable and the String created is immediately thrown away again.
> >
> > How can I use a reusable object, e.g. a StringBuffer, as a key to
> > the Hashtable such that hashing is performed according .equals()
> > instead of according to object identity.

>
> You have a misconception. Hashtables use Objects' (keys') hashCode
> method to perform hashing and their equals() methods to compare keys
> that are in the same hash bucket. The Map interface explains the
> applicable contract in some detail.
>
> It is possible that you are confused because StringBuffers inherit their
> equals() and hashCode() methods from Object, which implements equals()
> in terms of object identity.


No, I was not confused, but my question was probably not clear
clear enough about this point. That it cannot work with the available
implementation of .hashcode() and .equals() is obvious. I hoped someone
came up with an ingenious idea of how to borrow String.hashcode() and and
String.equals() and apply it to StringBuffer without rewriting everything.

> There is no solution that provides the behavior you want within the
> constraints you have specified (source key data from a StringBuffer, key
> object to implement equals differently than StringBuffer.equals() does,
> and new object creation impermissable). The only way to reduce the
> number of transient object instantiations is to rework your algorithm so
> that you get fewer duplicate hits in the first place.


Unfortunately the keys stem directly from real world data. The task is
simple: store a key only once, even if seen many times in the
input. I don't see how this could be reworked to not produce the
to-be-checked keys from the data.

Thanks anyway,
Harald.
 
Reply With Quote
 
Harald Kirsch
Guest
Posts: n/a
 
      08-16-2003
"Adam Maass" <(E-Mail Removed)> wrote:
> "Harald Kirsch" <(E-Mail Removed)> wrote:

[snip
> > How can I use a reusable object, e.g. a StringBuffer, as a key to
> > the Hashtable such that hashing is performed according .equals()
> > instead of according to object identity.
> >
> > I know that it will be a disaster if I change a stored object later,
> > but maybe someone knows a clean way to get around the superflous
> > creation of a String object.
> >

>
> First off, creating a String object from a StringBuffer is relatively
> inexpensive... the String and the StringBuffer initially share internal
> storage. Only when the StringBuffer is susequently modified does the
> StringBuffer class make a copy of the internal storage.


The whole point of using a StringBuffer is to change it all the time.
In fact in my application it is totally counterproductive that
StringBuffer.toString() tries create a String object which shares the
char[] because an immediate, inevitable, subsequent change of the StringBuffer
then reallocates the StringBuffers char[]. This is bad for two reasons:

1) After the StringBuffer, during several .append, grew large enough to
hold all the typical pieces of text I deal with, the reallocation most
probably makes it too short, triggering even more reallocations in subsequent
..appends.

2) The String object created keeps the char[] which is most probably far
to large.

I guess this unhelpful implementation only serves the compiler well to
implement things like "a"+"b"+345+"some other text" fairly efficient.

Harald.
 
Reply With Quote
 
Adam Maass
Guest
Posts: n/a
 
      08-16-2003

"Harald Kirsch" <(E-Mail Removed)> wrote:
> "Adam Maass" <(E-Mail Removed)> wrote:
> > "Harald Kirsch" <(E-Mail Removed)> wrote:

> [snip
> > > How can I use a reusable object, e.g. a StringBuffer, as a key to
> > > the Hashtable such that hashing is performed according .equals()
> > > instead of according to object identity.
> > >
> > > I know that it will be a disaster if I change a stored object later,
> > > but maybe someone knows a clean way to get around the superflous
> > > creation of a String object.
> > >

> >
> > First off, creating a String object from a StringBuffer is relatively
> > inexpensive... the String and the StringBuffer initially share internal
> > storage. Only when the StringBuffer is susequently modified does the
> > StringBuffer class make a copy of the internal storage.

>
> The whole point of using a StringBuffer is to change it all the time.
> In fact in my application it is totally counterproductive that
> StringBuffer.toString() tries create a String object which shares the
> char[] because an immediate, inevitable, subsequent change of the

StringBuffer
> then reallocates the StringBuffers char[]. This is bad for two reasons:
>
> 1) After the StringBuffer, during several .append, grew large enough to
> hold all the typical pieces of text I deal with, the reallocation most
> probably makes it too short, triggering even more reallocations in

subsequent
> .appends.
>
> 2) The String object created keeps the char[] which is most probably far
> to large.
>
> I guess this unhelpful implementation only serves the compiler well to
> implement things like "a"+"b"+345+"some other text" fairly efficient.
>


In which case, you need a custom class with the mutability of StringBuffer
but the .equals and .hashCode semantics of String. This is not hard to do
since the source of these standard classes is available.

-- Adam Maass


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: I need something like a "Hashtable<String[], Integer>" kind ofdata type ... Tom Anderson Java 1 11-28-2010 10:39 PM
Re: hashtable or map? (map inserts not behaving as I expect - and I cant find a decent simple example for hashtable) Kai-Uwe Bux C++ 1 12-21-2008 09:25 PM
Hashtable filled with own key just has the same keys in enumeration. feju2000@gmx.de Java 8 06-19-2006 07:08 PM
how to sort the keys from Hashtable into Alphabet order when use for output display u126561@yahoo.com.au Java 8 05-08-2006 08:29 AM
Hashtable/Two keys Alexander Mueller Java 29 03-22-2005 10:57 AM



Advertisments