Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > String containing algorithm

Reply
Thread Tools

String containing algorithm

 
 
JS
Guest
Posts: n/a
 
      11-25-2005
Hi all,
Can anyone help me with a small problem I'm having. I need to write a method
which takes two Strings of any lengths and returns any letters which are
common between the two. The Strings will be something like ABECA and they
could be of any length.

For example:
ABECA and ECDRE would return EC, I dont mind about the order which they come
in because I am processing them further anyway.

I just cant get any algorithms to work on it, not even brute force.
Any help is appreciated, thanks in advance.
JS


 
Reply With Quote
 
 
 
 
zero
Guest
Posts: n/a
 
      11-25-2005
"JS" <(E-Mail Removed)> wrote in
news:TSIhf.4040$(E-Mail Removed):

> Hi all,
> Can anyone help me with a small problem I'm having. I need to write a
> method which takes two Strings of any lengths and returns any letters
> which are common between the two. The Strings will be something like
> ABECA and they could be of any length.
>
> For example:
> ABECA and ECDRE would return EC, I dont mind about the order which
> they come in because I am processing them further anyway.
>
> I just cant get any algorithms to work on it, not even brute force.
> Any help is appreciated, thanks in advance.
> JS
>
>
>


This sounds suspiciously like a homework assignment.

Here's a possible algorithm (no code, I'm sure you can handle that
yourself)

prerequisites: Strings A and B of arbitrary length; empty string (or
stringbuffer) C.

1. take the first character in string A, and compare it to every
character in B
1a. alternative to 1: take the first character in string A, turn it into
a CharSequence, and use String method contains()
2. if you have a match, add it to a C
3. take the next character in A, and we're back at 1.

If you want a case-insensitive algorithm just add toLowerCase or
toUpperCase.

If you want to weed out duplicates, check if the current character is
already contained in C.

--
Beware the False Authority Syndrome
 
Reply With Quote
 
 
 
 
Roedy Green
Guest
Posts: n/a
 
      11-25-2005
On Fri, 25 Nov 2005 18:38:43 GMT, "JS" <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>For example:
>ABECA and ECDRE would return EC, I dont mind about the order which they come
>in because I am processing them further anyway.


I would do it with two java.util.BitSets
each 64K bits long (possibly shorter if you can guarantee a narrower
char range.). Go through String a turning on bits corresponding to
chars in bita. (index bit by char number). Then repeat with b and
bitb.

Then compute the logical AND of bita and bitb.
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      11-25-2005
On Fri, 25 Nov 2005 18:38:43 GMT, "JS" <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>For example:
>ABECA and ECDRE would return EC, I dont mind about the order which they come
>in because I am processing them further anyway.


another approach is to create a HashSet of the chars in string a. Then
create another HashSet for String b. Then enumerate HashSet b and
remove els that don't exist in set a.

Another approach to is use a nested loop

pseudocode:

for each char in string a
for each char in string b

if achar == bchar add to HashSet if not already there.
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
 
Reply With Quote
 
Alan Krueger
Guest
Posts: n/a
 
      11-25-2005
JS wrote:
> Hi all,
> Can anyone help me with a small problem I'm having. I need to write a method
> which takes two Strings of any lengths and returns any letters which are
> common between the two. The Strings will be something like ABECA and they
> could be of any length.
>
> For example:
> ABECA and ECDRE would return EC, I dont mind about the order which they come
> in because I am processing them further anyway.


Please clarify. Your example fits your text, but it also matches a
"longest common substring" search. The answers you've received so far
appear to address both of these.

For instance, what happens if you use "ABCDEF" and "FEDCBA"? If it's
all the letters in common in both strings, you'll get "ABCDEF" in some
permutation. If it's a longest common substring, you'll get one of the
letters.

If you're truly just searching for all letters in common between the two
strings, you might want to use a HashSet<Character> for each string, add
each character in each string to its respective HashSet, and obtain
the set of common characters by intersecting the sets.
 
Reply With Quote
 
Alan Krueger
Guest
Posts: n/a
 
      11-25-2005
Roedy Green wrote:
> On Fri, 25 Nov 2005 18:38:43 GMT, "JS" <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>>For example:
>>ABECA and ECDRE would return EC, I dont mind about the order which they come
>>in because I am processing them further anyway.

>
> another approach is to create a HashSet of the chars in string a. Then
> create another HashSet for String b. Then enumerate HashSet b and
> remove els that don't exist in set a.


Set.retainAll already does that last bit.

> Another approach to is use a nested loop
>
> pseudocode:
>
> for each char in string a
> for each char in string b
> if achar == bchar add to HashSet if not already there.


That's unnecessarily O(n^2) - not horrible for a quick-and-dirty, but to
be avoided in production and homework assignments. Sorting them and
then linearly merging them would be O(n log n), and the HashSet approach
is O(n) when the hash table is large enough and the hash function is
well-tuned to the data.
 
Reply With Quote
 
Alan Krueger
Guest
Posts: n/a
 
      11-25-2005
Roedy Green wrote:
> On Fri, 25 Nov 2005 18:38:43 GMT, "JS" <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>>For example:
>>ABECA and ECDRE would return EC, I dont mind about the order which they come
>>in because I am processing them further anyway.

>
> I would do it with two java.util.BitSets
> each 64K bits long (possibly shorter if you can guarantee a narrower
> char range.). Go through String a turning on bits corresponding to
> chars in bita. (index bit by char number). Then repeat with b and
> bitb.


A sparse set implementation would be better; if you have string lengths
far below 64K, it's not going to use very much of that space.

Plus, while Java represents characters internally in UTF-16, there are
far more than 2^16 code points possible.
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      11-25-2005
On Fri, 25 Nov 2005 16:16:12 -0600, Alan Krueger
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>Plus, while Java represents characters internally in UTF-16, there are
>far more than 2^16 code points possible.


char is 16 bits so there can't possibly be more than 2^16 = 64K
combinations. With a java.util.BitSet you can track presence with 8K
bytes worth of bits (stored as longs in a BitSet).

most of the time you have an upper bound on your unicode chars
considerably lower than 64K.

The advantage of BitSet is the speed of direct addressing. Any sparce
scheme will have a lot of lookup overhead.
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Java custom programming, consulting and coaching.
 
Reply With Quote
 
Alan Krueger
Guest
Posts: n/a
 
      11-26-2005
Roedy Green wrote:
> On Fri, 25 Nov 2005 16:16:12 -0600, Alan Krueger
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> who said :
>
>>Plus, while Java represents characters internally in UTF-16, there are
>>far more than 2^16 code points possible.

>
> char is 16 bits so there can't possibly be more than 2^16 = 64K
> combinations.


There cannot be more than 2^16 char values, but there are far more than
2^16 Unicode code points, depending on your version of Unicode.

http://java.sun.com/developer/techni...Supplementary/

"Supplementary characters are characters in the Unicode
standard whose code points are above U+FFFF, and which
therefore cannot be described as single 16-bit entities
such as the char data type in the Java programming
language. [...]

"These are now interpreted as UTF-16 sequences, and the
implementations of these APIs is changed to correctly
handle supplementary characters. The enhancements are
part of version 5.0 of the Java 2 Platform, Standard
Edition (J2SE) [...]

"The Unicode standard therefore has been extended to allow
up to 1,112,064 characters."

Because it's UTF-16, these characters are serialized into 16-bit
character sequences, which means that not every character represents a
single Unicode code point, sometimes multiple characters are used.

> most of the time you have an upper bound on your unicode chars
> considerably lower than 64K.
>
> The advantage of BitSet is the speed of direct addressing. Any sparce
> scheme will have a lot of lookup overhead.


No, a properly-sized HashSet will have CONSTANT TIME lookup, just like a
lookup table.
 
Reply With Quote
 
JS
Guest
Posts: n/a
 
      11-26-2005
Thanks everyone for your help. The easiest one sounds like zeros idea which
should work a treat. I'm not worried about the time of the algorithm because
it isnt for submission or publication anywhere, and for those of you still
left wondering, sorry about my explaination, i meant any letters in any
order regardless of where they are.
Thanks again
JS
"Alan Krueger" <(E-Mail Removed)> wrote in message
news:nLLhf.764$(E-Mail Removed)...
> JS wrote:
> > Hi all,
> > Can anyone help me with a small problem I'm having. I need to write a

method
> > which takes two Strings of any lengths and returns any letters which are
> > common between the two. The Strings will be something like ABECA and

they
> > could be of any length.
> >
> > For example:
> > ABECA and ECDRE would return EC, I dont mind about the order which they

come
> > in because I am processing them further anyway.

>
> Please clarify. Your example fits your text, but it also matches a
> "longest common substring" search. The answers you've received so far
> appear to address both of these.
>
> For instance, what happens if you use "ABCDEF" and "FEDCBA"? If it's
> all the letters in common in both strings, you'll get "ABCDEF" in some
> permutation. If it's a longest common substring, you'll get one of the
> letters.
>
> If you're truly just searching for all letters in common between the two
> strings, you might want to use a HashSet<Character> for each string, add
> each character in each string to its respective HashSet, and obtain
> the set of common characters by intersecting the sets.



 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Search String and Delete Line Containing String Bob Hatch Ruby 3 02-01-2011 08:59 PM
Filtered Back Projection Algorithm (FBP Algorithm) Bapaiah Katepalli VHDL 1 06-23-2006 04:50 PM
Big O and algorithm to decide string A contains string B? usgog@yahoo.com C++ 15 04-10-2005 03:03 AM
Replacing palindrome substrings of an input string with a given string. Any effecient algorithm? Tung Chau C Programming 0 08-06-2004 10:18 AM
Key generation algorithm and Cipher algorithm Ahmed Moustafa Java 0 11-15-2003 06:35 AM



Advertisments