Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Search byte[] for pattern

Reply
Thread Tools

Search byte[] for pattern

 
 
Ian Pilcher
Guest
Posts: n/a
 
      12-16-2003
I'm looking for an API that will search an array of bytes for a
specified byte pattern -- basically the same thing as
String.indexOf(String).

Does such a method exist?

Thanks!
--
================================================== ======================
Ian Pilcher
================================================== ======================

 
Reply With Quote
 
 
 
 
Chris Smith
Guest
Posts: n/a
 
      12-16-2003
Ian Pilcher wrote:
> I'm looking for an API that will search an array of bytes for a
> specified byte pattern -- basically the same thing as
> String.indexOf(String).
>
> Does such a method exist?


Sure, there's one right here, for example:

> http://www.fmi.uni-sofia.bg/fmi/logi...atch_java.html


--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      12-16-2003
Ian Pilcher wrote:
> I'm looking for an API that will search an array of bytes for a
> specified byte pattern -- basically the same thing as
> String.indexOf(String).
>
> Does such a method exist?


Oops, I meant to convert the code before sending to use byte[] instead
of String. Here's a conversion based on the code from the page I
mentioned earlier (and some API changes to make it more like indexOf):

/**
* Knuth-Morris-Pratt Algorithm for Pattern Matching
*/
public class KMPMatch {
/**
* Finds the first occurrence of the pattern in the text.
*/
public boolean indexOf(byte[] data, byte[] pattern) {
int[] failure = computeFailure(pattern);

int j = 0;
if (data.length == 0) return false;

for (int i = 0; i < data.length; i++) {
while (j > 0 && pattern[j] != data[i]) {
j = failure[j - 1];
}
if (pattern[j] == data[i]) { j++; }
if (j == pattern.length) {
return i - pattern.length + 1;
}
}
return -1;
}

/**
* Computes the failure function using a boot-strapping process,
* where the pattern is matched against itself.
*/
private int[] computeFailure(byte[] pattern) {
int[] failure = new int[pattern.length];

int j = 0;
for (int i = 1; i < pattern.length; i++) {
while (j > 0 && pattern[j] != pattern[i]) {
j = failure[j - 1];
}
if (pattern[j] == pattern[i]) {
j++;
}
failure[i] = j;
}

return failure;
}
}


--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
Ian Pilcher
Guest
Posts: n/a
 
      12-16-2003
Chris Smith wrote:
>
> Sure, there's one right here, for example:
>
>>http://www.fmi.uni-sofia.bg/fmi/logi...atch_java.html

>


I take it from your response that the Java platform API does not provide
such a method. <sigh>

Thanks!

--
================================================== ======================
Ian Pilcher
================================================== ======================

 
Reply With Quote
 
Gregory A. Swarthout
Guest
Posts: n/a
 
      12-16-2003
Chris Smith <> wrote in message news:<>...
> Ian Pilcher wrote:
> > I'm looking for an API that will search an array of bytes for a
> > specified byte pattern -- basically the same thing as
> > String.indexOf(String).


If it's really basically the same thing, why not:

int index = new String(byteArray1).indexOf(new String(byteArray2));

Greg
 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      12-17-2003
> > Ian Pilcher wrote:
> > > I'm looking for an API that will search an array of bytes for a
> > > specified byte pattern -- basically the same thing as
> > > String.indexOf(String).


Gregory A. Swarthout wrote:
> If it's really basically the same thing, why not:
>
> int index = new String(byteArray1).indexOf(new String(byteArray2));


Well, because that code does something very different than what you're
imagining it to do. The code there takes byteArray1, converts it to a
text String using some platform-dependent default character encoding,
then does the same thing with byteArray2, and then determines the index
of one String within the other.

The result is going to vary widely depending on the platform it's run
on, and is certainly not going to reliably be the same as what the OP
clearly wanted -- to find the location of occurrence of one byte-
sequence within a larger byte-sequence.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
Gregory A. Swarthout
Guest
Posts: n/a
 
      12-17-2003
Chris Smith <> wrote in message news:<>...
> > > Ian Pilcher wrote:
> > > > I'm looking for an API that will search an array of bytes for a
> > > > specified byte pattern -- basically the same thing as
> > > > String.indexOf(String).

>
> Gregory A. Swarthout wrote:
> > If it's really basically the same thing, why not:
> >
> > int index = new String(byteArray1).indexOf(new String(byteArray2));

>
> Well, because that code does something very different than what you're
> imagining it to do. The code there takes byteArray1, converts it to a
> text String using some platform-dependent default character encoding,
> then does the same thing with byteArray2, and then determines the index
> of one String within the other.
>
> The result is going to vary widely depending on the platform it's run
> on, and is certainly not going to reliably be the same as what the OP
> clearly wanted -- to find the location of occurrence of one byte-
> sequence within a larger byte-sequence.


You're right, but using String(byteArray1, charsetName) for your
String definitions would remove platform-dependence and could very
well be analagous to what the OP was after.

Greg
 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      12-17-2003
Gregory A. Swarthout wrote:
> You're right, but using String(byteArray1, charsetName) for your
> String definitions would remove platform-dependence and could very
> well be analagous to what the OP was after.


Could be. The requirements would be that the charset:

1. Is a straight 8-bit character encoding.
2. Contains a unique character mapping for each possible byte value.

A violation of the first rule would skew the indexOf result, and would
also create the danger that the beginning of a match in the byte data
would be missed because it starts within a character in the converted
form. A violation of the second rule could create false positives in
matching, if two different bytes are mapped to the same character (or
are both unmapped, in which case String would map them to '?').

I am not familiar enough with the range of available character encodings
to know if there's one that fits that description. Do you know if there
is such a character encoding?

In any case, it would require comments to point out the kludgy solution,
as I would presume on seeing such a thing that the author was confused,
and try to fix it.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
Steve Horsley
Guest
Posts: n/a
 
      12-17-2003
Chris Smith wrote:
> Gregory A. Swarthout wrote:
>
>>You're right, but using String(byteArray1, charsetName) for your
>>String definitions would remove platform-dependence and could very
>>well be analagous to what the OP was after.

>
>
> Could be. The requirements would be that the charset:
>
> 1. Is a straight 8-bit character encoding.
> 2. Contains a unique character mapping for each possible byte value.
>
> A violation of the first rule would skew the indexOf result, and would
> also create the danger that the beginning of a match in the byte data
> would be missed because it starts within a character in the converted
> form. A violation of the second rule could create false positives in
> matching, if two different bytes are mapped to the same character (or
> are both unmapped, in which case String would map them to '?').
>
> I am not familiar enough with the range of available character encodings
> to know if there's one that fits that description. Do you know if there
> is such a character encoding?
>
> In any case, it would require comments to point out the kludgy solution,
> as I would presume on seeing such a thing that the author was confused,
> and try to fix it.
>

Are you playing Devil's Advocate, or do you know something that I
don't? I thought that "ISO8859_1" always gave direct 1-1 mapping
between bytes(0-255) and char values.
Am I missing a gotcha? Control char values maybe?

"new String(byte[], 0)" should do the trick too, though it is deprecated.

I agree, converting bytes to a String just to pattern match is
'orrible. Better (and probably lots faster) to rustle up your own
search method. You'd think the Byte class would have a utility for
that, wouldn't you.

Steve
 
Reply With Quote
 
Chris Smith
Guest
Posts: n/a
 
      12-17-2003
Steve Horsley wrote:
> Are you playing Devil's Advocate, or do you know something that I
> don't? I thought that "ISO8859_1" always gave direct 1-1 mapping
> between bytes(0-255) and char values.
> Am I missing a gotcha? Control char values maybe?


IIRC, ISO 8859-1 does not define character mappings for byte values in
the range of 128-160 or so. As such, the technique suggested would
produce a false positive for patterns that differed only in the choice
of value from within that range.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
search pattern Li Chen Ruby 7 01-11-2008 06:59 PM
pattern search Diez B. Roggisch Python 9 03-29-2007 06:03 PM
Search Pattern Eric Java 1 08-09-2006 08:35 AM
Search for byte pattern in a binary file. Ryan Tan via JavaKB.com Java 20 11-19-2004 07:41 AM
Pattern Search...? cybit friendly C++ 6 02-06-2004 12:00 AM



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57