Hex Search Utility

Discussion in 'Computer Information' started by no@spam.com, Jun 7, 2009.

  1. Guest

    Here is the problem. I have a list of 100 16-byte hex strings. I want to
    search a file and return the offset to each of these values. The file is
    close to 1TB in size and takes 8 hours to search through with a normal
    hex editor. Therefore, I can't just search 100 different times. I have
    to search once and return all locations. WinHex has a function called
    Simultaneous Search that is exactly what I want. But the list is only
    big enough to hold 7 hex strings. So I would still have to search the
    file 15 times, which just won't do.

    So my question is, does anyone know of a utility that will search a file
    using a list of hex values to find?
     
    , Jun 7, 2009
    #1

  2. sandy58 Guest

    On Jun 8, 12:41 am, Robert Baer <> wrote:
    > wrote:
    > > Here is the problem. I have a list of 100 16-byte hex strings. I want to
    > > search a file and return the offset to each of these values. The file is
    > > close to 1TB in size and takes 8 hours to search through with a normal
    > > hex editor. Therefore, I can't just search 100 different times. I have
    > > to search once and return all locations. WinHex has a function called
    > > Simultaneous Search that is exactly what I want. But the list is only
    > > big enough to hold 7 hex strings. So I would still have to search the
    > > file 15 times, which just won't do.
    > >
    > > So my question is, does anyone know of a utility that will search a file
    > > using a list of hex values to find?
    >
    >    Write your own, using your favorite language (COBOL, FORTRAN, BASIC,
    > ALGOL, etc)
    >    Make a small array to hold those 100 strings, allocate a buffer for
    > the file as large as RAM will tolerate with some space to spare for
    > Mister Justin Case (Murphy's brother); make the size an integer multiple
    > of 16 bytes (say 2^20 bytes).
    >    Open the file and read a block (directly into the buffer) and do your
    > matching search; if you know that the offsets are in multiples of 16
    > bytes, then step thru the buffer in 16-byte blocks for matching to the
    > array. On a match, calculate the absolute offset via file read offset +
    > match offset minus one; place that value in a second 100 element array
    > that was previously set to zero.
    >    So no further matching is needed for element X if corresponding
    > element in second array is filled.
    > **
    >    How the heck did you get a terrorbyte file onto the drive in the
    > first place?


    He's fulla shit. :-(
     
    sandy58, Jun 8, 2009
    #2

  3. Paul Guest

    wrote:
    > Here is the problem. I have a list of 100 16-byte hex strings. I want to
    > search a file and return the offset to each of these values. The file is
    > close to 1TB in size and takes 8 hours to search through with a normal
    > hex editor. Therefore, I can't just search 100 different times. I have
    > to search once and return all locations. WinHex has a function called
    > Simultaneous Search that is exactly what I want. But the list is only
    > big enough to hold 7 hex strings. So I would still have to search the
    > file 15 times, which just won't do.
    >
    > So my question is, does anyone know of a utility that will search a file
    > using a list of hex values to find?


    If the requirements are specialized, sometimes you have to write your own.

    I'm not a C programmer, but here is my attempt at a utility. This utility
    has been stripped of its interface, so in practice, you keep the compiler
    handy, make changes, recompile, and then run it. The compiler is really
    fast, and only takes a second. There is no attempt at portability
    in the code, so it might only work right on certain processors or
    environments. The program runs in a command window. You can redirect
    the output to a file, like this:

    filt.exe > output.txt

    When there is no checking code in the main loop, the program can read at
    60MB/sec. If the program checks for 5 different 16 byte sequences,
    it slows to ~30MB/sec. If the program checks for 100 different
    16 byte sequences, it only manages ~3MB/sec. So bumping up the
    number of search entries hurts the performance. At ~3MB/sec, a 1TB
    file would take roughly 90 to 100 hours, against 15 passes of 8 hours
    (120 hours) with WinHex, so you'll not be gaining much over your
    WinHex tool.
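
    The slowdown comes from comparing every file position against every
    row of the table. One possible way around that, offered as a sketch
    that has not been timed and is not part of filt.c: keep the patterns
    as plain byte arrays, sort them once, and binary-search the sorted
    table at each position. For 100 entries that is roughly 7 comparisons
    per position instead of 100. The names PATLEN, sort_patterns and
    find_pattern are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define PATLEN 16 /* pattern length in bytes, as in the original question */

/* Order two 16-byte patterns bytewise; shared by qsort and bsearch. */
static int cmp_pat(const void *a, const void *b)
{
    return memcmp(a, b, PATLEN);
}

/* Sort a flat table of n patterns (n * PATLEN bytes) once, before scanning. */
void sort_patterns(unsigned char *pats, size_t n)
{
    qsort(pats, n, PATLEN, cmp_pat);
}

/* Return the index, in the sorted table, of the pattern equal to the
   16 bytes at "window", or -1 if no pattern matches. */
long find_pattern(const unsigned char *window,
                  const unsigned char *pats, size_t n)
{
    const unsigned char *hit = bsearch(window, pats, n, PATLEN, cmp_pat);
    return hit ? (long)((hit - pats) / PATLEN) : -1L;
}
```

    In a program like filt.c, a call to find_pattern on the 16-byte window
    would stand in for the loop over all rows. Sorting reorders the rows,
    so the returned index refers to the sorted table, not the order in
    which the patterns were typed in.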

    The main reason for offering source is that, chances are, you will want
    to refine your requirements. You can chop this up, do stuff to it,
    and the compiler provides feedback in a very short time.

    To debug the program, I used extensive "printf" statements. Those
    have been removed to make the code easier to read. With the
    printfs, I was able to spot the problem with endian-ness when
    reading from the file.
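
    The endian-ness problem can be shown in a few lines. On a
    little-endian CPU, copying the file bytes 01 02 03 04 into a 32-bit
    integer gives 0x04030201 rather than 0x01020304, which is why filt.c
    byte swaps its match table. A small sketch, with hypothetical helper
    names (load_native, load_file_order, is_little_endian):

```c
#include <stdint.h>
#include <string.h>

/* Load 4 bytes into an integer the way a raw memory copy does;
   the result depends on the byte order of the CPU. */
uint32_t load_native(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Build the integer from the bytes in file order (first byte is the
   most significant); the result is the same on any CPU. */
uint32_t load_file_order(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Return 1 on a little-endian CPU, 0 otherwise. */
int is_little_endian(void)
{
    const unsigned char probe[4] = { 1, 0, 0, 0 };
    return load_native(probe) == 1u;
}
```

    Comparing raw bytes with memcmp, instead of comparing integers, would
    avoid the question altogether, since no byte order is involved.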

    The program was tested with a 5GB file, so I'm hoping it'll
    make it to a Terabyte without breaking.
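
    Whether it makes it to a Terabyte depends mainly on the byte counter
    and the printf format both being 64-bit. filt.c uses "long long" with
    the Microsoft-specific %I64X format. A more portable variant, assuming
    a C99 compiler that provides <inttypes.h>, might look like the sketch
    below. format_offset is a hypothetical helper name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write a file offset as "0x" followed by 13 zero-padded hex digits,
   the same layout filt.c prints. Returns what snprintf returns: the
   number of characters the full string needs, not counting the
   terminating NUL. */
int format_offset(char *buf, size_t buflen, uint64_t offset)
{
    return snprintf(buf, buflen, "0x%013" PRIX64, offset);
}
```

    Thirteen hex digits hold offsets up to 2^52 bytes, so a 1TB file
    (2^40 bytes) fits with room to spare.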

    ****************************** filt.c *************************************

    #include <stdio.h>
    #include <stdlib.h> /* malloc */
    #include <string.h> /* memmove */
    #include <fcntl.h>
    #include <time.h>

    /* Compile using "MinGW" for Windows.

    MinGW installer. Run this 137KB program, to install the MinGW environment.
    http://sourceforge.net/project/showfiles.php?group_id=2435&package_id=240780

    Add C:\MinGW\bin to the PATH environment variable (System:Advanced control panel).
    Start an MSDOS (Command prompt) window. This is how you compile it.

    mingw32-gcc -Wall -o filt filt.c

    Executable is filt.exe, produced by the compiler.

    Source file is filt.c (this file). As originally posted, the compiler
    issued three warnings. Tested on a Core2 Duo. */

    /* Place 16 byte search patterns in array "match" below
    Format is hexadecimal. Adjust "rows" value according to table size
    5 entry table searches at 30MB/sec
    100 entry table searches at 3MB/sec (snore...) */

    /* You can shorten the table below, and change the value of "rows".
    The third row is the one I used to test for a match. The other
    entries were bogus, to give the program lots of comparing to do.
    Normally, each line in the match table, would be different from
    the rest. Only use as many rows as are needed, so it'll go faster. */

    int main()
    {
    #define readbuffersize 1024
    #define rows 100
    unsigned int match[rows][4]={
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0x01020304, 0x05060708, 0x090A0B0C, 0x0D0E0F10 },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },

    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF },
    { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF }
    };

    unsigned int sliding[4] = {0,0,0,0}; /* 16 byte buffer, oriented as ints for comparison */
    char * ptrsliding = (char *) sliding; /* Slide a byte at a time, check against all match entries*/

    char print[rows][40]; /* human readable version of "match" entries */

    FILE *fp;
    char *name = "H:/five.bin"; /* My 5GB test file, to prove it is large file ready */
    /* change the filename to point to your 1TB file to be searched */
    /* use forward slashes, as back slashes fail */

    int i; /* process i elements in read buffer array "arr" */
    int j; /* process j rows from match against i-th address*/
    int count; /* count of elements in latest read buffer op */
    int start; /* start position in read array "arr" to start looking */
    char *arr = (char *) malloc(readbuffersize); /* My read buffer. Larger may not help */
    /* I really wanted multithreaded+readahead */
    long long tcount = 0ll; /* should be total byte count minus 15, big enough for 1TB */
    long long starttime = clock(); /* time stamp in clock ticks (milliseconds where CLOCKS_PER_SEC is 1000) */

    /* Byte swap within the match integer array. On a little-endian CPU,
    bytes copied from the file with memmove land in reverse order within
    each 32 bit integer, so the "match" entries are reversed in the same
    way before comparing. */

    for (j=0; j<rows; j++) { /* byte swap all rows */
    sprintf( print[j], "%08X %08X %08X %08X", /* generates strings ready for */
    match[j][0],match[j][1],match[j][2],match[j][3]); /* printing if there is a match */
    for (i=0; i<4; i++) { /* byte swap each 32 bit integer */
    match[j][i] = (match[j][i] & 0xFF000000) >> 24 |
    (match[j][i] & 0x00FF0000) >> 8 |
    (match[j][i] & 0x0000FF00) << 8 |
    (match[j][i] & 0x000000FF) << 24;
    }
    }

    printf("\nhello world\n\n"); /* an amateur C programmer tradition */

    if ( (fp = fopen(name,"rb")) == NULL ) {
    printf("Oops! Cannot open file %s\n", name);
    return(1); /* there is very little error checking in this program */
    }

    count=fread(arr, 1, readbuffersize, fp);
    if (count < 16) return(2); /* file too short to compare, minimum 16 bytes */

    memmove(ptrsliding+1,arr,15); /* partially load the sliding window */
    start=15; /* ready to mesh with the while() loop */

    while (count > 0 ) {
    for (i=start; i<count; i++) {
    memmove( ptrsliding, ptrsliding+1, 15); /* shift the window left one place */
    memmove( ptrsliding+15, arr+i, 1); /* add an entry onto the end of the window */
    for (j=0; j<rows; j++) {
    if ( sliding[0] == match[j][0] &&
    sliding[1] == match[j][1] &&
    sliding[2] == match[j][2] &&
    sliding[3] == match[j][3] ) {
    printf("Match at address 0x%013I64X to row %d (%s)\n", tcount, j+1, print[j]);
    /* disk address is 13 digit hex with leading zeros */
    /* Hopping out of the loop, on the first hit would make sense
    but I couldn't be bothered... */
    }
    }
    tcount++;
    }
    if (count == readbuffersize) {
    count=fread(arr, 1, readbuffersize, fp);
    } else count = 0; /* don't read, if last fread gave a partial buffer */
    start=0; /* start was needed, to handle filling the sliding window */
    }

    printf("%I64d bytes processed in %f seconds\n",
    tcount+15ll, ( (float)clock()-starttime )/CLOCKS_PER_SEC);

    rewind(fp); /* can't seem to close a file if past the end ? rewind seemed to work */
    fclose(fp);
    return 0;
    }

    *************************** end of filt.c *********************************

    Paul
     
    Paul, Jun 9, 2009
    #3
