Velocity Reviews - Computer Hardware Reviews

finding a tag in a binary file

 
 
rob stanton
 
      02-27-2011
I have a binary file in which I'd like to find every occurrence of the byte
sequence 10 00 10 00 (hex) among all the other values; each occurrence is
followed by a name.

I've found that

contents_array.find_all { |e| e == 0x10 }
shows all the 0x10 bytes in the file, but not their indexes; there are several
hundred.

contents_array.index(0x10)
shows the first index of 0x10 (242), but how do I go on to list
subsequent indexes of 0x10?
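A minimal sketch of one way to list every index, assuming contents_array is an Array of Integer byte values (for instance from File.binread(...).bytes). The method name indexes_of, the file name and the sample bytes are hypothetical:

```ruby
# Assumes contents_array is an Array of Integer byte values, e.g. from
# File.binread("data.bin").bytes ("data.bin" is a hypothetical file name).
def indexes_of(bytes, value)
  # each_index enumerates 0, 1, 2, ...; select keeps the positions whose byte equals value
  bytes.each_index.select { |i| bytes[i] == value }
end

contents_array = [0x41, 0x10, 0x00, 0x55, 0x10, 0x00, 0x10, 0x00]
p indexes_of(contents_array, 0x10)   # prints [1, 4, 6]
```

Each of those indexes could then be checked with contents_array[i, 4], as in the post.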

puts(contents_array[242,4])
16
0
85
73
=> nil

shows me that the first 0x10 I find is not the right one, i.e. it's 10 00 55 49,
so I need to go on to the next 0x10 and test again.
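Rather than jumping from one 0x10 to the next, one option is to slide a 4-byte window over the array and keep the start positions where the window equals 10 00 10 00. A sketch, again assuming an Array of Integer bytes; pattern_starts and the sample data are made up for illustration:

```ruby
# Assumes bytes is an Array of Integer byte values.
def pattern_starts(bytes, pattern = [0x10, 0x00, 0x10, 0x00])
  # each_cons yields every run of pattern.size consecutive bytes;
  # with_index pairs each run with the position where it starts
  bytes.each_cons(pattern.size).with_index
       .select { |window, _start| window == pattern }
       .map { |_window, start| start }
end

# Made-up data: a false hit (10 00 55 49) followed by a real one at index 4
contents_array = [0x10, 0x00, 0x55, 0x49, 0x10, 0x00, 0x10, 0x00, 0x50, 0x4e]
p pattern_starts(contents_array)   # prints [4]
```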

I'm a bit stuck now as to how to do that; I'm very new and finding it
difficult to find information...

--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
Robert Dober
 
      02-27-2011

ruby-1.9.2-p136 :024 > content = [ 97, 10, 0, 10, 0, 97, 98, 32, 32, 10, 0, 10, 0, 98, 99, 10, 0, 10, 0 ].map(&:chr).join
=> "a\n\x00\n\x00ab  \n\x00\n\x00bc\n\x00\n\x00"
ruby-1.9.2-p136 :025 >
ruby-1.9.2-p136 :026 > p content.scan(/\n\0\n\0(\w+)/)
[["ab"], ["bc"]]
=> [["ab"], ["bc"]]

should do the trick
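Note that in the irb session above the 10 is decimal, i.e. "\n" (0x0A), while the bytes in the question are hex 0x10. A sketch of the same scan using \x10 escapes, on invented sample bytes packed into a binary String with Array#pack:

```ruby
# Invented sample bytes; 0x10 0x00 0x10 0x00 is the marker from the question
bytes = [0x61, 0x10, 0x00, 0x10, 0x00, 0x61, 0x62, 0x20,
         0x10, 0x00, 0x10, 0x00, 0x62, 0x63]
content = bytes.pack("C*")   # "C*" packs each Integer as one unsigned byte

# Same pattern as in the session, with \x10 in place of \n
names = content.scan(/\x10\x00\x10\x00(\w+)/)
p names   # prints [["ab"], ["bc"]]
```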

If you need the index for some other reason than checking for the
name, let us know; that would be a little more work.
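One way to get those positions, sketched with the string from the irb session: call String#index repeatedly, each time starting one character after the previous hit. The helper name marker_offsets is hypothetical:

```ruby
content = [97, 10, 0, 10, 0, 97, 98, 32, 32, 10, 0, 10, 0, 98, 99, 10, 0, 10, 0].map(&:chr).join

# Hypothetical helper: offsets of every occurrence of marker in content
def marker_offsets(content, marker)
  offsets = []
  pos = content.index(marker)            # first occurrence, or nil
  while pos
    offsets << pos
    pos = content.index(marker, pos + 1) # search again one character further on
  end
  offsets
end

p marker_offsets(content, "\n\0\n\0")   # prints [1, 9, 15]
```

Starting at pos + 1 also reports overlapping occurrences; starting at pos + marker.size would skip them.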

HTH
Robert




--
The 1,000,000th fibonacci number contains '42' 2039 times; that is
almost 30 occurrences more than expected (208988 digits).
N.B. The 42nd fibonacci number does not contain '1000000' that is
almost the expected 3.0e-06 times.

 
 
 
 
 
rob stanton
 
      02-27-2011
Wow, that's a little beyond my just-started status... So the array you created
has a couple of 10 00 10 00 sequences in it, correct?
Then I don't know what you did with it to get ab and bc. Thanks, but
could you explain a little more? I'm new to this.


 
 
Robert Dober
 
      02-27-2011
Right. I created a string like "a\n\0\n\0ab ..." and then used String#scan
to get all matches of the regular expression: \n\0\n\0
followed by a non-empty sequence of word characters (\w+), which I
grouped.
To demonstrate what the grouping does, let's look at this code (I got rid of
one \n\0 out of laziness):

content.scan(/\n\0\w+/)
=> ["\n\x00ab", "\n\x00bc"]

But if we use a group in the regex, we get only the group(s), as one sub-array per match:

content.scan(/\n\0(\w+)/)
=> [["ab"], ["bc"]]
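The same rule extends to several groups: each match contributes one sub-array with one element per group. A small illustration on the same string (the two single-character groups are just for demonstration):

```ruby
content = [97, 10, 0, 10, 0, 97, 98, 32, 32, 10, 0, 10, 0, 98, 99, 10, 0, 10, 0].map(&:chr).join

# Two groups per match, so each sub-array has two elements
pairs = content.scan(/\n\0(\w)(\w)/)
p pairs   # prints [["a", "b"], ["b", "c"]]
```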

So if all you need is to extract the names following \n\0\n\0, you are
done; if you need their positions in the string, it is a little bit more work.
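A sketch of that extra work: inside the block form of String#scan, Regexp.last_match holds the MatchData of the current match, so its begin method gives offsets. The [name, match offset, name offset] layout of results is just one possible choice:

```ruby
content = [97, 10, 0, 10, 0, 97, 98, 32, 32, 10, 0, 10, 0, 98, 99, 10, 0, 10, 0].map(&:chr).join

results = []
content.scan(/\n\0\n\0(\w+)/) do
  m = Regexp.last_match                     # MatchData for the current match
  results << [m[1], m.begin(0), m.begin(1)] # name, start of match, start of name
end
p results   # prints [["ab", 1, 5], ["bc", 9, 13]]
```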








 
 
rob stanton
 
      02-27-2011
Hi Robert, got it now, but the data is all in hex; does your code give the
ASCII codes? Still, it's almost there. The name follows 10 00 10 00 in this
format: 50 4e ("P" "N"), then xx 00, then the "surname", then 5e ("^"), then
the "first name", followed by 10 and then 00.
I'll see what I can do with your code, but any help would be appreciated.
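The layout described above could be matched with one group per name. A sketch, assuming the xx byte can be anything (hence . with the m flag, so it may also be 0x0A) and that neither name contains 0x5e or 0x10; the sample records are invented:

```ruby
# Record layout as described (invented sample data):
#   10 00 10 00 | 50 4E "PN" | xx 00 | surname | 5E "^" | first name | 10 00
record = /\x10\x00\x10\x00PN.\x00([^\^\x10]*)\^([^\x10]*)\x10\x00/m # /m lets . match any xx byte, even 0x0A

data = ("\x01\x02\x10\x00\x10\x00PN\x07\x00Smith^John\x10\x00" +
        "\xFF\x10\x00\x10\x00PN\x0A\x00Jones^Mary Ann\x10\x00").b

names = data.scan(record)   # one [surname, first name] pair per record
names.each { |surname, first| puts "#{first} #{surname}" }
```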


 
 
Robert Dober
 
      02-27-2011
Well, if in your encoding the letters do not match \w, you will need to
specify them as hex values in the regex. This is a little
more work, but it should not be difficult.

You can match the hex value 4e with /\x4e/,
a range of hex values with /[\x20-\x32]/,
or, in the worst case, enumerate the characters that should match
with /[\x32\x36\x42...]/ (no commas: inside a character class a comma is matched literally).

Assuming that your letters are encoded with the characters 0x40, 0x42
and 0x44 to 0x50,
the expression

content.scan( /\x10\x00\x10\x00([\x40\x42\x44-\x50]+)/ )

would do the trick.
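The comma point is easy to check: inside [...] a comma is an ordinary character, so a class written with commas also matches ",". A quick comparison (String#match? needs Ruby 2.4 or later; the sample strings are arbitrary):

```ruby
with_commas    = /[\x40,\x42]/   # "@", "," or "B"
without_commas = /[\x40\x42]/    # "@" or "B" only

p ",".match?(with_commas)        # prints true
p ",".match?(without_commas)     # prints false

# The corrected class: 0x40, 0x42 and the range 0x44..0x50 ("D".."P")
letters = /[\x40\x42\x44-\x50]+/
p "HELP, POD".scan(letters)      # prints ["HELP", "POD"]
```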

HTH
R.





 
 
rob stanton
 
      02-27-2011
Hmm, that does not work for me. Could I send you the file I'm working with (well, a
reduced file, as it's very big) and see what you make of it? Maybe you'll
see what I'm after. It makes sense if you look at it with a hex viewer
and search for 10 00 10 00. Thanks for the help so far.


 
 
Robert Dober
 
      02-28-2011
Sure, but by all means let us take this offline.
Please send the file privately and I will try to make some time to
look at it, but I'll probably not manage before next weekend. Maybe some
good soul on this list will volunteer?


 
 
Robert Dober
 
      03-01-2011
I somehow managed to help our friend, but I have to admit quite
some ignorance about 1.9 encoding issues. In order to parse a binary
file with a regex I needed the regex to be encoded in ASCII-8BIT, and the best
I could do was add a completely unnecessary byte to the regex (an optional 0xf2
at the start):

/\xf2?\x10\x00\x10\x00PN.\x00([\w^]+)/

I would sure appreciate it if someone could point me to how to do this properly.
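A sketch of a possible alternative, not tested on 1.9.2: read the file in binary mode with File.binread, which returns an ASCII-8BIT String, and use a pattern made only of ASCII characters and escapes below \x80. In current Ruby such a regex has no fixed encoding and matches a binary String directly, so no dummy byte should be needed. The file name and sample bytes are hypothetical, and the m flag is an addition to the original pattern:

```ruby
# The pattern above without the dummy \xf2?; /m (an addition) lets . match 0x0A too
def names_in(binary)
  binary.scan(/\x10\x00\x10\x00PN.\x00([\w^]+)/m).flatten
end

# content = File.binread("records.bin")   # hypothetical file name; returns ASCII-8BIT
content = "\xF2\x10\x00\x10\x00PN\x07\x00Smith^John\x10\x00".b   # invented stand-in
p names_in(content)   # prints ["Smith^John"]
```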

Thx in advance

Robert

 
 
Robert Dober
 
      03-06-2011
Eventually I found some time to investigate this. Searching
ruby-core, redmine and ruby-spec, I found no indication whatsoever that
it is possible to specify the encoding of a regex explicitly (with the exception
of the u, n and s switches). I would love to have an `encoding:`
parameter in Regexp.new,
or at least a switch to force ASCII-8BIT.
Any thoughts on that?
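As a partial answer for current Ruby versions (not checked on 1.9): Regexp#fixed_encoding? shows whether a regex is tied to one encoding. An ASCII-only pattern is not, so it can be used on binary Strings as is; when a byte above 0x7F is genuinely needed, building the regex with Regexp.new from a binary String gives an ASCII-8BIT regex. A sketch (the byte values are arbitrary):

```ruby
# An ASCII-only pattern has no fixed encoding, so it can be used on a
# binary (ASCII-8BIT) String as well as on a UTF-8 one.
ascii_only = /\x10\x00\x10\x00PN/
p ascii_only.fixed_encoding?   # prints false

# When the pattern really needs a byte above 0x7F, building it from a
# binary String gives an ASCII-8BIT regex (String#b is Ruby 2.0+;
# on 1.9 use force_encoding("ASCII-8BIT") instead).
with_high_byte = Regexp.new("\xFF\x10".b)
p with_high_byte.encoding == Encoding::BINARY   # prints true

data = "\x01\xFF\x10\x02".b
p data =~ with_high_byte   # prints 1
```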

Cheers
Robert

 