Reading binary data

 
 
Aaron Scott
 
      09-10-2008
I've been trying to tackle this all morning, and so far I've been
completely unsuccessful. I have a binary file whose structure I know,
and I'd like to read it into Python. It's not a particularly
complicated file. For instance:

signature    char[3]    "GDE"
version      uint32     2
attr_count   uint32
{
    attr_id        uint32
    attr_val_len   uint32
    attr_val       char[attr_val_len]
} ... repeated attr_count times ...

However, I can't find a way to bring it into Python. This is my code
-- which I know is definitely wrong, but I had to start somewhere:

import struct
file = open("test.gde", "rb")
output = file.read(3)
print output
version = struct.unpack("I", file.read(4))[0]
print version
attr_count = struct.unpack("I", file.read(4))[0]
while attr_count:
    print "---"
    file.seek(4, 1)
    counter = int(struct.unpack("I", file.read(4))[0])
    print file.read(counter)
    attr_count -= 1
file.close()

Of course, this doesn't work at all. It produces:

GDE
2
---

---
*

I'm completely at a loss. If anyone could show me the correct way to
do this (or at least point me in the right direction), I'd be
extremely grateful.
 
Jon Clements
 
      09-10-2008
On 10 Sep, 18:14, Aaron Scott <(E-Mail Removed)> wrote:
> [snip]


What if we view the data as having an 11-byte header:
signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

Then for the list of attrs:
for idx in xrange(attr_count):
    attr_id, attr_val_len = struct.unpack('II', yourfile.read(8))
    attr_val = yourfile.read(attr_val_len)


hth, or gives you a pointer anyway
Jon.


 
Jon Clements
 
      09-10-2008
On 10 Sep, 18:33, Jon Clements <(E-Mail Removed)> wrote:
> What if we view the data as having an 11 byte header:
> signature, version, attr_count = struct.unpack('3cII',
> yourfile.read(11))


CORRECTION: '3cII' should be '3sII'.

 
Aaron Scott
 
      09-10-2008
> signature, version, attr_count = struct.unpack('3cII',
> yourfile.read(11))
>


This line is giving me an error:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    signature, version, attr_count = struct.unpack('3cII', file.read(12))
ValueError: too many values to unpack
 
Aaron Scott
 
      09-10-2008
> CORRECTION: '3cII' should be '3sII'.

Even with the correction, I'm still getting the error.
 
Jon Clements
 
      09-10-2008
On Sep 10, 6:45 pm, Aaron Scott <(E-Mail Removed)> wrote:
> > CORRECTION: '3cII' should be '3sII'.
>
> Even with the correction, I'm still getting the error.

Me being silly...

Quick fix:
signature = file.read(3)
then the rest can stay the same. struct.calcsize('3sII') is 12, so
unpack expects a 12-byte string, whereas you only really have 11 --
alignment and all that...

Jon.
 
Aaron Scott
 
      09-10-2008
Sorry, I had posted the wrong error. The error I am getting is:

struct.error: unpack requires a string argument of length 12

which doesn't make sense to me, since I'm specifically asking for 11.
Just for kicks, if I change the line to

print struct.unpack('3sII', file.read(12))

I get the result

('GDE', 33554432, 16777216)

... which isn't even close, past the first three characters.
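
Those two numbers are what a one-byte shift would produce. A minimal sketch in Python 3 (the thread itself uses Python 2), assuming the file starts with version 2, attr_count 2 and a first attr_id of 1, all little-endian, as the other outputs in this thread suggest. The `header` bytes are reconstructed for illustration, and the explicit `x` pad byte stands in for what native alignment typically does after the 3-byte signature:

```python
import struct

# Assumed first 12 bytes of the file: "GDE", version=2, attr_count=2,
# then the first byte of attr_id=1 (all little-endian uint32 values).
header = b"GDE" + struct.pack("<II", 2, 2) + b"\x01"

# Packed layout, no padding: the 11-byte header decodes as expected.
packed = struct.unpack("<3sII", header[:11])

# Same bytes with one pad byte skipped after the signature (the "x"),
# which is what native alignment typically does for '3sII'.
shifted = struct.unpack("<3sxII", header[:12])

print(packed)   # (b'GDE', 2, 2)
print(shifted)  # (b'GDE', 33554432, 16777216)
```

33554432 is 0x02000000 and 16777216 is 0x01000000, i.e. the values 2 and 1 read one byte too late.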
 
Aaron Scott
 
      09-10-2008
Taking everything into consideration, my code is now:

import struct
file = open("test.gde", "rb")
signature = file.read(3)
version, attr_count = struct.unpack('II', file.read(8))
print signature, version, attr_count
for idx in xrange(attr_count):
    attr_id, attr_val_len = struct.unpack('II', file.read(8))
    attr_val = file.read(attr_val_len)
    print attr_id, attr_val_len, attr_val
file.close()

which gives a result of:

GDE 2 2
1 4
2 4 *

Essentially, the same results I was originally getting.
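
For comparison, a self-contained Python 3 sketch of the same loop with an explicit little-endian, unaligned format. `parse_gde` and `build_sample` are hypothetical names, the sample attribute values are invented for illustration, and little-endian byte order is an assumption based on the values decoded above:

```python
import io
import struct

def build_sample():
    """Build an in-memory file in the layout from the first post (made-up values)."""
    attrs = [(1, b"\x00\x00\x00\x00"), (2, b"*\x00\x00\x00")]
    data = b"GDE" + struct.pack("<II", 2, len(attrs))
    for attr_id, value in attrs:
        data += struct.pack("<II", attr_id, len(value)) + value
    return io.BytesIO(data)

def parse_gde(stream):
    """Return (signature, version, [(attr_id, attr_val), ...])."""
    # '<' means little-endian, standard sizes, no padding: 3 + 4 + 4 = 11 bytes.
    signature, version, attr_count = struct.unpack("<3sII", stream.read(11))
    if signature != b"GDE":
        raise ValueError("unexpected signature: %r" % signature)
    attrs = []
    for _ in range(attr_count):
        attr_id, attr_val_len = struct.unpack("<II", stream.read(8))
        attr_val = stream.read(attr_val_len)
        if len(attr_val) != attr_val_len:
            raise ValueError("file truncated inside attribute %d" % attr_id)
        attrs.append((attr_id, attr_val))
    return signature, version, attrs

signature, version, attrs = parse_gde(build_sample())
print(signature, version)
for attr_id, attr_val in attrs:
    # repr() shows non-printable bytes that a plain print would hide.
    print(attr_id, len(attr_val), repr(attr_val))
```

Printing with repr() matters here: a value made of binary bytes can look empty, or like a single stray character, when printed as text.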
 
Roel Schroeven
 
      09-10-2008
Aaron Scott wrote:
> Sorry, I had posted the wrong error. The error I am getting is:
>
> struct.error: unpack requires a string argument of length 12
>
> which doesn't make sense to me, since I'm specifically asking for 11.


That's because of padding. According to the docs, "By default, C numbers
are represented in the machine's native format and byte order, and
properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler)". That means that struct.unpack() assumes
one byte of padding between the 3-character string and the first
unsigned int.
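
The padding can be checked directly with struct.calcsize. A small sketch in Python 3; the native size depends on the platform's C compiler rules, so only the standard-size results are certain:

```python
import struct

# Native mode (no prefix, same as '@'): alignment applies, so pad bytes
# may be inserted before each uint32. Typically 12 on common platforms.
native_size = struct.calcsize("3sII")

# '<' (little-endian) and '=' (native byte order) both use standard
# sizes with no alignment, so the header is exactly 3 + 4 + 4 bytes.
little_endian_size = struct.calcsize("<3sII")
standard_size = struct.calcsize("=3sII")

print(native_size, little_endian_size, standard_size)
```

With a prefix such as '<', struct.unpack('<3sII', f.read(11)) reads the whole header in one call. Whether '<' is the right byte order for this particular file is an assumption.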

--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
-- Isaac Asimov

Roel Schroeven
 
Jon Clements
 
      09-10-2008
On Sep 10, 7:16 pm, Aaron Scott <(E-Mail Removed)> wrote:
> Taking everything into consideration, my code is now:
> [snip]
> which gives a result of:
>
> GDE 2 2
> 1 4
> 2 4 *
>
> Essentially, the same results I was originally getting


Umm, how about yourfile.read(100) (or some arbitrary value, just to see
the data) and see what it returns... does it return something that
looks like the values you'd expect in a char[]? I also find it odd that
the attr_val_len appears to be 4?
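
One way to follow that suggestion, sketched in Python 3: print the raw bytes with repr() and as hex instead of as text. The 4-byte value below is purely hypothetical (the real file's contents are not shown in this thread); it only illustrates how a value that prints as `*` could in fact be a binary uint32, since 0x2A is 42, the ASCII code for `*`:

```python
import binascii
import struct

# Hypothetical attr_val; the real file's contents are not shown in the thread.
attr_val = b"*\x00\x00\x00"

# repr() escapes non-printable bytes instead of hiding them.
print(repr(attr_val))

# Hex view of the same bytes.
hex_text = binascii.hexlify(attr_val).decode("ascii")
print(hex_text)

# If the 4 bytes were a little-endian uint32 rather than text:
as_uint32 = struct.unpack("<I", attr_val)[0]
print(as_uint32)
```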
 