Unpacking signed shorts and integers with specified endianness

 
 
Phrogz
      06-18-2007
I'm deciphering a byte-based binary file structure which can be stored
in either big- or little-endian byte order. (The first byte of the file
describes the format.) Among others, some of the fields are UINT32
(unsigned 4-byte integers), and some are INT32 (signed 4-byte integers).

I know about BitStruct, but was trying to roll my own solution using
String.unpack. (The file format has nested and repeating sections in
it.) Given that I know exactly how many bytes I want for each field
and whether the file is big or little endian, I thought it would be
easy to pick a specific unpack character for each field. However,
reading through the String.unpack docs, I can't find something that
corresponds to signed 2-byte and 4-byte integers for a given
endianness.

Here's what I see (ASCII art ahead):
      |     signed     |    unsigned    |
bytes |  big  | little |  big  | little |
------+-------+--------+-------+--------+
    1 |       c        |       C        |
    2 |   ?   |   ?    |   n   |   v    |
    4 |   ?   |   ?    |   N   |   V    |

1) I see "l" (lowercase L) which is 4 bytes treated as a signed
integer...but in 'native' endian order. Does String.unpack not provide
a way to unpack a 4-byte signed integer with a specified endianness?

2) I see "s" which is 2 bytes treated as a signed integer...but in
'native' endian order. Does String.unpack not provide a way to unpack
a 2-byte signed integer with a specified endianness?

(The file format doesn't actually use any signed 2-byte integers, but
I wanted to include them for completeness.)
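
To make the gap in the table concrete, here is a small sketch of what the existing directives return for the same two bytes. The byte values and variable names are made up for illustration; only the "n", "v", "s" and "C*" directives come from the pack/unpack documentation.

```ruby
# Two bytes that encode -123 as a big-endian 16-bit two's-complement value.
bytes = [0xFF, 0x85].pack("C*")

big_unsigned    = bytes.unpack("n").first  # big-endian order, always unsigned
little_unsigned = bytes.unpack("v").first  # little-endian order, always unsigned
native_signed   = bytes.unpack("s").first  # signed, but byte order follows the host

# One way to detect the host's byte order: compare native packing with "v".
host_is_little = [1].pack("s") == [1].pack("v")

puts big_unsigned     # 65413, i.e. 2**16 - 123
puts little_unsigned  # 34303, i.e. 0x85FF
puts native_signed    # -123 on a big-endian host, -31233 on a little-endian one
```

So "n" and "v" fix the byte order but drop the sign, while "s" keeps the sign but leaves the byte order up to the machine.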

 
Phrogz
      06-19-2007
On Jun 18, 4:32 pm, Phrogz wrote:
> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
> integer...but in 'native' endian order. Does String.unpack not provide
> a way to unpack a 4-byte signed integer with a specified endianness?
>
> 2) I see "s" which is 2 bytes treated as a signed integer...but in
> 'native' endian order. Does String.unpack not provide a way to unpack
> a 2-byte signed integer with a specified endianness?


Alright, let's try asking this question a different way. Given a
binary file spec like the following, where the first byte tells you
whether to interpret the rest of the file as little- or big-endian,
and given the presence of a signed integer in the file, how would you
write a parser for this?

endian     UINT8    (1==little, 0==big)
job_count  UINT8    # of repeated binary sections following this
(jobs)     JOB*

Each JOB section is:
  foo   UINT8
  bar   UINT16
  jim   UINT32
  jam   INT32
  jill  UINT8

gib_mark   0xdead   # 2 bytes
gib_count  UINT16   # of repeated binary sections following this
(gibs)     GIB+

Each GIB section is:
  gob    8 chars
  blurb  UINT8


Would you invent a new character for String.unpack, split the string
around it, and use knowledge of the native endianness of the platform
you're running on to decide whether to pull out those 4 bytes
independently, reverse them, and then unpack the result as a signed
integer?

Is there some sweet trick you could do after extracting 4 bytes as an
integer to switch the implied interpreted endianness?

Would you patch String.unpack in C to add options for specific-endian
signed shorts and integers?

Can you easily do the above (including the repeating sub-binary
sections) with BitStruct?
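
One possible shape for such a parser, as a rough sketch rather than a definitive answer: pick the unsigned directives from the endian byte, read each fixed-size record in one unpack call, and convert the INT32 field to signed afterwards. The names `parse_file` and `to_signed` are invented here. The sketch also assumes that the gib block follows the jobs at the top level and that `gib_mark` is stored in the file's byte order, neither of which the spec above states.

```ruby
require "stringio"

# Reinterpret an unsigned value of the given bit width as two's complement.
def to_signed(n, bits)
  n >= 2**(bits - 1) ? n - 2**bits : n
end

# Hypothetical parser for the layout in the post (assumptions in the lead-in).
def parse_file(bytes)
  io = StringIO.new(bytes.b)

  # First byte selects the byte order: 1 == little, 0 == big.
  little = io.read(1).unpack("C").first == 1
  u16 = little ? "v" : "n"
  u32 = little ? "V" : "N"

  job_count = io.read(1).unpack("C").first
  job_format = "C#{u16}#{u32}#{u32}C" # foo, bar, jim, jam (raw), jill
  jobs = Array.new(job_count) do
    # Each JOB is 1 + 2 + 4 + 4 + 1 = 12 bytes.
    foo, bar, jim, jam, jill = io.read(12).unpack(job_format)
    { foo: foo, bar: bar, jim: jim, jam: to_signed(jam, 32), jill: jill }
  end

  # Assumed: the marker and count are read in the file's byte order.
  gib_mark, gib_count = io.read(4).unpack("#{u16}#{u16}")
  gibs = Array.new(gib_count) do
    gob, blurb = io.read(9).unpack("a8C") # 8 chars + UINT8
    { gob: gob, blurb: blurb }
  end

  { little_endian: little, jobs: jobs, gib_mark: gib_mark, gibs: gibs }
end

# Build a tiny big-endian sample and parse it back.
sample = [0, 1].pack("CC") +
         [7, 300, 4_000_000_000, -5 & 0xFFFFFFFF, 9].pack("CnNNC") +
         [0xdead, 1].pack("nn") +
         ["abcdefgh", 42].pack("a8C")
result = parse_file(sample)
p result[:jobs].first[:jam] # -5
```

The caller can check `result[:gib_mark] == 0xdead` to validate the marker. No knowledge of the host's native byte order is needed with this approach, since only the fixed-order unsigned directives are used.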

 
Daniel Berger
      06-19-2007


On Jun 18, 4:35 pm, Phrogz wrote:

<snip>

> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
> integer...but in 'native' endian order. Does String.unpack not provide
> a way to unpack a 4-byte signed integer with a specified endianness?
>
> 2) I see "s" which is 2 bytes treated as a signed integer...but in
> 'native' endian order. Does String.unpack not provide a way to unpack
> a 2-byte signed integer with a specified endianness?


See the 'N', 'n', 'V' and 'v' directives. There are equivalent
directives for floats as well - 'E', 'e', 'G' and 'g'.

Regards,

Dan


 
Mark Day
      06-19-2007
On Jun 19, 2007, at 11:19 AM, Daniel Berger wrote:

>> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
>> integer...but in 'native' endian order. Does String.unpack not provide
>> a way to unpack a 4-byte signed integer with a specified endianness?
>>
>> 2) I see "s" which is 2 bytes treated as a signed integer...but in
>> 'native' endian order. Does String.unpack not provide a way to unpack
>> a 2-byte signed integer with a specified endianness?

>
> See the 'N', 'n', 'V' and 'v' directives. There are equivalent
> directives for floats as well - 'E', 'e', 'G' and 'g'.


Those handle endianness, but not signed values. I suppose you could
unpack as unsigned, then manually test for the sign bit being set and
correct the value. Even uglier, you could unpack as unsigned with
desired endianness, repack as unsigned in native order, then unpack as
signed in native order.

-Mark
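
The second ("even uglier") route can be sketched as follows. The helper name `unpack_signed` and the lookup table are invented for illustration. The sketch relies on "S"/"s" being the native 16-bit unsigned/signed directives and "L"/"l" the native 32-bit ones.

```ruby
# Maps each fixed-endian unsigned directive to the native
# [unsigned, signed] pair of the same width.
NATIVE_DIRECTIVES = {
  "n" => %w[S s], "v" => %w[S s],
  "N" => %w[L l], "V" => %w[L l]
}

# Hypothetical helper: unpack unsigned with the desired byte order,
# repack unsigned in native order, then unpack signed in native order.
def unpack_signed(bytes, directive)
  unsigned_native, signed_native = NATIVE_DIRECTIVES.fetch(directive)
  unsigned = bytes.unpack(directive).first
  [unsigned].pack(unsigned_native).unpack(signed_native).first
end

big32    = [0xFF, 0xFF, 0xFF, 0x85].pack("C*")
little32 = [0x85, 0xFF, 0xFF, 0xFF].pack("C*")

puts unpack_signed(big32, "N")    # -123
puts unpack_signed(little32, "V") # -123
```

Because the repack and the final unpack both use native order, the host's byte order cancels out and never needs to be detected.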


 
Joel VanderWerf
      06-24-2007
Mark Day wrote:
> On Jun 19, 2007, at 11:19 AM, Daniel Berger wrote:
>
>>> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
>>> integer...but in 'native' endian order. Does String.unpack not provide
>>> a way to unpack a 4-byte signed integer with a specified endianness?
>>>
>>> 2) I see "s" which is 2 bytes treated as a signed integer...but in
>>> 'native' endian order. Does String.unpack not provide a way to unpack
>>> a 2-byte signed integer with a specified endianness?

>>
>> See the 'N', 'n', 'V' and 'v' directives. There are equivalent
>> directives for floats as well - 'E', 'e', 'G' and 'g'.

>
> Those handle endianness, but not signed values. I suppose you could
> unpack as unsigned, then manually test for the sign bit being set and
> correct the value. Even uglier, you could unpack as unsigned with
> desired endianness, repack as unsigned in native order, then unpack as
> signed in native order.


What bit-struct does in these cases is the first kind of ugly:

# Let's say we start with a negative number packed in
# 16 bits, big-endian:
x = -123
s = [x].pack("n")

# Note that the sign is not packed with the number. It packs to the
# same chars as 2**16 + x

bits = 16
max_unsigned = 2 ** bits
max_signed = 2 ** (bits - 1)
to_signed = proc { |n| (n >= max_signed) ? n - max_unsigned : n }

puts to_signed[s.unpack("n").first] # ==> -123

(This has come up a few times on the list -- search for "to_signed", for
example.)

It's still a hack, though, and I'd like to see Gavin's RCR go through,
if the naming issues can be resolved.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
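
For readers on a newer interpreter: as far as I know, Ruby 1.9.3 and later accept ">" (big-endian) and "<" (little-endian) modifiers on the native-size integer directives, which did not exist when this thread was written. A minimal sketch, assuming such a Ruby version (the variable names are illustrative):

```ruby
be16 = [0xFF, 0x85].pack("C*")             # -123 as big-endian INT16
le32 = [0x85, 0xFF, 0xFF, 0xFF].pack("C*") # -123 as little-endian INT32

short_value = be16.unpack("s>").first # signed 16-bit, big-endian
long_value  = le32.unpack("l<").first # signed 32-bit, little-endian

puts short_value # -123
puts long_value  # -123

# The same modifiers work when packing.
packed = [-123].pack("l>").unpack("C*")
p packed # [255, 255, 255, 133]
```

On interpreters without these modifiers, the unsigned-then-convert approach shown in the posts above remains the fallback.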

 