
Velocity Reviews > Newsgroups > Programming > C Programming > standard doubt


standard doubt

 
 
Seebs
 
      12-10-2010
On 2010-12-10, raffamaiden <(E-Mail Removed)> wrote:
> Hi all.


Hi!

You should be aware that the word "doubt", in English, has the connotation
that you were told something but disbelieve it. If you want to express
more general uncertainty, or mere lack of information, use a different
word. "Question" would probably be the best choice for something like
this, because you're asking a question.

> int a =5;
> fwrite(&a, sizeof(int), 1, my_file_ptr);


> This will write an int to the file pointed to by my_file_ptr. But I know
> that the C standard does not specify the exact size in bytes for its
> primitive types; as far as I know it only specifies that an int is an
> integer type that represents a number with a sign, but different
> implementations/operating systems can have different sizes for an int.


Yes.

> So this means that my program will write a 32-bit integer with one
> implementation and a 16-bit integer with another implementation. This
> would also mean that a file generated by the program running under one
> implementation will not be readable under another implementation,
> unless the program also knows which implementation the instance
> that generated the file was running on.


Yes.

> Is that right? I do not want such behavior. How can I solve this?


By writing something other than raw binary native types. One option would
be to pick a standard textual representation; if performance and file
size aren't a big deal, this is almost always the best choice, because it's
easy to read and debug. If that doesn't work, there are a large number
of options out there. You might find it instructive to look at code for
something like the TIFF image file format, which is quite successfully
portable across a broad range of machines.
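As a minimal sketch of the textual approach Seebs suggests (the function names here are mine, not from the thread or any standard library):

```c
#include <stdio.h>

/* Write one int as decimal text followed by a newline. The on-disk
 * format is plain ASCII digits, so any implementation can read it back
 * regardless of its native int size or representation. */
static int write_int_text(FILE *fp, int value)
{
    return fprintf(fp, "%d\n", value) < 0 ? -1 : 0;
}

/* Read one int back; returns 0 on success, -1 on failure. */
static int read_int_text(FILE *fp, int *value)
{
    return fscanf(fp, "%d", value) == 1 ? 0 : -1;
}
```

Nothing here depends on sizeof(int), byte order, or signed representation; the price is a slightly larger file and a text-to-binary conversion on each read.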

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
 
 
 
 
Seebs
 
      12-10-2010
On 2010-12-10, raffamaiden <(E-Mail Removed)> wrote:
> I like this solution, but I have a few more questions: Is 'char'
> guaranteed to be exactly 1 byte by the standard? Because if not, that
> would not make sense.


Well, the good news is, yes, 'char' is defined to be exactly 1 byte.

The bad news is, that's because the standard defines the word "byte" to
mean "the size of a char". It does *not* guarantee that either byte or
char means 8 bits exactly.

> Also another question: in your example my integer, of which I use
> only the last 32 bits whatever its size in memory is, would be in the
> variable "val". How about encoding? Does the C standard (let it be
> C90) specify how an integer should be encoded in memory, or not?


No.

> If
> not, suppose I'm running my program on two implementations A and B. A
> uses two's complement encoding, while B uses sign and magnitude. So if A
> saves the integer with the above C code, it will save exactly 32 bits,
> but B will read back another number because it uses another encoding
> for the same 32 bits.


Not with the code given.

> Also, shouldn't buf[0] be buf[x]?


Yes.

The key is that "& 0xFF" always gives you the bottom 8 bits of value,
regardless of representation. So if you start out with a number
which has the value 0x12345678, it doesn't matter whether that's stored
in memory as { 12, 34, 56, 78 } or { 78, 56, 34, 12 }. Either way,
val & 0xFF will be 0x78, and val >>= 8 will convert it to 0x123456, and
the next loop will get the 0x56.
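Putting the pieces together, a corrected sketch of the quoted loop (with buf[x] in place of buf[0]) plus the matching read-side loop might look like this:

```c
/* Store the low 32 bits of val least-significant byte first, a fixed
 * on-disk order that does not depend on how the host represents
 * integers in memory. */
static void pack_u32(unsigned char buf[4], unsigned long val)
{
    for (int x = 0; x < 4; x++) {
        buf[x] = val & 0xFF;  /* bottom 8 bits, representation-independent */
        val >>= 8;
    }
}

/* Rebuild the value from the same fixed byte order. */
static unsigned long unpack_u32(const unsigned char buf[4])
{
    unsigned long val = 0;
    for (int x = 3; x >= 0; x--)
        val = (val << 8) | buf[x];
    return val;
}
```

pack_u32 and unpack_u32 are names I made up for illustration; the point is that both sides agree on the byte order because it is defined by arithmetic, not by memory layout.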

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
 
 
 
 
Seebs
 
      12-10-2010
On 2010-12-10, Morris Keesan <(E-Mail Removed)> wrote:
> Yes, but it doesn't guarantee that "byte" means what you think it does.
> The standard requires a byte to be *at least* 8 bytes.


Bits.

(Obvious in context, to be sure.)

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
 
Morris Keesan
 
      12-10-2010
On Fri, 10 Dec 2010 13:36:50 -0500, Seebs <(E-Mail Removed)> wrote:

> On 2010-12-10, Morris Keesan <(E-Mail Removed)> wrote:
>> Yes, but it doesn't guarantee that "byte" means what you think it does.
>> The standard requires a byte to be *at least* 8 bytes.

>
> Bits.
>
> (Obvious in context, to be sure.)


D'oh! Thanks. That'll teach me to post when ... uh ... when I don't have
any excuse for making dumb typos.
--
Morris Keesan -- (E-Mail Removed)
 
 
Ian Collins
 
      12-10-2010
On 12/11/10 10:05 AM, Keith Thompson wrote:

Wow, Keith's posting from the future!

> Mark Storkamp<(E-Mail Removed)> writes:
> [...]
>> As others have said, the better solution may be to define the format of
>> your file to handle all reasonable variations. I recently took another
>> approach when I needed to work with the very poorly designed .stl file
>> format. I needed to have 2 byte unsigned, 4 byte unsigned, 4 byte floats
>> and 50 byte structures, and I needed to compile and run on Windows, Mac
>> and Unix. At the start of the program I have lines such as:
>>
>> assert(sizeof(unsigned) == 4);
>>
>> Then if the asserts fail, I can adjust compiler switches in my makefile
>> accordingly.

>
> If your implementation has<stdint.h>, you might be better
> off using uint32_t rather than unsigned. And if it doesn't,
> there are ways to define it yourself; see, for example,
> <http://www.lysator.liu.se/c/q8/index.html>.
>
> Note also that sizeof(unsigned)==4 could be true on a system with 64-bit
> unsigned and 16-bit char (though this is unlikely). You might add
>
> assert(CHAR_BIT == 8);
>
> or, since CHAR_BIT is a compile-time constant:
>
> #if CHAR_BIT != 8
> #error "CHAR_BIT != 8"
> #endif


sizeof(unsigned) or sizeof(anything) is also a compile time constant, so
it can be used in compile time checks:

const unsigned test = 1/(sizeof(long) == 4);
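Another pre-C11 spelling of the same idea, in case a named compile-time check is wanted (the typedef name is arbitrary, my own invention):

```c
#include <stdint.h>

/* Compile-time check: if the condition is false, the array has size -1,
 * a constraint violation that every conforming compiler must reject. */
typedef char uint32_is_4_bytes[(sizeof(uint32_t) == 4) ? 1 : -1];
```

Like the division trick, this works because sizeof is a compile-time constant expression even though the preprocessor cannot see it.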

--
Ian Collins
 
 
Nobody
 
      12-10-2010
On Fri, 10 Dec 2010 06:17:49 -0800, raffamaiden wrote:

> So this means that my program will write a 32-bit integer with one
> implementation and a 16-bit integer with another implementation. This
> would also mean that a file generated by the program running under one
> implementation will not be readable under another implementation,
> unless the program also knows which implementation the instance
> that generated the file was running on.
> Is that right? I do not want such behavior. How can I solve this?


Aside from the issue of the precise format: if a system with 32-bit
integers writes an integer larger than 16 bits to the file, what are you
going to do when reading the file on a system with 16-bit integers?

Or if a system with 32-bit two's complement integers writes -2147483648 to
the file, what are you going to do when reading the file on a system using
sign-bit representation, where the most negative representable integer is
-2147483647?

Sometimes, it's simply not worth the trouble of accommodating anything
beyond "typical" systems. If you assume 32-bit two's complement integers,
your code will work on 99.99% of systems in current use. Additionally
assuming little-endian representation won't reduce that by much.

It's almost impossible to write a non-trivial program using nothing beyond
the C standard, so any new platform will require some degree of porting.
Assuming common behaviour simply means that porting to "unusual" platforms
will require more work *if and when* you actually port to such platforms.

BTW: a more significant issue than either word size or endianness is
alignment. Assuming support for unaligned reads will result in code which
doesn't work on many ARM CPUs, and there are more of those in use than
x86.

 
 
Keith Thompson
 
      12-10-2010
raffamaiden <(E-Mail Removed)> writes:
> First off, thanks for the answers.
>
>> *You have to define your file format. They are severals...
>> *You can choose text-oriented, or binary ones.
>> *There is no 'best' solution. It depends on you needs.

>
> I want to use a binary file format because I feel the final file will
> be smaller and won't require atoi() and other stuff to retrieve the
> data.


If you use a binary file format, either you'll have to define the
exact format (and translate to and from that format when accessing
the file), or you'll have to give up on being able to read the file
on other systems. Straight binary (fwrite'ing structs directly,
for example) can make sense for files that will be used *only* by the
same program on the same system.

Text is far more portable, and you may find that the space and time
overhead of using text rather than binary isn't that much of an issue.

> An integer should be stored as 32-bit.


Why? I'm not saying you're wrong, but why 32 bits in particular?

See <stdint.h> for definitions of types of particular sizes. uint32_t
might be the best thing for your purposes, at least if you don't need
negative values.

>>Serialize your data... e.g.
>>If you know you're using 32-bits of the data
>>
>>unsigned char buf[4];
>>for (int x = 0; x < 4; x++) {
>> buf[0] = val & 0xFF;
>> val >>= 8;
>>}
>>
>>outlen = fwrite(buf, 1, 4, outfile);
>>
>>Of course smarter would be to write a function to store 32-bit ints to
>>a FILE then just call it when needed...

>
> I like this solution, but I have few more questions: Is 'char'
> guaranteed to be exactly 1 byte by the standard? Because if not that
> would not make sense.


As others have said, C defines a "byte" as the size of a char object,
which is *at least* 8 bits. (You'll see the word "byte" with other
meanings in other contexts.) If you're not dealing with DSPs and
embedded systems, you can probably get away with assuming that a byte is
8 bits -- but I suggest making the assumption explicit:

#include <limits.h>
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
/* Now we can safely assume that bytes are 8 bits. */

> Also another question: in your example my integer, for which i use
> only the last 32-bits whatever its size in memory is, would be in the
> variable "val". How about encoding? Does the C standard (let it be
> C90) specify how an integer should be encoded in memory, or not? If
> not, suppose i'm running my program in two implementations A and B. A
> use two's complement encoding, while B use sign and magnitude. So if A
> save the integer with the above c code, it will save exactly 32 bits,
> but B will recognize another number because it use another encoding in
> the same 32-bits.


C permits signed integers to be stored in 2's-complement,
1s'-complement, or sign-and-magnitude. (That's C99; C90 was
less specific, but I don't think you'll find an implementation
that uses anything else.) The vast majority of modern systems
use 2's-complement, but if you only write unsigned values to
files you can avoid that. If you need negative integers, you can
either define your own file format or just assume a 2's-complement
representation; the latter is less portable, but unlikely to be a
problem in practice.

Byte order is another issue; google "endianness" for more
information. POSIX provides byte-order conversion functions
(htonl et al); depending on POSIX further reduces portability,
but not drastically so.
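As a sketch of the POSIX route (htonl lives in <arpa/inet.h> and is POSIX, not standard C; the function name write_u32_be is mine):

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  /* htonl: POSIX, not standard C */

/* Convert to network (big-endian) byte order before writing, so every
 * reader agrees on the on-disk layout no matter its native endianness. */
static int write_u32_be(FILE *fp, uint32_t v)
{
    uint32_t be = htonl(v);
    return fwrite(&be, sizeof be, 1, fp) == 1 ? 0 : -1;
}
```

On a big-endian host htonl is a no-op; on a little-endian host it swaps the bytes. Either way the file contains the same four bytes.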

[...]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Keith Thompson
 
      12-10-2010
Mark Storkamp <(E-Mail Removed)> writes:
[...]
> As others have said, the better solution may be to define the format of
> your file to handle all reasonable variations. I recently took another
> approach when I needed to work with the very poorly designed .stl file
> format. I needed to have 2 byte unsigned, 4 byte unsigned, 4 byte floats
> and 50 byte structures, and I needed to compile and run on Windows, Mac
> and Unix. At the start of the program I have lines such as:
>
> assert(sizeof(unsigned) == 4);
>
> Then if the asserts fail, I can adjust compiler switches in my makefile
> accordingly.


If your implementation has <stdint.h>, you might be better
off using uint32_t rather than unsigned. And if it doesn't,
there are ways to define it yourself; see, for example,
<http://www.lysator.liu.se/c/q8/index.html>.

Note also that sizeof(unsigned)==4 could be true on a system with 64-bit
unsigned and 16-bit char (though this is unlikely). You might add

assert(CHAR_BIT == 8);

or, since CHAR_BIT is a compile-time constant:

#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Keith Thompson
 
      12-10-2010
Nobody <(E-Mail Removed)> writes:
> On Fri, 10 Dec 2010 06:17:49 -0800, raffamaiden wrote:
>
>> So this means that my program will write a 32-bit integer with one
>> implementation and a 16-bit integer with another implementation. This
>> would also mean that a file generated by the program running under one
>> implementation will not be readable under another implementation,
>> unless the program also knows which implementation the instance
>> that generated the file was running on.
>> Is that right? I do not want such behavior. How can I solve this?

>
> Aside from the issue of the precise format: if a system with 32-bit
> integers writes an integer larger than 16 bits to the file, what are you
> going to do when reading the file on a system with 16-bit integers?
>
> Or if a system with 32-bit two's complement integers writes -2147483648 to
> the file, what are you going to do when reading the file on a system using
> sign-bit representation, where the most negative representable integer is
> -2147483647?


Good points.

> Sometimes, it's simply not worth the trouble of accommodating anything
> beyond "typical" systems. If you assume 32-bit two's complement integers,
> your code will work on 99.99% of systems in current use. Additionally
> assuming little-endian representation won't reduce that by much.


If you assume that *some* predefined signed integer type is 32-bit two's
complement, that's probably ok for the vast majority of current
(non-embedded) systems. Assuming that "int" is such a type is unwise
and unnecessary.

I certainly wouldn't assume little-endian representation for anything to
be shared with different systems. x86 happens to be dominant today, but
there's no guarantee that it always will be; there are still a
significant number of SPARC systems out there. And it's a solvable
problem anyway; you don't *have* to depend on a particular endianness.
(This is why "network byte order" exists.)
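For example, the read side of the network-byte-order approach might look like this (ntohl is from POSIX <arpa/inet.h>, not standard C; read_u32_be is a name I made up):

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  /* ntohl: POSIX, not standard C */

/* Read 4 bytes written in network (big-endian) order and convert them
 * to the host's native byte order. */
static int read_u32_be(FILE *fp, uint32_t *out)
{
    uint32_t be;
    if (fread(&be, sizeof be, 1, fp) != 1)
        return -1;
    *out = ntohl(be);
    return 0;
}
```

Because the file's byte order is fixed by convention, a little-endian x86 writer and a big-endian SPARC reader both see the same value.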

> It's almost impossible to write a non-trivial program using nothing beyond
> the C standard, so any new platform will require some degree of
> porting.


It depends on what you're doing. If you're just reading and writing
files, you really don't need to rely on anything beyond the C standard.

> Assuming common behaviour simply means that porting to "unusual" platforms
> will require more work *if and when* you actually port to such platforms.
>
> BTW: a more significant issue than either word size or endianness is
> alignment. Assuming support for unaligned reads will result in code which
> doesn't work on many ARM CPUs, and there are more of those in use than
> x86.


Alignment is an issue only for in-memory data; it's irrelevant for
reading and writing files.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Keith Thompson
 
      12-10-2010
Ian Collins <(E-Mail Removed)> writes:
> On 12/11/10 10:05 AM, Keith Thompson wrote:
> Wow, Keith's posting from the future!


You'll love the flying cars!

[...]

>> Note also that sizeof(unsigned)==4 could be true on a system with 64-bit
>> unsigned and 16-bit char (though this is unlikely). You might add
>>
>> assert(CHAR_BIT == 8);
>>
>> or, since CHAR_BIT is a compile-time constant:
>>
>> #if CHAR_BIT != 8
>> #error "CHAR_BIT != 8"
>> #endif

>
> sizeof(unsigned) or sizeof(anything) is also a compile time constant,
> so it can be used in compile time checks:
>
> const unsigned test = 1/(sizeof(long) == 4);


True -- but it's not visible to the preprocessor, so you can't use
it in #if.

(Back in 1998 in comp.std.c, somebody remarked that it was nice back in
the days when sizeof could be used in #if directives. A followup said
"Must have been before my time". The followup was from Dennis Ritchie.)

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
 
 