Velocity Reviews - Computer Hardware Reviews

Binary storage of string constants

 
 
Ross
 
      06-29-2006
Suppose I define a char* variable as follows:

char *s = "€";

What actually gets put into the binary? Presumably, it gets stored in
the encoding of the source file. Am I right? Or is it compiler/platform
dependent?

The C spec suggests that string constants get mapped in an
implementation-defined manner to members of the execution character
set. Does this mean that some compilers perform iconv-esque conversion
between the source and execution character sets at runtime? If so, does
this mean the result of strlen(s) may vary depending on the execution
character set?

Thanks in advance.

 
Chris Dollin
 
      06-29-2006
Richard Heathfield wrote:

> Ross said:
>
>> Suppose I define a char* variable as follows:
>>
>> char *s = "€";

>
> I don't know what you wrote there, but on my display I can see a little
> white square.


Looks like a Euro here in this newsreader (knode). And when I tried pasting
it into a command window (konsole), it seemed to become a zero-width
character - and backspacing rubbed out a space in the command prompt!

--
Chris "nice icon for a planeship flying left" Dollin
A rock is not a fact. A rock is a rock.

 
Richard Heathfield
 
      06-29-2006
Ross said:

> Suppose I define a char* variable as follows:
>
> char *s = "€";


I don't know what you wrote there, but on my display I can see a little
white square. (Just a quick tip: use const char * when pointing at string
literals.)

> What actually gets put into the binary?


It depends. The value might not even make it into the binary, depending on
whether s gets used. But typically the bytes that encode the character will
appear in the binary somewhere.

> Presumably, it gets stored in
> the encoding of the source file. Am I right? Or is it compiler/platform
> dependent?


Very much so.

<snip>

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
 
Ross
 
      06-29-2006
Yeah, it was supposed to be a Euro symbol.

Any idea what happens at runtime, then? Is it possible that the string
gets converted into the execution character set, or will it just remain
'as is' in the source character set? Does the same apply to character
constants?

 
Richard Heathfield
 
      06-29-2006
Chris Dollin said:

> Richard Heathfield wrote:
>
>> Ross said:
>>
>>> Suppose I define a char* variable as follows:
>>>
>>> char *s = "€";

>>
>> I don't know what you wrote there, but on my display I can see a little
>> white square.

>
> Looks like a Euro here in this newsreader (knode).


I'm using knode too. Perhaps Euros look like little white squares. (I must
admit I thought they were triangular rubber coins 6800 miles on a side, but
I've never actually seen one, so I could be wrong about that.)

> And when I tried
> pasting it into a command window (konsole), it seemed to become a
> zero-width character - and backspacing rubbed out a space in the command
> prompt!


Oopsie. If I were you, I'd sue the OP for breach of command prompt.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
 
Frederick Gotham
 
      06-29-2006
Richard Heathfield posted:


>> Looks like a Euro here in this newsreader (knode).

>
> I'm using knode too. Perhaps Euros look like little white squares. (I
> must admit I thought they were triangular rubber coins 6800 miles on a
> side, but I've never actually seen one, so I could be wrong about
> that.)



For anyone who's interested:

http://www.joerch.org/coins/euro-r.html



--

Frederick Gotham
 
Skarmander
 
      06-29-2006
Ross wrote:
> Suppose I define a char* variable as follows:
>
> char *s = "€";
>
> What actually gets put into the binary?


You don't know.

> Presumably, it gets stored in the encoding of the source file. Am I
> right?


No. The encoding of the source file is in principle completely immaterial to
whatever the compiler output is. Theoretically, the compiler could even
produce code that "computes" your strings just-in-time, so there aren't any
characters in the binary at all.

> Or is it compiler/platform dependent?
>

Yes.

> The C spec suggests that string constants get mapped in an
> implementation-defined manner to members of the execution character
> set. Does this mean that some compilers perform iconv-esque conversion
> between the source and execution character sets at runtime?


The compiler isn't allowed to do that. Mapping of characters in the source
character set to the execution character set takes place at translation
time. "Implementation-defined" just means that the way of mapping has to be
documented.

> If so, does this mean the result of strlen(s) may vary depending on the
> execution character set?
>

Only insofar as the results of strlen() depend on the execution character
set used at translation (which is when the mapping from source character set
to execution character set happens).

By the time strlen() runs, all that's left are characters stored in bytes.
strlen() counts these characters up to the terminating null, which is the
same as the number of bytes they occupy. The result of strlen() on "the
same" string may therefore vary with platform, and even with compilation on
the same platform, but not with execution of the same translated program.

S.
 
Ross
 
      06-30-2006
Does this mean that 'execution character set' is referring to the
execution of the compiler, rather than the execution of the compiled
program (as I had assumed)?

 
Ross
 
      06-30-2006
Scrub that. I'm pretty sure that 'execution character set' is referring
to the execution of the compiled program. I guess the real question
should be: what is meant by 'translation time'? If it's synonymous with
'compilation time', how can the compiler know what the execution
character set is going to be? Surely this depends on the locale of the
system on which the compiled program is executed?

 
Ross
 
      06-30-2006
OK, I see where I'm going wrong. The execution character set is fixed
at compile time and is in no way affected by the locale of the system
in which the binary is executed.

Thanks to one and all.

 