using character as array subscript

 
 
Ivan
      06-16-2008
Hi,

What is the best syntax for using a char to index into an array?

///////////////////////////////////
For example

int data[256];

data['a'] = 1;
data['b'] = 1;
///////////////////////////////////

gcc is complaining about this syntax, so I am using a static_cast on the
character literal. Is there a better way to do this?

Thanks,
Ivan
 
Jim Langston
      06-16-2008
"Ivan" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi,
>
> What is the best syntax to use a char to index into an array.
>
> ///////////////////////////////////
> For example
>
> int data[256];
>
> data['a'] = 1;
> data['b'] = 1;
> ///////////////////////////////////
>
> gcc is complaining about this syntax, so i am using static cast on the
> character literal. Is there a better way to do this?


MSVC++ 2008 express isn't complaining and compiles that code fine, not even
a warning. It is well defined behavior as long as the type of your native
char is unsigned 8 bit byte.

On my system, if I run
std::cout << typeid('a').name() << "\n";
I get the output
char

Not unsigned char. That may produce some undefined behavior for you if you
attempt to work with characters that would be above 127 as a byte, since they
might show up negative.
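
A minimal sketch of the signedness issue described above. The helper names plain_char_is_signed and to_index are made up for illustration and are not from the thread. The first reports whether plain char is signed on the current implementation; the second converts any char to a non-negative index by going through unsigned char.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: true when plain char is a signed type here.
inline bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: maps any char to an index in 0..UCHAR_MAX,
// whether plain char is signed or unsigned.
inline unsigned to_index(char c)
{
    return static_cast<unsigned char>(c);
}
```

With something like this, data[to_index(c)] stays inside an array of UCHAR_MAX + 1 elements even for characters that "show up negative".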


 
Daniel Pitts
      06-17-2008
Jim Langston wrote:
> [...]
> MSVC++ 2008 express isn't complaining and compiles that code fine, not even
> a warning. It is well defined behavior as long as the type of your native
> char is unsigned 8 bit byte.
> [...]

Is it well defined? I thought it would depend on the character encoding
used, such as ASCII vs EBCDIC. Or does the standard actually specify
char encoding now?

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
 
James Kanze
      06-17-2008
On Jun 17, 12:48 am, Ivan <(E-Mail Removed)> wrote:

> What is the best syntax to use a char to index into an array.


It depends.

> ///////////////////////////////////
> For example


> int data[256];


> data['a'] = 1;
> data['b'] = 1;
> ///////////////////////////////////


> gcc is complaining about this syntax, so i am using static
> cast on the character literal. Is there a better way to do
> this?


It depends on the context.

First, this is a warning; you can turn it off, or ignore it. In
fact, it is a legitimate warning unless you've taken adequate
precautions; a char may have negative values. (But then, so may
an int. Logically, g++ shouldn't warn unless the size of the
array is such that not all entries can be reached by a char, and
not in the case of a character literal, in any case. But in
fact, it does always warn, unless you turn that warning off.)

The first case is when the array will normally be indexed by an
int, and you're just using character literals during
initialization; if the only indexation by a char is with a
character literal, you can simply ignore the warning. (Note
that this is a more or less usual idiom: you read the array with
a return value of istream::get(), for example, after having
checked for EOF.)

If you really do want to index with arbitrary characters, there
are three solutions:

1. If portability isn't a large concern, you can just compile
with -funsigned-char. This should really be the default,
but there are historical reasons which mean that it isn't.
Other compilers also have such an option. (It's /J for
VC++, I think.) If you're certain that you'll never have to
port to a compiler without this option, you can just use it,
and be assured that plain char is unsigned.

In this case, you'll still have to turn off the warning from
g++. (IMHO, the warning, as it is currently implemented, is
stupid. If they want to warn, it would be more reasonable
to warn when the type of the index cannot encompass all of
the possible index values, and only if the value is not a
constant.)

2. Otherwise, you can cast to unsigned char anytime you use a
char as an index.

3. Or, you can rearrange the array, and use character -
CHAR_MIN as an index.

In the latter two cases, I'd wrap the array in a class which
took care of the "correction" of the index.
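
One possible shape for such a wrapper, as a sketch only. The names CharTable and offset_index are hypothetical. The class applies solution 2 (cast to unsigned char); the free function shows the index computation for solution 3 (character - CHAR_MIN).

```cpp
#include <climits>

// Hypothetical wrapper: one int slot per possible char value.
class CharTable
{
public:
    CharTable() : data_() {}   // value-initialize: every entry starts at 0

    // Solution 2: correct the index by casting to unsigned char.
    int& operator[](char c)
    {
        return data_[static_cast<unsigned char>(c)];
    }
    int operator[](char c) const
    {
        return data_[static_cast<unsigned char>(c)];
    }

private:
    int data_[UCHAR_MAX + 1];
};

// Solution 3: shift the index so that CHAR_MIN maps to slot 0.
// c is promoted to int before the subtraction, so the result is
// assumed to lie in 0..UCHAR_MAX.
inline int offset_index(char c)
{
    return c - CHAR_MIN;
}
```

Callers can then write table[c] with a plain char and leave the correction to the class.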

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
Mirco Wahab
      06-17-2008
Ivan wrote:
> For example
> int data[256];
> data['a'] = 1;
> data['b'] = 1;
> ///////////////////////////////////
> gcc is complaining about this syntax, so i am using static cast on the
> character literal. Is there a better way to do this?


Which gcc? From your example, I assumed:

int data[256];

int main()
{
data['a'] = 1;
data['b'] = 1;
return 0;
}

Compiled as C++, there was not a single warning with:
g++-4.3 (-Wall -pedantic)
mingw-gcc-3.4.1
icpc (intel CC 10.1)

Maybe you made another mistake not
shown in your incomplete excerpt.

Regards

Mirco
 
Daniel Pitts
      06-17-2008
Jack Klein wrote:
> On Mon, 16 Jun 2008 18:53:05 -0700, Daniel Pitts
> <(E-Mail Removed)> wrote in comp.lang.c++:
>
>> Jim Langston wrote:
>>> [...]
>>> MSVC++ 2008 express isn't complaining and compiles that code fine, not even
>>> a warning. It is well defined behavior as long as the type of your native
>>> char is unsigned 8 bit byte.
>>> [...]

>> Is it well defined? I thought it would depend on the character encoding
>> used, such as ASCII vs EBCDIC. Or does the standard actually specify
>> char encoding now?

>
> No, the standard does not specify execution character set. Or source
> character set, for that matter. That's exactly why it is more
> portable to use the actual characters, rather than their numerical
> value in a particular character set.
>
> In fact, the OP's code could well be part of a beginner's assignment
> to generate a histogram of characters in some input data.
>
> This is guaranteed to produce the correct hex digit character for the
> lowest nibble of an unsigned int regardless of the character set:
>
> char hex[] = "0123456789ABCDEF";
>
> char hex_digit(unsigned int x)
> {
> return hex [x & 0xf];
> }

Your example only addresses the *converse* of my point, and therefore
doesn't have any connection to the validity of my point.
>
> ....if you change the definition of the array to:
>
> char hex [17] = { 48, 49, /*... */ 69, 70, 0 };
>
> ....then you get exactly the same array and result on an ASCII
> implementation, and gibberish on any other execution character set.
>

Right, but using 'a' as an index into an array could be a different
index on different compilers. Considering that char could be signed and
negative, you could have serious consequences.

Granted, this isn't a problem in practice, but it's not portable that
foo['a'] = 1 should do something specific.

Now, if you were to get specific with vendor/platform, that's a different
question.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
 
Jerry Coffin
      06-17-2008
In article <4857ed5a$0$12713$(E-Mail Removed)>,
(E-Mail Removed) says...

[ ... ]

> Right, but using 'a' as an index into an array could be a different
> index on different compilers. considering that char could be signed and
> negative, you could have serious consequences.
>
> Granted, this isn't a problem in practice, but its not portable that
> foo['a'] = 1 should do something specific.


That depends on what you mean by something specific. Basically, the
behavior is unspecified, but NOT undefined. In particular, the C++
standard specifies a basic execution character set that includes the
usual English letters, base-10 digits, etc. and requires that all those
characters have non-negative values. Since the 'a' in your expression
must be non-negative, it has defined results if (for example) foo has
been defined something like 'int foo[UCHAR_MAX + 1];'

It's certainly true that you could encounter characters whose encoding
is negative, but this isn't one of them.
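
The guarantee can be spelled out as compile-time checks. This is only a sketch, and it assumes a C++11 compiler for static_assert. The function is_safe_raw_index is a made-up name.

```cpp
#include <climits>

// Members of the basic execution character set are required to be
// non-negative, so these checks are expected to pass everywhere.
static_assert('a' >= 0 && 'a' <= UCHAR_MAX, "'a' must be a valid index");
static_assert('Z' >= 0 && '0' >= 0 && '9' >= 0,
              "basic characters must be non-negative");

// Hypothetical helper: true when c can be used as an index as-is,
// without first converting it to unsigned char.
inline bool is_safe_raw_index(char c)
{
    return static_cast<int>(c) >= 0;
}
```

Characters outside the basic set get no such guarantee, which is where the run-time check (or a cast) comes in.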

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
James Kanze
      06-18-2008
On Jun 17, 6:58 pm, Daniel Pitts
<(E-Mail Removed)> wrote:

[...]
> Right, but using 'a' as an index into an array could be a
> different index on different compilers.


Which, presumably, is what is wanted. You don't want the entry
corresponding to 97 (or whatever); you want the entry
corresponding to the encoding for the character 'a' on the
platform in question.

> considering that char could be signed and negative, you could
> have serious consequences.


That's the real problem. The OP had an array "int x[ 256 ] ;";
indexing it with a char could definitely be a problem (and
logically, it probably should be "int x[ UCHAR_MAX + 1 ] ;").
But of course, we (and g++) don't know whether he intends to
index it with a char, or with a char cast to unsigned char, or
with an int, return value from istream::get() or fgetc(). And
'a' *is* guaranteed to be positive, and in the range
0...UCHAR_MAX.

> Granted, this isn't a problem in practice, but its not
> portable that foo['a'] = 1 should do something specific.


Except that the language standard says that it does something
very specific, and very useful. Issuing a warning in this case
is simply brain damage.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
James Kanze
      06-18-2008
On Jun 17, 7:27 pm, Jerry Coffin <(E-Mail Removed)> wrote:
> In article <4857ed5a$0$12713$(E-Mail Removed)>,
> (E-Mail Removed) says...


> [ ... ]


> > Right, but using 'a' as an index into an array could be a
> > different index on different compilers. considering that
> > char could be signed and negative, you could have serious
> > consequences.


> > Granted, this isn't a problem in practice, but its not
> > portable that foo['a'] = 1 should do something specific.


> That depends on what you mean by something specific.
> Basically, the behavior is unspecified, but NOT undefined.


The behavior is exactly specified (or at least, as specified as
anything else in C++). You index the array with the value
corresponding to the encoding of a small a in the native
character encoding. If the goal is to index the entry
corresponding to the encoding of a small a, this is the only
correct and specified way of doing it.

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 
James Kanze
      06-18-2008
On Jun 17, 12:00 pm, Mirco Wahab <(E-Mail Removed)-halle.de> wrote:
> Ivan wrote:
> > For example
> > int data[256];
> > data['a'] = 1;
> > data['b'] = 1;
> > ///////////////////////////////////
> > gcc is complaining about this syntax, so i am using static cast on the
> > character literal. Is there a better way to do this?


> Which gcc? From your example, I assumed:


> int data[256];


> int main()
> {
> data['a'] = 1;
> data['b'] = 1;
> return 0;
> }


> Compiled as C++ There was not a single warning in:
> g++-4.3 (-Wall -pedantic)


g++ 4.1.0 (under Solaris) definitely warns in this case when
-Wall -pedantic is used.

> mingw-gcc-3.4.1


So does 3.4.0 under Solaris, and the CygWin version of 3.4.4
under Windows.

> icpc (intel CC 10.1)


> Maybe you made another mistake not shown in your incomplete
> excerpt.


I have no problem reproducing his warnings, with several
different versions of g++, as long as -Wall is used. The actual
warning is "char-subscripts", so adding -Wno-char-subscripts
*after* -Wall (or not using -Wall at all, but choosing
explicitly for each warning) will suppress it. Which you
probably should do---this is one of those brain dead warnings of
which every compiler seems to have a few.
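
If changing the command line is not an option, a different technique is to silence the warning locally with GCC's diagnostic pragmas. This is a sketch and not what the post above suggests. It assumes a GCC (or compatible) version that supports "#pragma GCC diagnostic push/pop", and the function name mark_letters is invented.

```cpp
int data[256];

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wchar-subscripts"
#endif

// Hypothetical example: the character literals are known to be
// non-negative, so the warning is switched off for this function only.
inline void mark_letters()
{
    data['a'] = 1;
    data['b'] = 1;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Other compilers skip the guarded pragmas, and the warning stays active for the rest of the file.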

--
James Kanze (GABI Software) email:(E-Mail Removed)
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 