Velocity Reviews


Enrico `Trippo' Porreca 09-27-2003 04:09 PM

Line input and implementation-defined behaviour
 
Both K&R book and Steve Summit's tutorial define a getline() function
correctly testing the return value of getchar() against EOF.

I know that getchar() returns EOF or the character value cast to
unsigned char.

Since char may be signed (and if so, the return value of getchar() would
be outside its range), doesn't the commented line in the following code
produce implementation-defined behaviour?

char s[SIZE];
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    s[i] = c; /* ??? */
    i++;
}

s[i] = '\0';

If this is indeed implementation defined, is there any solution?
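For reference, here is the fragment above wrapped into a self-contained function so it can be compiled and tried. This is a sketch, not K&R's exact getline(). The name `read_line`, the `FILE *` parameter and the return convention (number of characters stored, or EOF if end of file was hit before anything was read) are assumptions for illustration. The `s[i] = c` store is kept exactly as in the question.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: reads one line from fp into s, storing at most
   size - 1 characters and discarding the newline; s is always
   null-terminated. Returns the number of characters stored, or EOF if
   end of file (or a read error) occurred before anything was read.
   Assumes size >= 1. */
int read_line(FILE *fp, char *s, size_t size)
{
    int c = 0;          /* any value other than EOF */
    size_t i = 0;

    /* The bound is tested first, so a character is never read and then
       thrown away when the buffer is already full. */
    while (i < size - 1 && (c = getc(fp)) != EOF && c != '\n') {
        s[i] = c;       /* int -> char: the conversion asked about */
        i++;
    }
    s[i] = '\0';

    if (c == EOF && i == 0)
        return EOF;
    return (int)i;
}
```

One deliberate difference from the posted loop: there, once `i` reaches `SIZE - 1`, getchar() has already consumed one more character before the bound test fails, so that character is lost. Testing the bound first leaves it in the stream for the next call.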

--
Enrico `Trippo' Porreca


Simon Biber 09-27-2003 05:56 PM

Re: Line input and implementation-defined behaviour
 
"Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
> Since char may be signed (and if so, the return value of getchar()
> would be outside its range), doesn't the commented line in the
> following code produce implementation-defined behaviour?


Almost. If a character is read whose code is out of the range of
signed char, it produces an implementation-defined result, or an
implementation-defined signal is raised. This is not quite as bad
as implementation-defined behaviour, but almost.

> char s[SIZE];
> int c;
> size_t i = 0;
>
> while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> s[i] = c; /* ??? */
> i++;
> }
>
> s[i] = '\0';
>
> If this is indeed implementation defined, is there any solution?


If char is signed, and the value of the character is outside the
range of signed char, then you have an out-of-range conversion to
a signed integer type, so: "either the result is implementation-defined
or an implementation-defined signal is raised." (C99 6.3.1.3#3)
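Whether the `s[i] = c` store actually falls under 6.3.1.3#3 on a given implementation can be checked against the limits in <limits.h>. A minimal sketch follows (the function name is hypothetical). It only reports whether the value is in range for plain char. What happens when it is not remains implementation-defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if c (a getchar() result, i.e. EOF or an
   unsigned char value) lies within the range of plain char, so that
   `s[i] = c` is an ordinary value-preserving conversion. If it returns
   false for a value other than EOF, plain char must be signed, and
   C99 6.3.1.3#3 applies: the result is implementation-defined or an
   implementation-defined signal is raised. */
bool fits_in_plain_char(int c)
{
    return c >= CHAR_MIN && c <= CHAR_MAX;
}
```

On an implementation where plain char is unsigned, every non-EOF getchar() result fits. Where it is signed, values above CHAR_MAX (e.g. UCHAR_MAX) do not.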

However, because this is such an incredibly common operation in
existing C code, an implementor would be absolutely idiotic to
define this to have any undesired effects.

--
Simon.



Enrico `Trippo' Porreca 09-27-2003 06:28 PM

Re: Line input and implementation-defined behaviour
 
Simon Biber wrote:
>>char s[SIZE];
>>int c;
>>size_t i = 0;
>>
>>while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
>> s[i] = c; /* ??? */
>> i++;
>>}
>>
>>s[i] = '\0';
>>
>>If this is indeed implementation defined, is there any solution?

>
> If char is signed, and the value of the character is outside the
> range of signed char, then you have an out-of-range conversion to
> a signed integer type, so: "either the result is implementation-defined
> or an implementation-defined signal is raised." (C99 6.3.1.3#3)
>
> However, because this is such an incredibly common operation in
> existing C code, an implementor would be absolutely idiotic to
> define this to have any undesired effects.


I agree, but AFAIK the implementor is allowed to be an idiot...
Am I right?

Is the following a plausible solution (i.e. without any trap
representation or type conversion or something-defined behaviour problem)?

char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */
    i++;
}

s[i] = '\0';
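A self-contained sketch of this unsigned-char-store variant, wrapped in a hypothetical function `read_line_uc` for testing. Storing through an `unsigned char` lvalue makes the conversion itself well defined, because every non-EOF getchar() value is already in the range of unsigned char. It does not settle whether those bytes are valid values when later read as plain char. That is the trap-representation question raised in the replies.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: like the loop above, but every character is
   stored through an unsigned char lvalue aliasing the char buffer.
   Returns the number of characters stored; s is always
   null-terminated. Assumes size >= 1. */
size_t read_line_uc(FILE *fp, char *s, size_t size)
{
    unsigned char *t = (unsigned char *)s;
    size_t i = 0;
    int c;

    while (i < size - 1 && (c = getc(fp)) != EOF && c != '\n') {
        t[i] = (unsigned char)c;   /* c is 0..UCHAR_MAX here: value preserved */
        i++;
    }
    s[i] = '\0';
    return i;
}
```

Reading the stored bytes back through `unsigned char *` always yields exactly what getc() returned.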

--
Enrico `Trippo' Porreca


Simon Biber 09-27-2003 07:10 PM

Re: Line input and implementation-defined behaviour
 
"Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
> I agree, but AFAIK the implementor is allowed to be idiot...
> Am I right?


Yes, but trust me, anyone who fouled up the char<->int conversion
would break a large proportion of existing code that is considered
to be completely portable. Therefore their implementation would
not sell.

Consider the <ctype.h> functions, which require that the input is
an int whose value is within the range of unsigned char. That is
why we suggest that people cast to unsigned char like this:
    char *p, s[] = "hello";
    for(p = s; *p; p++)
        *p = toupper((unsigned char)*p);
If the value of *p was negative, then after conversion to unsigned
char it is positive and outside the range of signed char. So it
could theoretically be outside the range of int, if int and signed
char have the same range. Therefore you have the same situation in
reverse: the unsigned char to int conversion is not guaranteed to
stay within range.
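The idiom above, wrapped as a hypothetical `upcase_string` function so it can be compiled on its own. It assumes UCHAR_MAX <= INT_MAX, which holds on common implementations. Without that, the unsigned char to int promotion is exactly the reverse problem just described.

```c
#include <ctype.h>

/* Hypothetical wrapper: upper-cases s in place. The cast to unsigned
   char keeps toupper()'s argument in the range it accepts (EOF or an
   unsigned char value) even when plain char is signed and *p is
   negative. Storing the int result back into *p is the same
   int -> char conversion the original question is about. */
void upcase_string(char *s)
{
    char *p;

    for (p = s; *p; p++)
        *p = toupper((unsigned char)*p);
}
```

In the default "C" locale only a-z are changed, so "hello, World 42" becomes "HELLO, WORLD 42".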

> Is the following a plausible solution (i.e. without any trap
> representation or type conversion or something-defined behaviour
> problem)?
>
> char s[SIZE];
> unsigned char *t = (unsigned char *) s;
> int c;
> size_t i = 0;
>
> while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> t[i] = c; /* ??? */


The assignment itself is safe, but since it places an arbitrary
representation into the elements of the array s, which are char
objects and possibly signed, it might generate a trap
representation. That is if signed char can have trap
representations. I'm not completely sure.

> i++;
> }
>
> s[i] = '\0';


--
Simon.



Malcolm 09-27-2003 10:29 PM

Re: Line input and implementation-defined behaviour
 

"Simon Biber" <news@ralminNOSPAM.cc> wrote in message
>
> > char s[SIZE];
> > unsigned char *t = (unsigned char *) s;
> > int c;
> > size_t i = 0;
> >
> > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> > t[i] = c; /* ??? */


s[i] = 0;
>
> The assignment itself is safe, but since it places an arbitrary
> representation into the elements of the array s, which are char
> objects and possibly signed, it might generate a trap
> representation. That is if signed char can have trap
> representations. I'm not completely sure.
>

Signed chars can trap. Unsigned chars are guaranteed to be able to hold
arbitrary data, so they cannot.
You would have to be desperately unlucky for the implementation to allow
non-chars to be read in from stdin, and then for the function to trap. The
most likely place for the trap to trigger would be the assignment s[i] = 0,
since the compiler probably won't realise that pointer t actually points to
a buffer declared as straight char.



Peter Nilsson 09-28-2003 02:20 AM

Re: Line input and implementation-defined behaviour
 
"Malcolm" <malcolm@55bank.freeserve.co.uk> wrote in message
news:bl52k9$ure$1@news6.svr.pol.co.uk...
>
> "Simon Biber" <news@ralminNOSPAM.cc> wrote in message
> >
> > > char s[SIZE];
> > > unsigned char *t = (unsigned char *) s;
> > > int c;
> > > size_t i = 0;
> > >
> > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> > > t[i] = c; /* ??? */

>
> s[i] = 0;
> >
> > The assignment itself is safe, but since it places an arbitrary
> > representation into the elements of the array s, which are char
> > objects and possibly signed, it might generate a trap
> > representation. That is if signed char can have trap
> > representations. I'm not completely sure.
> >

> signed chars can trap. unsigned chars are guaranteed to be able to hold
> arbitrary data so cannot.
> You would have to be desperately unlucky for the implementation to allow
> non-chars to be read in from stdin, and then for the function to trap. The
> most likely place for the trap to trigger would be the assignment s[i] = 0,

0 is a value in the range of signed char, so it is not possible for a
conforming compiler to replace the contents of object s[i] with a trap
representation.

[You can always initialise an uninitialised automatic variable, for instance,
even if its uninitialised state is a trap representation.]

> since the compiler probably won't realise that pointer t actually points to
> a buffer declared as straight char.


You seem to be confusing 'trap representation' with 'trap', the latter term
being commonly used for raised exceptions on many architectures. A trap
representation, in and of itself, need not raise an exception.

Indeed, whilst the standards allow signed char to have trap representations,
sections like 6.2.6.1p5 effectively say that all reads via character lvalues
are privileged. So at worst, it would seem, reading a character trap
representation will only yield an unspecified value. [Non-trapping trap
representations!]

--
Peter



Malcolm 09-28-2003 09:20 AM

Re: Line input and implementation-defined behaviour
 

"Peter Nilsson" <airia@acay.com.au> wrote in message
>
> > The most likely place for the trap to trigger would be the assignment
> > s[i] = 0,

>
> 0 is a value in the range of signed char, so it is not possible for a
> conforming compiler to replace the contents of object s[i] with a trap
> representation.
>

What I meant was that the assignment may trigger the trap, if illegal
characters are stored into the array s. This is because values from s may be
loaded into registers as chars.
>
> Indeed, whilst the standards allow signed char to have trap
> representations, sections like 6.2.6.1p5 effectively say that all reads via
> character lvalues are privileged. So at worst, it would seem, reading a
> character trap representation will only yield an unspecified value. [Non-
> trapping trap representations!]
>

It seems it would be unacceptable for the line

fgets(line, sizeof line, fp);

to cause a program abort if fed an illegal character, with nothing the
programmer can do to stop it. OTOH reads are the most likely way for corrupt
data to get into the program, and the whole point of trap representations is to
close down any program that is malfunctioning.
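For comparison, a sketch of the fgets() route mentioned above, with the trailing newline stripped. With fgets() the library itself stores the characters into the char array, so the caller never writes the `s[i] = c` conversion. How fgets() does that internally is the implementation's business. The wrapper name `read_line_fgets` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: reads a line with fgets() and removes the
   newline if present. Returns line, or NULL on end of file or read
   error. A line longer than size - 1 characters comes back in pieces,
   the first piece without a newline to strip. */
char *read_line_fgets(char *line, int size, FILE *fp)
{
    if (fgets(line, size, fp) == NULL)
        return NULL;
    line[strcspn(line, "\n")] = '\0';   /* strcspn stops at '\n' or '\0' */
    return line;
}
```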




Enrico `Trippo' Porreca 09-28-2003 03:01 PM

Re: Line input and implementation-defined behaviour
 
Simon Biber wrote:
>> I agree, but AFAIK the implementor is allowed to be idiot...
>> Am I right?

>
> Yes, but trust me, anyone who fouled up the char<->int conversion
> would break a large proportion of existing code that is considered
> to be completely portable. Therefore their implementation would
> not sell.


Uhm... So I think I should use K&R's getline(), without being too
paranoid about it...

Thanks.

--
Enrico `Trippo' Porreca


CBFalconer 09-28-2003 05:27 PM

Re: Line input and implementation-defined behaviour
 
Enrico `Trippo' Porreca wrote:
> Simon Biber wrote:
>
> >> I agree, but AFAIK the implementor is allowed to be idiot...
> >> Am I right?

> >
> > Yes, but trust me, anyone who fouled up the char<->int conversion
> > would break a large proportion of existing code that is considered
> > to be completely portable. Therefore their implementation would
> > not sell.

>
> Uhm... So I think I should use K&R's getline(), without being too
> paranoid about it...


Consider ggets, available at:

<http://cbfalconer.home.att.net/download/>

which has the convenience of gets without the insecurities.
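To illustrate the general idea of a reader with no fixed SIZE limit, here is a sketch that grows its buffer with realloc(). This is NOT the ggets code from the link above, whose interface is not reproduced here. The name `read_line_alloc` and its conventions are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: reads one line of any length from fp. Returns a
   malloc'd, null-terminated string without the newline (the caller
   must free it), or NULL on end of file with nothing read, on a read
   error with nothing read, or on allocation failure. */
char *read_line_alloc(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {              /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = c;                    /* same int -> char store as the original loop */
    }

    if (c == EOF && len == 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the line length. The price is that a caller who feeds it an endless line without a newline can make it allocate without bound.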

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!



Dan Pop 09-29-2003 02:19 PM

Re: Line input and implementation-defined behaviour
 
In <3f75cf48$0$4189$afc38c87@news.optusnet.com.au> "Simon Biber" <news@ralminNOSPAM.cc> writes:

>"Enrico `Trippo' Porreca" <trippo@lombardiacom.it> wrote:
>> Since char may be signed (and if so, the return value of getchar()
>> would be outside its range), doesn't the commented line in the
>> following code produce implementation-defined behaviour?

>
>Almost. If a character is read whose code is out of the range of
>signed char, it produces an implementation-defined result, or an
>implementation-defined signal is raised. This is not quite as bad
>as implementation-defined behaviour, but almost.


No implementation-defined signal is raised in C89 and I strongly doubt
that any *real* C99 implementation would do that, breaking existing C89
code.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de


All times are GMT.