Line input and implementation-defined behaviour

 
 
Enrico `Trippo' Porreca
 
      09-27-2003
Both the K&R book and Steve Summit's tutorial define a getline() function
that correctly tests the return value of getchar() against EOF.

I know that getchar() returns either EOF or the character value as an
unsigned char converted to int.

Since char may be signed (and if so, the return value of getchar() would
be outside its range), doesn't the commented line in the following code
produce implementation-defined behaviour?

char s[SIZE];
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    s[i] = c; /* ??? */
    i++;
}

s[i] = '\0';

If this is indeed implementation defined, is there any solution?

--
Enrico `Trippo' Porreca

 
Simon Biber
 
      09-27-2003
"Enrico `Trippo' Porreca" <(E-Mail Removed)> wrote:
> Since char may be signed (and if so, the return value of getchar()
> would be outside its range), doesn't the commented line in the
> following code produce implementation-defined behaviour?


Almost. If a character is read whose code is out of the range of
signed char, it produces an implementation-defined result, or an
implementation-defined signal is raised. This is not quite as bad
as implementation-defined behaviour, but almost.

> char s[SIZE];
> int c;
> size_t i = 0;
>
> while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> s[i] = c; /* ??? */
> i++;
> }
>
> s[i] = '\0';
>
> If this is indeed implementation defined, is there any solution?


If char is signed, and the value of the character is outside the
range of signed char, then you have an out-of-range conversion to
a signed integer type, so: "either the result is implementation-defined
or an implementation-defined signal is raised." (C99 6.3.1.3#3)

However, because this is such an incredibly common operation in
existing C code, an implementor would be absolutely idiotic to
define this to have any undesired effects.
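As an illustration of the condition being discussed (a minimal sketch, not code from any post here): the hypothetical helper value_fits_in_char() reports whether a character value returned by getchar() can be stored in a plain char without an out-of-range conversion. It relies only on CHAR_MIN, CHAR_MAX and UCHAR_MAX from <limits.h>.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (name is an assumption, not from the thread).
   c must be a character value as returned by getchar(), that is,
   in the range 0..UCHAR_MAX (not EOF).
   Returns 1 if c can be stored in a plain char without an
   out-of-range conversion, 0 otherwise. */
int value_fits_in_char(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);

    return c >= CHAR_MIN && c <= CHAR_MAX;
}
```

Where plain char is unsigned, every value fits. Where it is signed, the values above CHAR_MAX are the ones that fall under 6.3.1.3#3.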

--
Simon.


 
Enrico `Trippo' Porreca
 
      09-27-2003
Simon Biber wrote:
>>char s[SIZE];
>>int c;
>>size_t i = 0;
>>
>>while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
>> s[i] = c; /* ??? */
>> i++;
>>}
>>
>>s[i] = '\0';
>>
>>If this is indeed implementation defined, is there any solution?

>
> If char is signed, and the value of the character is outside the
> range of signed char, then you have an out-of-range conversion to
> a signed integer type, so: "either the result is implementation-defined
> or an implementation-defined signal is raised." (C99 6.3.1.3#3)
>
> However, because this is such an incredibly common operation in
> existing C code, an implementor would be absolutely idiotic to
> define this to have any undesired effects.


I agree, but AFAIK the implementor is allowed to be an idiot...
Am I right?

Is the following a plausible solution (i.e. without any trap
representation or type conversion or something-defined behaviour problem)?

char s[SIZE];
unsigned char *t = (unsigned char *) s;
int c;
size_t i = 0;

while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
    t[i] = c; /* ??? */
    i++;
}

s[i] = '\0';
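
For reference, the loop above can be wrapped into a complete function. This is only a sketch under stated assumptions: read_line() is a hypothetical name, it reads from a FILE * with getc() instead of getchar() so that it can be exercised on any stream, and it tests the buffer bound before reading, because the loop as posted consumes and then discards one character when the buffer is already full. Whether the bytes stored through t are always valid values of plain char is exactly what this thread is debating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reads at most size - 1 characters of one line
   from fp into s, stopping at a newline or at end of file, and
   terminates the result with '\0'. The newline is consumed but not
   stored. Returns the number of characters stored. */
size_t read_line(FILE *fp, char *s, size_t size)
{
    unsigned char *t = (unsigned char *) s;  /* byte view of the same buffer */
    size_t i = 0;
    int c;

    assert(fp != NULL && s != NULL && size > 0);

    /* Check the bound first so that no character is read and then lost. */
    while (i < size - 1 && (c = getc(fp)) != EOF && c != '\n') {
        t[i] = (unsigned char) c;  /* c is in 0..UCHAR_MAX here */
        i++;
    }

    s[i] = '\0';
    return i;
}
```

A caller would use it as read_line(stdin, buf, sizeof buf). Characters that do not fit in the buffer stay in the stream for the next call.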

--
Enrico `Trippo' Porreca

 
 
Simon Biber
 
      09-27-2003
"Enrico `Trippo' Porreca" <(E-Mail Removed)> wrote:
> I agree, but AFAIK the implementor is allowed to be an idiot...
> Am I right?


Yes, but trust me, anyone who fouled up the char<->int conversion
would break a large proportion of existing code that is considered
to be completely portable. Therefore their implementation would
not sell.

Consider the <ctype.h> functions, which require that their argument is
an int whose value is within the range of unsigned char (or EOF). That
is why we suggest that people cast to unsigned char like this:

char *p, s[] = "hello";
for (p = s; *p; p++)
    *p = toupper((unsigned char)*p);

If the value of *p was negative, then after conversion to unsigned
char it is positive and outside the range of signed char. So it
could theoretically be outside the range of int as well, if int and
signed char have the same range. Therefore you have the same situation
in reverse - unsigned char to int conversion is not guaranteed to be
within range.

> Is the following a plausible solution (i.e. without any trap
> representation or type conversion or something-defined behaviour
> problem)?
>
> char s[SIZE];
> unsigned char *t = (unsigned char *) s;
> int c;
> size_t i = 0;
>
> while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> t[i] = c; /* ??? */


The assignment itself is safe, but since it places an arbitrary
representation into the elements of the array s, which are char
objects and possibly signed, it might generate a trap
representation. That is if signed char can have trap
representations. I'm not completely sure.

> i++;
> }
>
> s[i] = '\0';


--
Simon.


 
 
Malcolm
 
      09-27-2003

"Simon Biber" <(E-Mail Removed)> wrote in message
>
> > char s[SIZE];
> > unsigned char *t = (unsigned char *) s;
> > int c;
> > size_t i = 0;
> >
> > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> > t[i] = c; /* ??? */


s[i] = 0;
>
> The assignment itself is safe, but since it places an arbitrary
> representation into the elements of the array s, which are char
> objects and possibly signed, it might generate a trap
> representation. That is if signed char can have trap
> representations. I'm not completely sure.
>

Signed chars can trap. Unsigned chars are guaranteed to be able to hold
arbitrary data, so they cannot.
You would have to be desperately unlucky for the implementation to allow
non-chars to be read in from stdin, and then for the function to trap. The
most likely place for the trap to trigger would be the assignment s[i] = 0,
since the compiler probably won't realise that pointer t actually points to
a buffer declared as straight char.


 
 
Peter Nilsson
 
      09-28-2003
"Malcolm" <(E-Mail Removed)> wrote in message
>
> "Simon Biber" <(E-Mail Removed)> wrote in message
> >
> > > char s[SIZE];
> > > unsigned char *t = (unsigned char *) s;
> > > int c;
> > > size_t i = 0;
> > >
> > > while ((c = getchar()) != EOF && c != '\n' && i < SIZE - 1) {
> > > t[i] = c; /* ??? */

>
> s[i] = 0;
> >
> > The assignment itself is safe, but since it places an arbitrary
> > representation into the elements of the array s, which are char
> > objects and possibly signed, it might generate a trap
> > representation. That is if signed char can have trap
> > representations. I'm not completely sure.
> >

> signed chars can trap. unsigned chars are guaranteed to be able to hold
> arbitrary data so cannot.
> You would have to be desperately unlucky for the implementation to allow
> non-chars to be read in from stdin, and then for the function to trap. The
> most likely place for the trap to trigger would be the assignment
> s[i] = 0,

0 is a value in the range of signed char, so it is not possible for a
conforming compiler to replace the contents of object s[i] with a trap
representation.

[You can always initialise an uninitialised automatic variable, for instance,
even if its uninitialised state is a trap representation.]

> since the compiler probably won't realise that pointer t actually points to
> a buffer declared as straight char.


You seem to be confusing 'trap representation' with 'trap'. The latter term
is commonly used for raised exceptions on many architectures. A trap
representation, in and of itself, need not raise an exception.

Indeed, whilst the standards allow signed char to have trap representations,
sections like 6.2.6.1p5 effectively say that all reads via character lvalues
are privileged. So at worst, it would seem, reading a character trap
representation will only yield an unspecified value. [Non-trapping trap
representations!]
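
To illustrate reading an object through a character lvalue (a minimal sketch; count_nonzero_bytes() is a hypothetical name), the function below inspects the bytes of any object through a pointer to unsigned char. Since unsigned char can hold arbitrary data, each p[i] is an ordinary value in 0..UCHAR_MAX whatever the object contains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the non-zero bytes in the object
   representation of the n-byte object that obj points to. */
size_t count_nonzero_bytes(const void *obj, size_t n)
{
    const unsigned char *p = (const unsigned char *) obj;
    size_t i, count = 0;

    assert(obj != NULL);

    for (i = 0; i < n; i++) {
        if (p[i] != 0)   /* read through an unsigned char lvalue */
            count++;
    }
    return count;
}
```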

--
Peter


 
 
Malcolm
 
      09-28-2003

"Peter Nilsson" <(E-Mail Removed)> wrote in message
>
> > The most likely place for the trap to trigger would be the assignment
> > s[i] = 0,

>
> 0 is a value in the range of signed char, so it is not possible for a
> conforming compiler to replace the contents of object s[i] with a trap
> representation.
>

What I meant was that the assignment may trigger the trap, if illegal
characters are stored into the array s. This is because values from s may be
loaded into registers as chars.
>
> Indeed, whilst the standards allow signed char to have trap
> representations, sections like 6.2.6.1p5 effectively say that all reads via
> character lvalues are privileged. So at worst, it would seem, reading a
> character trap representation will only yield an unspecified value.
> [Non-trapping trap representations!]
>

It seems it would be unacceptable for the line

fgets(line, sizeof line, fp);

to cause a program abort if fed an illegal character, with nothing the
programmer can do to stop it. OTOH reads are the most likely way for corrupt
data to get into the program, and the whole point of trap representations is
to close down any program that is malfunctioning.
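
For comparison, here is what a caller can check around that fgets() call. This is a sketch under stated assumptions: read_line_fgets() is a hypothetical wrapper, and it assumes size fits in an int because fgets() takes an int count. It handles only the failures fgets() itself reports (end of file or a read error) and says nothing about trap representations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets() and strips the
   trailing newline, if there is one. Returns 1 on success and 0 on
   end of file or read error. Assumes size <= INT_MAX. */
int read_line_fgets(FILE *fp, char *line, size_t size)
{
    assert(fp != NULL && line != NULL && size > 0);

    if (fgets(line, (int) size, fp) == NULL)
        return 0;

    /* strcspn() yields the index of the first '\n', or the string
       length if there is none, so this is safe in both cases. */
    line[strcspn(line, "\n")] = '\0';
    return 1;
}
```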



 
 
Enrico `Trippo' Porreca
 
      09-28-2003
Simon Biber wrote:
>> I agree, but AFAIK the implementor is allowed to be an idiot...
>> Am I right?

>
> Yes, but trust me, anyone who fouled up the char<->int conversion
> would break a large proportion of existing code that is considered
> to be completely portable. Therefore their implementation would
> not sell.


Uhm... So I think I should use K&R's getline(), without being too
paranoid about it...

Thanks.

--
Enrico `Trippo' Porreca

 
 
CBFalconer
 
      09-28-2003
Enrico `Trippo' Porreca wrote:
> Simon Biber wrote:
>
> >> I agree, but AFAIK the implementor is allowed to be an idiot...
> >> Am I right?

> >
> > Yes, but trust me, anyone who fouled up the char<->int conversion
> > would break a large proportion of existing code that is considered
> > to be completely portable. Therefore their implementation would
> > not sell.

>
> Uhm... So I think I should use K&R's getline(), without being too
> paranoid about it...


Consider ggets, available at:

<http://cbfalconer.home.att.net/download/>

which has the convenience of gets without the insecurities.

--
Chuck F ((E-Mail Removed)) ((E-Mail Removed))
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


 
 
Dan Pop
 
      09-29-2003
In <3f75cf48$0$4189$(E-Mail Removed)> "Simon Biber" <(E-Mail Removed)> writes:

>"Enrico `Trippo' Porreca" <(E-Mail Removed)> wrote:
>> Since char may be signed (and if so, the return value of getchar()
>> would be outside its range), doesn't the commented line in the
>> following code produce implementation-defined behaviour?

>
>Almost. If a character is read whose code is out of the range of
>signed char, it produces an implementation-defined result, or an
>implementation-defined signal is raised. This is not quite as bad
>as implementation-defined behaviour, but almost.


No implementation-defined signal is raised in C89 and I strongly doubt
that any *real* C99 implementation would do that, breaking existing C89
code.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
 