Velocity Reviews


Francis Moreau 12-17-2010 10:26 AM

Can I assume fgets won't modify last bytes of output array if unused ?
 
Hello,

I think this is undefined behaviour, but I prefer asking just in case
I'm missing something.

Consider that a line is 8 characters long (including newline) and I pass
to fgets a buffer which can store at least 32 characters.

Can I assume that if fgets reads that line, then it won't modify any
characters in the buffer whose offset is greater than 8 ?

Thanks
--
Francis

Mark Wooding 12-17-2010 11:36 AM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
Francis Moreau <francis.moro@gmail.com> writes:

> Consider that a line is 8 characters long (including newline) and I pass
> to fgets a buffer which can store at least 32 characters.
>
> Can I assume that if fgets reads that line, then it won't modify any
> characters in the buffer whose offset is greater than 8 ?


I think that you can, according to the standard: 7.19.7.2p2:

The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream
into the array pointed to by s. No additional characters are
read after a new-line character (which is retained) or after
end-of-file. A null character is written immediately after the
last character read into the array.

It only `reads ... characters ... into the array', and finally writes a
null character after the last one. It doesn't say it does anything else
to the array. An implementation that randomly trashes other parts of
the array would therefore be nonconforming.

-- [mdw]
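
Editorial sketch: one way to observe what a particular implementation does is to pre-fill the buffer with a known byte, call fgets, and count how many bytes beyond the terminating null character changed. The helper name count_touched_tail_bytes and the fill value are assumptions made for illustration. A result of zero on one C library says nothing about what the standard guarantees, which is the question under discussion in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FILL 0x5A  /* arbitrary sentinel byte, chosen for this sketch */

/*
 * Pre-fill buf with FILL, read one line with fgets, then count the bytes
 * located after the terminating '\0' that no longer hold FILL.
 * Returns -1 if fgets fails. The count is only meaningful when the line
 * itself contains no '\0' byte.
 */
int count_touched_tail_bytes(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);

    memset(buf, FILL, (size_t)size);
    if (fgets(buf, size, fp) == NULL)
        return -1;

    /* On success fgets has stored a '\0' somewhere inside the buffer. */
    const char *nul = memchr(buf, '\0', (size_t)size);
    if (nul == NULL)
        return -1;

    int touched = 0;
    for (const char *q = nul + 1; q < buf + size; q++) {
        if (*q != FILL)
            touched++;
    }
    return touched;
}
```

Called on a stream positioned at an 8-character line with a 32-byte buffer, it reports how many of the trailing bytes were modified on the implementation at hand.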

Francis Moreau 12-17-2010 12:06 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
mdw@distorted.org.uk (Mark Wooding) writes:

> Francis Moreau <francis.moro@gmail.com> writes:
>
>> Consider that a line is 8 characters long (including newline) and I pass
>> to fgets a buffer which can store at least 32 characters.
>>
>> Can I assume that if fgets reads that line, then it won't modify any
>> characters in the buffer whose offset is greater than 8 ?
>
> I think that you can, according to the standard: 7.19.7.2p2:
>
> The fgets function reads at most one less than the number of
> characters specified by n from the stream pointed to by stream
> into the array pointed to by s. No additional characters are
> read after a new-line character (which is retained) or after
> end-of-file. A null character is written immediately after the
> last character read into the array.
>
> It only `reads ... characters ... into the array', and finally writes a
> null character after the last one. It doesn't say it does anything else
> to the array. An implementation that randomly trashes other parts of
> the array would therefore be nonconforming.


Well that's not really clear to me, it must indeed read at most n-1
characters and write them to the array with a null character. But it
doesn't say it must not trash following characters in the array even if
that sounds stupid...

--
Francis

Mark Wooding 12-17-2010 01:45 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
Francis Moreau <francis.moro@gmail.com> writes:

> Well that's not really clear to me, it must indeed read at most n-1
> characters and write them to the array with a null character. But it
> doesn't say it must not trash following characters in the array even
> if that sounds stupid...


It also doesn't say that many other unhelpful and counterintuitive
things don't occur. If your implementation's `fgets' does something
other than what's described in the standard, then it's not conforming.
And that includes making `beep-beep' noises, or clobbering extra stuff
in the input array.

-- [mdw]

Francis Moreau 12-17-2010 02:13 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
mdw@distorted.org.uk (Mark Wooding) writes:

> Francis Moreau <francis.moro@gmail.com> writes:
>
>> Well that's not really clear to me, it must indeed read at most n-1
>> characters and write them to the array with a null character. But it
>> doesn't say it must not trash following characters in the array even
>> if that sounds stupid...
>
> It also doesn't say that many other unhelpful and counterintuitive
> things don't occur.


That's the reason why I think it's undefined.

> If your implementation's `fgets' does something other than what's
> described in the standard, then it's not conforming.


Well, I would say this differently: if my program relies on this
undefined behaviour then it's not conforming.

--
Francis

Eric Sosman 12-17-2010 02:29 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
On 12/17/2010 5:26 AM, Francis Moreau wrote:
> Hello,
>
> I think this is undefined behaviour, but I prefer asking just in case
> I'm missing something.
>
> Consider that a line is 8 characters long (including newline) and I pass
> to fgets a buffer which can store at least 32 characters.
>
> Can I assume that if fgets reads that line, then it won't modify any
> characters in the buffer whose offset is greater than 8 ?


As I understand it, fgets() is allowed to scribble on any or all
of the buffer's bytes, except that if there's an immediate end-of-file
it will touch none of them.

The wider question, I think, is "Why do you care?" Is this part
of a stratagem for dealing with lines that might contain '\0' or
some such?

--
Eric Sosman
esosman@ieee-dot-org.invalid

Eric Sosman 12-17-2010 03:26 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
On 12/17/2010 10:11 AM, Joe Wright wrote:
> On 12/17/2010 09:29, Eric Sosman wrote:
>> [...]
>> The wider question, I think, is "Why do you care?" Is this part
>> of a stratagem for dealing with lines that might contain '\0' or
>> some such?
>>

> If there is a '\0' in the stream, fgets will put it in the buffer
> without complaint.


In light of 7.19.2p2 I don't think even that much is guaranteed.

> In my view this is a data error, not a problem with
> fgets. There is no rational case for '\0' in a text stream.


... which doesn't stop people from trying to handle screwball
formats with text streams, even though there's no certainty that the
attempts will succeed. I was just trying to imagine why the O.P.
was concerned about the tail end of an fgets() buffer, and wondering
whether he was attempting to deal with dodgy input.

--
Eric Sosman
esosman@ieee-dot-org.invalid
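
Editorial sketch: an embedded '\0' matters because everything done after fgets relies on string functions. The hypothetical helper below writes the six bytes a, b, '\0', c, d, '\n' to a temporary file and shows that strlen then sees only the first two characters. It uses tmpfile, which opens a binary stream, so it sidesteps the text-stream caveat of 7.19.2p2 raised above; on a text stream even this much is not assured.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a line containing an embedded '\0' to a temporary binary stream,
 * read it back with fgets, and return what strlen reports for the buffer.
 * Returns (size_t)-1 if the temporary file cannot be created or used.
 */
size_t strlen_after_embedded_nul(void)
{
    static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\n' };
    char buf[16];
    size_t len = (size_t)-1;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return len;

    if (fwrite(data, 1, sizeof data, fp) == sizeof data) {
        rewind(fp);
        if (fgets(buf, (int)sizeof buf, fp) != NULL) {
            /* All six bytes arrived, newline included... */
            assert(buf[5] == '\n' && buf[6] == '\0');
            /* ...but strlen stops at the embedded '\0'. */
            len = strlen(buf);
        }
    }
    fclose(fp);
    return len;
}
```

Any "did I get a full line?" test built on strlen or strchr inherits this blind spot.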

Francis Moreau 12-17-2010 05:08 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
Eric Sosman <esosman@ieee-dot-org.invalid> writes:

> On 12/17/2010 5:26 AM, Francis Moreau wrote:
>> Hello,
>>
>> I think this is undefined behaviour, but I prefer asking just in case
>> I'm missing something.
>>
>> Consider that a line is 8 characters long (including newline) and I pass
>> to fgets a buffer which can store at least 32 characters.
>>
>> Can I assume that if fgets reads that line, then it won't modify any
>> characters in the buffer whose offset is greater than 8 ?
>
> As I understand it, fgets() is allowed to scribble on any or all
> of the buffer's bytes, except that if there's an immediate end-of-file
> it will touch none of them.
>
> The wider question, I think, is "Why do you care?" Is this part
> of a stratagem for dealing with lines that might contain '\0' or
> some such?


I'm wondering what is the most efficient way to see if fgets() read an
entire line.

For example consider this:

char buf[16], *p;
FILE *fp;

/* initialise fp */

p = fgets(buf, sizeof(buf), fp);

from here you can do this:

/* check if the line is entirely read */
len = strlen(buf);
if (len == 15 && buf[14] != '\n') {
    /* the line has not been read completely */
}

but you could also do:

buf[14] = '\n';
p = fgets(buf, sizeof(buf), fp);
if (buf[14] != '\n') {
    /* the line has not been read completely */
}

which seems more efficient.

Hence my question...
--
Francis
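
Editorial sketch: the first test above generalizes to any buffer size by looking at the last character stored instead of hard-coding 15 and 14. The function name read_line_status and its return convention are assumptions made for illustration. It relies only on the documented behaviour of fgets, at the cost of one strlen call per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line with fgets and classify the result:
 *    1  a complete line, ending in '\n', was stored
 *    0  the line was cut short (buffer full, or EOF without '\n')
 *   -1  fgets returned NULL (end-of-file or read error)
 */
int read_line_status(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);

    if (fgets(buf, size, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

With a 16-byte buffer and a 19-character line, the first call returns 0 and the second call, which picks up the remainder of the line, returns 1.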

Eric Sosman 12-17-2010 06:28 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
On 12/17/2010 12:08 PM, Francis Moreau wrote:
> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
>
>> On 12/17/2010 5:26 AM, Francis Moreau wrote:
>>> Hello,
>>>
>>> I think this is undefined behaviour, but I prefer asking just in case
>>> I'm missing something.
>>>
>>> Consider that a line is 8 characters long (including newline) and I pass
>>> to fgets a buffer which can store at least 32 characters.
>>>
>>> Can I assume that if fgets reads that line, then it won't modify any
>>> characters in the buffer whose offset is greater than 8 ?
>>
>> As I understand it, fgets() is allowed to scribble on any or all
>> of the buffer's bytes, except that if there's an immediate end-of-file
>> it will touch none of them.
>>
>> The wider question, I think, is "Why do you care?" Is this part
>> of a stratagem for dealing with lines that might contain '\0' or
>> some such?
>
> I'm wondering what is the most efficient way to see if fgets() read an
> entire line.
>
> For example consider this:
>
> char buf[16], *p;
> FILE *fp;
>
> /* initialise fp */
>
> p = fgets(buf, sizeof(buf), fp);
>
> from here you can do this:
>
> /* check if the line is entirely read */
> len = strlen(buf);
> if (len == 15 && buf[14] != '\n') {
> /* the line has not been read completely */
>
> }


Possibly simpler, certainly briefer:

if (strchr(buf, '\n') == NULL) {
    // incomplete line (or missing '\n' at EOF)
}

> but you could also do:
>
> buf[14] = '\n';
> p = fgets(buf, sizeof(buf), fp);
> if (buf[14] != '\n') {
> /* the line has not been read completely */


... or consisted of thirteen characters plus '\n', and fgets()
dutifully set buf[14] = '\0'.

> }
>
> which seems more efficient.


Two thoughts: First, I/O is many orders of magnitude slower
than the CPU, so saving a few milliquavers probably just gets you
back to the idle loop sooner. Second, if it's really important to
get the answer ASAP you may be better off using getc() in a loop
and testing directly, rather than calling fgets() and then using
CSI forensics to post-analyze what it did.

It Would Be Nice If fgets() returned something more than a
single bit's worth of information, like the number of bytes read,
say, or a pointer to the '\0'. Unfortunately, the less-helpful
interface was already well-established before standardization got
underway, and (like some other features of the library) we're all
stuck with it.

Accept my best wishes for the holiday season, to you and yours
and all your vermiform appendices.

--
Eric Sosman
esosman@ieee-dot-org.invalid
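
Editorial sketch: the getc() loop suggested above can return the length directly, which is the information fgets() withholds. The name read_line_getc, the size_t length, and the complete out-parameter are assumptions made for illustration, not an established interface.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read at most size - 1 characters with getc, stopping after a '\n'.
 * A terminating '\0' is always stored. Returns the number of characters
 * stored before the terminator; *complete is set to 1 if a '\n' was
 * stored and to 0 otherwise. A return value of 0 means end-of-file or a
 * read error occurred before any character was read.
 */
size_t read_line_getc(FILE *fp, char *buf, size_t size, int *complete)
{
    assert(fp != NULL && buf != NULL && complete != NULL && size > 0);

    size_t n = 0;
    int c;

    *complete = 0;
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n') {
            *complete = 1;
            break;
        }
    }
    buf[n] = '\0';
    return n;
}
```

Because the length comes back from the call, no strlen or strchr pass is needed afterwards, and a line containing '\0' bytes is still measured correctly.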

Francis Moreau 12-17-2010 07:33 PM

Re: Can I assume fgets won't modify last bytes of output array if unused ?
 
Eric Sosman <esosman@ieee-dot-org.invalid> writes:

> On 12/17/2010 12:08 PM, Francis Moreau wrote:
>> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
>>
>>> On 12/17/2010 5:26 AM, Francis Moreau wrote:
>>>> Hello,
>>>>
>>>> I think this is undefined behaviour, but I prefer asking just in case
>>>> I'm missing something.
>>>>
>>>> Consider that a line is 8 characters long (including newline) and I pass
>>>> to fgets a buffer which can store at least 32 characters.
>>>>
>>>> Can I assume that if fgets reads that line, then it won't modify any
>>>> characters in the buffer whose offset is greater than 8 ?
>>>
>>> As I understand it, fgets() is allowed to scribble on any or all
>>> of the buffer's bytes, except that if there's an immediate end-of-file
>>> it will touch none of them.
>>>
>>> The wider question, I think, is "Why do you care?" Is this part
>>> of a stratagem for dealing with lines that might contain '\0' or
>>> some such?
>>
>> I'm wondering what is the most efficient way to see if fgets() read an
>> entire line.
>>
>> For example consider this:
>>
>> char buf[16], *p;
>> FILE *fp;
>>
>> /* initialise fp */
>>
>> p = fgets(buf, sizeof(buf), fp);
>>
>> from here you can do this:
>>
>> /* check if the line is entirely read */
>> len = strlen(buf);
>> if (len == 15 && buf[14] != '\n') {
>> /* the line has not been read completely */
>>
>> }
>
> Possibly simpler, certainly briefer:
>
> if (strchr(buf, '\n') == NULL) {
> // incomplete line (or missing '\n' at EOF)
> }


Yes.

>> but you could also do:
>>
>> buf[14] = '\n';
>> p = fgets(buf, sizeof(buf), fp);
>> if (buf[14] != '\n') {
>> /* the line has not been read completely */
>
> ... or consisted of thirteen characters plus '\n', and fgets()
> dutifully set buf[14] = '\0'.
>


You're right, the test should have been

if (buf[14] && buf[14] != '\n') {
...

>
>> }
>>
>> which seems more efficient.

>
> Two thoughts: First, I/O is many orders of magnitude slower
> than the CPU, so saving a few milliquavers probably just gets you
> back to the idle loop sooner. Second, if it's really important to
> get the answer ASAP you may be better off using getc() in a loop
> and testing directly, rather than calling fgets() and then using
> CSI forensics to post-analyze what it did.


You're right again that it doesn't make any difference, but I was just
trying to write this with good taste, nothing more.

There are probably tons of developers who encounter this, so there's
probably a well-known pattern for checking it.

> It Would Be Nice If fgets() returned something more than a
> single bit's worth of information, like the number of bytes read,
> say, or a pointer to the '\0'. Unfortunately, the less-helpful
> interface was already well-established before standardization got
> underway, and (like some other features of the library) we're all
> stuck with it.
>
> Accept my best wishes for the holiday season, to you and yours
> and all your vermiform appendices.


Thanks Mister Sosman, I'll take care of my vermiform appendix, don't
worry ;)

Happy Christmas.
--
Francis
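
Editorial sketch: the corrected sentinel test, written for an arbitrary buffer size. It depends on the very assumption debated in this thread, namely that fgets leaves the bytes after the terminating '\0' untouched, so it is shown only to make the idea concrete and not as a recommended pattern. The name line_was_truncated is hypothetical. It also misreports a line that happens to carry a '\0' byte at the sentinel position.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sentinel variant of the truncation check. ASSUMPTION: fgets does not
 * modify bytes located after the '\0' it stores.
 *    1  the buffer filled up before a '\n' was seen
 *    0  the line fit (or the stream ended without '\n' before filling it)
 *   -1  fgets returned NULL (end-of-file or read error)
 */
int line_was_truncated(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size >= 2);

    char *last = &buf[size - 2];   /* last slot fgets can fill with data */

    *last = '\n';                  /* plant the sentinel */
    if (fgets(buf, size, fp) == NULL)
        return -1;

    /*
     * '\n' : sentinel intact, or the line ended exactly in this slot
     * '\0' : the line ended one slot earlier and the terminator landed here
     * else : data filled the whole buffer without reaching a '\n'
     */
    return *last != '\0' && *last != '\n';
}
```

The check costs one store and one or two comparisons per line, but it is only as portable as the assumption it rests on.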


All times are GMT.