fgets - design deficiency: no efficient way of finding last character read

 
 
John Reye
 
      04-11-2012
Hello,

The last character read from fgets(buf, sizeof(buf), inputstream) is:
'\n'
OR
any character x, when no '\n' was encountered in sizeof(buf)-1
consecutive chars, or when x is the last char of the inputstream

***How can one EFFICIENTLY determine if the last character is '\n'??
"Efficiently" means: don't use strlen!!!

I only come up with the strlen method, which - to me - says that fgets
has a bad design.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buf[6];
    FILE *fp = stdin;

    while (fgets(buf, sizeof(buf), fp)) {
        /* strlen(buf) is at least 1 here unless the input contains a null byte */
        printf((buf[strlen(buf)-1] == '\n')
                   ? "Got a line which ends with newline: %s"
                   : "no newline: %s", buf);
    }

    return EXIT_SUCCESS;
}



A well-designed fgets function should return the length of characters
read, should it not??

Please surprise me, that there is a way of efficiently determining the
number of characters read.
I've thought of ftell, but I think that does not work with stdin.

Because right now, I think that fgets really seems useless.
Why is the standard C library so inefficient?
Do I really have to go about designing my own library?

Thanks for tips and pointers

Regards,
J.
 
 
Rupert Swarbrick
 
      04-11-2012
John Reye writes:
> A well-designed fgets function should return the length of characters
> read, should it not??
>
> Please surprise me, that there is a way of efficiently determining the
> number of characters read.
> I've thought of ftell, but I think that does not work with stdin.


I'm intrigued. What application do you have where you read extremely
long lines from stdin using fgets? This seems an odd thing to do: I
can't think of any text-based formats where lines are extremely
long. For binary formats, use fread and (oh, look!):

FREAD(3)

....

RETURN VALUE
fread() and fwrite() return the number of items successfully read
or written (i.e., not the number of characters). If an error
occurs, or the end-of-file is reached, the return value is a
short item count (or zero).


It seems that the standard library isn't so badly designed after all...

<snip>
> Do I really have to go about designing my own library?


No.


Rupert

 
Ben Pfaff
 
      04-11-2012
Rupert Swarbrick writes:

> I'm intrigued. What application do you have where you read extremely
> long lines from stdin using fgets? This seems an odd thing to do: I
> can't think of any text-based formats where lines are extremely
> long.


It's fairly common for machine-generated HTML and XML (which are
text-based formats) to be single, very-long lines.
 
John Reye
 
      04-11-2012
On Apr 11, 7:43 pm, Rupert Swarbrick <rswarbr...@gmail.com> wrote:
> I'm intrigued. What application do you have where you read extremely
> long lines from stdin using fgets?

Actually I was using fgets, to read into a buffer. If the buffer is
not large enough to fit an entire line (i.e. one including '\n'), I
doubled the buffer and read the remaining chars. (stdin is just an
example that shows that I cannot abuse ftell to determine the length
read... you know: ftell-after-fgets minus ftell-before-fgets).

I thought fgets would be a good function to use, since it
automatically stops, when it encounters '\n'.

>        fread() and fwrite() return the number of items successfully read
>        or written (i.e., not the number of characters).  If an error
>        occurs, or the end-of-file is reached, the return value is a
>        short item count (or zero).


Yes... probably fread is a better way of handling it!
I want a buffer to hold the complete line, and then continue reading
lines.

***What is more efficient?
If I use fread, I'll probably overshoot beyond the '\n'.
Is it more efficient to rewind via fseek, and fread the overshoot to
the beginning of the buffer;
OR is it more efficient to copy the overshoot to the beginning of the
buffer and then fread the remainder?

Thanks.
J.
 
John Reye
 
      04-11-2012
On Apr 11, 7:53 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
> Rupert Swarbrick <rswarbr...@gmail.com> writes:
> It's fairly common for machine-generated HTML and XML (which are
> text-based formats) to be single, very-long lines.


Correct, but I would not read those huge lines, because the '\n' is
not the logical divider.

I however want a nice routine (which I have to code myself), which
uses realloc, to adjust a buffer to fit everything until the '\n'.
C standard lib does not have anything like this - so I have to code it
myself.
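
A routine along these lines could be sketched as follows. This is only a
minimal sketch under assumptions: the name read_line, the starting capacity
of 64 and the doubling policy are all hypothetical. It still calls strlen,
but only on the chunk that fgets has just stored, so each character is
scanned once.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length into a malloc'd buffer
 * that is doubled as needed. Returns NULL at end-of-file with nothing read,
 * or on allocation failure. On success *len is the line length, including
 * the '\n' if one was read. The caller frees the result. */
char *read_line(FILE *fp, size_t *len)
{
    size_t cap = 64, used = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* Assumption: cap - used fits in an int, i.e. lines stay below ~2 GB. */
    while (fgets(buf + used, (int)(cap - used), fp) != NULL) {
        used += strlen(buf + used);         /* scans only the new chunk */
        if (used > 0 && buf[used - 1] == '\n')
            break;                          /* complete line */
        if (used + 1 == cap) {              /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (used == 0) {                        /* nothing read: end-of-file */
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

A caller would loop on read_line(fp, &len) until it returns NULL, test
line[len-1] == '\n' directly, and free each line when done with it.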

I bet C++ has something useful that one could use. It seems that a lot
went into C++, to make it the huge bloated monster that it is! But
still seems worth a look, to relieve me from having to handle this
stuff at the basic level. Alternative: I need to develop my own
library of useful C routines.
 
Keith Thompson
 
      04-11-2012
John Reye writes:
> The last character read from fgets(buf, sizeof(buf), inputstream) is:
> '\n'
> OR
> any character x, when no '\n' was encountered in sizeof(buf)-1
> consecutive chars, or when x is the last char of the inputstream
>
> ***How can one EFFICIENTLY determine if the last character is '\n'??
> "Efficiently" means: don't use strlen!!!
>
> I only come up with the strlen method, which - to me - says that fgets
> has a bad design.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main(int argc, char *argv[])
> {
> char buf[6];
> FILE *fp = stdin;
> while (fgets(buf, sizeof(buf), fp)) {
> printf((buf[strlen(buf)-1] == '\n') ? "Got a line which ends with newline: %s"
> : "no newline: %s", buf);
> }
>
>
> return EXIT_SUCCESS;
> }
>
>
>
> A well-designed fgets function should return the length of characters
> read, should it not??
>
> Please surprise me, that there is a way of efficiently determining the
> number of characters read.
> I've thought of ftell, but I think that does not work with stdin.
>
> Because right now, I think that fgets really seems useless.
> Why is the standard C library so inefficient?
> Do I really have to go about designing my own library?


Have you measured the performance cost of calling strlen()?

I haven't done so myself, so the following is largely speculation,
but I strongly suspect that the time to call strlen() is going to
be *much* less than the time to read the data. Yes, an fgets-like
function could return additional information, either the length
of the string or a pointer to the end of it, and that would save a
little time, but I'm not convinced it would be a significant benefit.

And there would be some small but non-zero overhead in returning the
extra information. In a lot of cases, the caller isn't going to use
that information (perhaps it's going to traverse the string anyway).

*Measure* before you decide that fgets is "useless".
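
One way to take that measurement, as a rough sketch only: the function name
time_fgets and its parameters are hypothetical, and the numbers it produces
will depend on the platform, the file size and caching. It times a full pass
over a stream with fgets, optionally calling strlen on every line, so the two
runs can be compared.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical benchmark: reads every line of fp with fgets() and returns
 * the CPU seconds spent. When use_strlen is nonzero, strlen() is also called
 * on each line. *total accumulates the lengths so that the strlen() calls
 * cannot be optimized away. */
double time_fgets(FILE *fp, int use_strlen, size_t *total)
{
    char buf[4096];
    clock_t t0 = clock();

    *total = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (use_strlen)
            *total += strlen(buf);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Run it twice over the same large file (rewinding in between), once with
use_strlen set to 0 and once with 1, and compare the two results.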

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Kaz Kylheku
 
      04-11-2012
On 2012-04-11, John Reye wrote:
> ***How can one EFFICIENTLY determine if the last character is '\n'??
> "Efficiently" means: don't use strlen!!!


There is no way to know where the last character of a string is if you
do not know the length explicitly, or else implicitly (scan the string
looking for the null terminator).

> I only come up with the strlen method, which - to me - says that fgets
> has a bad design.


The newline can be missing in only two situations. One is that the buffer isn't
large enough to hold the line. In that case, some non-newline character is
written into the next-to-last element of the buffer and a null terminator
into the last element. If you set the next-to-last byte to zero before
calling fgets, you can detect that this situation has happened by finding
a non-zero byte other than '\n' there.

The second situation is that the last line of the stream has been read,
but fails to be newline terminated.

If you want to detect this situation, you only need to check whether end-of-file
has been reached. That is to say, keep calling fgets until it returns NULL. Then
go back to the most recently retrieved line and check whether the newline is
there or not, with the help of strlen, or strchr(line, '\n'), etc.

So as you can see, you don't have to scan every single line.
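
The first check can be sketched roughly like this, assuming a buffer of at
least two bytes. The wrapper name fgets_check_full and its out-parameter are
hypothetical, not part of the standard library.

```c
#include <stdio.h>

/* Hypothetical wrapper: calls fgets() and reports, without scanning the
 * string, whether the buffer was filled before a newline was seen.
 * Requires size >= 2. */
char *fgets_check_full(char *buf, int size, FILE *fp, int *full_without_newline)
{
    char *ret;

    buf[size - 2] = '\0';       /* sentinel in the next-to-last byte */
    ret = fgets(buf, size, fp);
    /* The sentinel is overwritten only when size-1 characters were stored.
     * If the character stored there is not '\n', either the line continues
     * or the stream ended exactly here without a newline. */
    *full_without_newline =
        (ret != NULL && buf[size - 2] != '\0' && buf[size - 2] != '\n');
    return ret;
}
```

When the flag is set, the caller keeps reading to pick up the rest of the
line. The unterminated last line of the stream is still handled separately,
once, after fgets returns NULL.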
 
James Kuyper
 
      04-11-2012
On 04/11/2012 12:43 PM, John Reye wrote:
> Hello,
>
> The last character read from fgets(buf, sizeof(buf), inputstream) is:
> '\n'
> OR
> any character x, when no '\n' was encountered in sizeof(buf)-1
> consecutive chars, or when x is the last char of the inputstream
>
> ***How can one EFFICIENTLY determine if the last character is '\n'??


That's relatively easy - so long as you don't need to know where the
'\n' is.

> "Efficiently" means: don't use strlen!!!
>
> I only come up with the strlen method, which - to me - says that fgets
> has a bad design.


The following approach uses strchr() rather than strlen(), so it
technically meets your specification. However, I presume you would have
the same objections to strchr() as you do to strlen(). I'd like to point
out, however, that it uses strchr() only once per file, which seems
efficient enough for me. If you're doing so little processing per file
that a single call to strchr() per file adds significantly to the total
processing load, I'd be more worried about the costs associated with
fopen() and fclose() than those associated with strchr().

The key point is that a successful call to fgets() can fail to read in
an '\n' character only if fgets() meets the end of the input file, or
the end of your buffer, both of which can be checked for quite
efficiently. If it reaches the end of your buffer, there's one and only
one place where the '\n' character can be, if one was read in.
Therefore, it's only at the end of the file that a search is required.

> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main(int argc, char *argv[])
> {
> char buf[6];
> FILE *fp = stdin;


    buf[(sizeof buf)-1] = 1; // any non-zero value will do.

> while (fgets(buf, sizeof(buf), fp)) {

        const char *prefix =
            ((buf[(sizeof buf)-1] == '\0' && buf[(sizeof buf)-2] != '\n')
            || (feof(fp) && !strchr(buf, '\n'))) ? "no " : "";

        printf("Got a line which ends with %snewline: %s\n",
            prefix, buf);

        buf[(sizeof buf)-1] = 1;
> }
>
>
> return EXIT_SUCCESS;
> }
>
>
>
> A well-designed fgets function should return the length of characters
> read, should it not??
>
> Please surprise me, that there is a way of efficiently determining the
> number of characters read.
> I've thought of ftell, but I think that does not work with stdin.
>
> Because right now, I think that fgets really seems useless.
> Why is the standard C library so inefficient?


Measure the inefficiency before deciding whether or not it's useless.
You may be surprised.

> Do I really have to go about designing my own library?


You don't need an entire library; a function equivalent to fgets() that
calls getc() and provides the information you're looking for wouldn't be
too difficult to write, and should compile fairly efficiently.
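
A minimal sketch of such a function, with the caveat that the name fgets_len
and its exact return convention are assumptions made for illustration:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical fgets() variant: stores at most size-1 characters plus a
 * terminating null, stops after '\n', and returns the number of characters
 * stored. A return of 0 means end-of-file, a read error, or size < 2. */
size_t fgets_len(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;               /* no room even for the terminator */
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}
```

With the count n in hand, the last character read is simply buf[n-1] when
n > 0. Unlike fgets(), this sketch returns a partial count after a read
error, so a caller that cares should check ferror(fp).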
 
Rupert Swarbrick
 
      04-11-2012
Kaz Kylheku writes:
> On 2012-04-11, John Reye wrote:
>> I only come up with the strlen method, which - to me - says that fgets
>> has a bad design.
>
> The newline can be missing in only two situations. One is that the buffer isn't
> large enough to hold the line. In that case, some non-newline character is
> written into the next-to-last element of the buffer and a null terminator
> into the last element. If you set the next-to-last byte to zero before
> calling fgets, you can detect that this situation has happened by finding
> a non-zero byte other than '\n' there.
>
> The second situation is that the last line of the stream has been read,
> but fails to be newline terminated.
>
> If you want to detect this situation, you only need to check whether end-of-file
> has been reached. That is to say, keep calling fgets until it returns NULL. Then
> go back to the most recently retrieved line and check whether the newline is
> there or not, with the help of strlen, or strchr(line, '\n'), etc.
>
> So as you can see, you don't have to scan every single line.


Thanks for this. It neatly uses the O(1) access at the end of the string
and gets around the OP's problem brilliantly. I like it!

Rupert

 
John Reye
 
      04-11-2012
On Apr 11, 9:13 pm, Kaz Kylheku <k...@kylheku.com> wrote:
thanks for your comment

On Apr 11, 10:36 pm, James Kuyper <jameskuy...@verizon.net> wrote:
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> >
> > int main(int argc, char *argv[])
> > {
> >   char buf[6];
> >   FILE *fp = stdin;
>
>     buf[(sizeof buf)-1] = 1;    // any non-zero value will do.
>
> >   while (fgets(buf, sizeof(buf), fp)) {
>
>         const char *prefix =
>             (buf[(sizeof buf)-1] == '\0' && buf[(sizeof buf)-2] != '\n'
>             || feof(fp) && !strchr(buf, '\n')) ? "no " : "";
>
>         printf("Got a line which ends with %snewline: %s\n",
>             prefix, buf);
>
>         buf[(sizeof buf)-1] = 1;
>
> >   }
> >
> >   return EXIT_SUCCESS;
> > }



Thanks for that! It's really good!


> > Do I really have to go about designing my own library?
>
> You don't need an entire library; a function equivalent to fgets() that
> calls getc() and provides the information you're looking for wouldn't be
> too difficult to write, and should compile fairly efficiently.


Hmmm... I think fread() is more efficient than continuous getc().

Does this make sense?

For some context:
I think that when writing a getline function (that uses realloc)...
i.e. size_t getline(char **ptr_to_inner_buf, FILE *fp) ... where
ptr_to_inner_buf is set to an internal buffer that holds bytes until
'\n', or any char x if EOF...

then realizing that getline function, by repeatedly calling getc() is
less efficient THAN using fread to get a number of bytes, scan for
'\n' and place a '\0' in the following byte. Before the next call to
fread, I could scan any overshoot (beyond '\n'... putting back the
char overwritten by '\0' via a tmp) for '\n' and if I find it... again
set '\0' and adjust ptr_to_inner_buf (see function declaration).
Otherwise I copy the overshoot to the very beginning of the buffer,
and fread the delta needed to fill the entire buffer.
If there is no '\n' in the buffer, I realloc and fread the delta, etc.
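
Roughly, the scheme could be sketched like this. It is a simplified variant
under assumptions: it returns a pointer and a length instead of writing a
'\0' into the buffer, and the names line_reader, line_reader_init and
next_line are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader state; none of these names are standard. */
struct line_reader {
    FILE  *fp;
    char  *buf;     /* heap buffer, grown with realloc */
    size_t cap;     /* allocated size, always >= 1 */
    size_t start;   /* offset of the first unconsumed byte */
    size_t end;     /* offset one past the last valid byte */
};

/* Returns 0 on success, -1 if the initial allocation fails. */
int line_reader_init(struct line_reader *r, FILE *fp, size_t cap)
{
    if (cap == 0)
        cap = 4096;
    r->fp = fp;
    r->buf = malloc(cap);
    r->cap = cap;
    r->start = r->end = 0;
    return r->buf != NULL ? 0 : -1;
}

/* Returns a pointer to the next line inside the reader's buffer and stores
 * its length (including '\n' if present) in *len. The line is not null
 * terminated and stays valid only until the next call. Returns NULL at
 * end-of-file or on allocation failure. */
char *next_line(struct line_reader *r, size_t *len)
{
    size_t scanned = 0;     /* bytes already searched for '\n' */

    for (;;) {
        char *base = r->buf + r->start;
        size_t avail = r->end - r->start;
        char *nl = memchr(base + scanned, '\n', avail - scanned);
        size_t got;

        if (nl != NULL) {
            *len = (size_t)(nl - base) + 1;
            r->start += *len;
            return base;
        }
        scanned = avail;

        /* No newline yet: copy the overshoot to the front of the buffer. */
        if (r->start > 0) {
            memmove(r->buf, base, avail);
            r->start = 0;
            r->end = avail;
        }
        /* Double the buffer if it is already full. */
        if (r->end == r->cap) {
            char *tmp = realloc(r->buf, r->cap * 2);
            if (tmp == NULL)
                return NULL;
            r->buf = tmp;
            r->cap *= 2;
        }
        got = fread(r->buf + r->end, 1, r->cap - r->end, r->fp);
        r->end += got;
        if (got == 0) {             /* end-of-file or read error */
            if (avail == 0)
                return NULL;
            *len = avail;           /* final line without '\n' */
            r->start = r->end;
            return r->buf;
        }
    }
}
```

The caller frees r.buf when finished. One caveat: on an interactive stdin,
fread may wait until the whole request is filled or end-of-file is reached,
so a sketch like this suits files and pipes better than a terminal.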

So fread() is more efficient than continuous getc(). Or am I wrong?

Thanks.
 