# Designing fgetline - a perspective

Carl

 11-10-2007
On 9 Nov 2007 at 23:30, Eric Sosman wrote:
> Still, I can't shake the feeling that you're trying
> to get one function to do too many tasks. Instead of
> using a bazillion flags to govern different modes of
> operation, might you consider a suite of related functions
> to handle the important variations? In effect, you'd be
> moving a few flags out of the struct and into the function
> name.

I think it's better to have one function. Consider that typically an ALU
will be a single circuit and the operation it performs (e.g. add,
multiply, etc.) is determined by control flags. So keeping down the
number of functions is following in good footsteps.

Eric Sosman

 11-10-2007
Carl wrote:
> On 9 Nov 2007 at 23:30, Eric Sosman wrote:
>> Still, I can't shake the feeling that you're trying
>> to get one function to do too many tasks. Instead of
>> using a bazillion flags to govern different modes of
>> operation, might you consider a suite of related functions
>> to handle the important variations? In effect, you'd be
>> moving a few flags out of the struct and into the function
>> name.

>
> I think it's better to have one function. Consider that typically an ALU
> will be a single circuit and the operation it performs (e.g. add,
> multiply, etc.) is determined by control flags. So keeping down the
> number of functions is following in good footsteps.

Something about the argument strikes me as unconvincing.
Do you really advocate eliminating sqrt() and cos() and
exp() and log() and all the rest, and replacing them with

double math(struct mathctl*, ...);

Or take my earlier tongue-in-cheek suggestion about
combining all the *printf() functions into one, using a
control block to manage the variations. If you think the
"Swiss Army knife" approach is good, let's see (1) your
design for the combined interface, and (2) helloworld.c
written as your interface would require.
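
For concreteness, here is one rough guess at what such a combined interface might look like. Nothing here was posted in the thread: `struct printctl`, `uniprintf` and `hello` are invented names, and only the stream and bounded-buffer variants are covered.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical control block selecting the "mode" of the one print function. */
enum pr_dest { PR_STREAM, PR_BUFFER };

struct printctl {
    enum pr_dest dest;
    FILE  *stream;   /* used when dest == PR_STREAM */
    char  *buf;      /* used when dest == PR_BUFFER */
    size_t size;     /* capacity of buf */
};

/* One entry point standing in for fprintf() and snprintf(). */
int uniprintf(struct printctl *ctl, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    if (ctl->dest == PR_BUFFER)
        n = vsnprintf(ctl->buf, ctl->size, fmt, ap);
    else
        n = vfprintf(ctl->stream, fmt, ap);
    va_end(ap);
    return n;
}

/* helloworld as this interface would require it. */
int hello(void)
{
    struct printctl ctl = { PR_STREAM, stdout, NULL, 0 };
    return uniprintf(&ctl, "hello, world\n");
}
```

Even in this reduced form the caller has to fill in a four-member struct, two members of which are unused, before printing anything.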

--
Eric Sosman

Malcolm McLean

 11-10-2007

"Eric Sosman" wrote in message
> Isn't there some way you can find an excuse to add
> a couple more arguments? Six is too many for most people
> to keep straight, but you may as well try to confuse the
> geniuses, too.
>

The rule of four. Four arguments is as many as can be scanned. Seven as many

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Richard Harter

 11-11-2007
On Fri, 09 Nov 2007 18:30:39 -0500, Eric Sosman wrote:

>Richard Harter wrote On 11/09/07 17:00,:
>> On Fri, 09 Nov 2007 11:36:18 -0500, Eric Sosman

[snip]

>>> Isn't there some way you can find an excuse to add
>>>a couple more arguments? Six is too many for most people
>>>to keep straight, but you may as well try to confuse the
>>>geniuses, too.

>>
>>
>> I take it you didn't check out the list of flags. I'm
>> after the geniuses too.

>
> Well, I noticed you needed a `long' to hold them,
>not a mere `int' ... Close to two dozen flags, aren't
>there?

22 at the last count. I expect I could trim a few if I had to.

>
>>> Kidding aside, an argument list of that length suggests
>>>to me that the function may be trying to be too many things
>>>to too many people at the same time. It might be wise to
>>>give up some functionality to improve ease of use; the net
>>>change in usefulness could in fact be positive.

>>
>> Point well taken, though there really is nothing that I would
>> want to give up. I wouldn't want to give up error information,
>> nor a bound on maximum line size, nor a reusable buffer (for
>> which both address and size are needed), and not even the line
>> length. One thing that could be done is package some of the
>> arguments in a struct. Frex:
>>
>> struct getfline_args {
>>     size_t bufsize;
>>     size_t length;
>>     size_t maxlen;
>>     long flags;
>> };
>>
>> Then we can have a prototype that looks like this:
>>
>> int * getfline(FILE *fptr,
>> char **line_ptr,
>> struct getfline_args *args);

>
> I'm not sure why the return value changed from an int
>to a pointer. Typo?

>
>> If we adopt the convention a maxlen of zero means no upper bound
>> check or alternately, another input flag needed to activate
>> bounds checking, then instead of a single long calling sequence
>> we can do:
>>
>> struct getfline_args gfl_args = {0,0,0,0};
>> char *line = 0;
>> FILE *fptr = 0;
>> ...
>> while (getfline(fptr, &line, &gfl_args)) {
>>     /* Do stuff */
>> }
>>
>> This doesn't really change anything but it makes it easier to use
>> the defaults and it moves some of the definitions into the
>> include file. What do you think of this?

>
> It reawakens memories of the days when interfaces
>used "control blocks," often populated by assembly macros.
>The C library's struct tm is such a beast, FILE can be
>thought of in that light, and POSIX has no shortage of
>structured arguments. If you take this route, you'll at
>least be on a well-marked trail.

>
> If you like the control block approach, though, why
>not go whole hog and put the line_ptr in the struct, too?
>You *could* even park the fptr there, but it could ease
>things a little if a struct initialized to all zeroes
>meant something sensible.

If we do the "all zeroes means something sensible" then you don't
want line_ptr in there. If you do the user code would need
something like

line = *(funky.line_ptr);

You could put line in there and let the user refer to funky.line.
That's a workable alternative but I didn't think it would be
popular. Perhaps the answer is that there are no alternatives
that would be popular.
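
A minimal sketch of the "line in the struct, all zeroes means something sensible" variant follows. It is only an illustration under assumed semantics: `struct gfl_ctl`, `gfl_read`, the 1/0 return convention, and the choice to hand back an over-long line in pieces are all made up here and are not the posted spec.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical control block. All zeroes means: the function allocates
   the buffer, there is no length bound, and default flags apply. */
struct gfl_ctl {
    char  *line;     /* current line; NULL until first allocated */
    size_t bufsize;  /* bytes allocated for line (must match line) */
    size_t length;   /* length of the line just read, without '\n' */
    size_t maxlen;   /* 0 means no upper bound */
    long   flags;    /* unused in this sketch */
};

/* Returns 1 if a line (or a maxlen-sized piece of one) was read,
   0 on end of input, I/O error, or allocation failure. */
int gfl_read(FILE *fp, struct gfl_ctl *ctl)
{
    size_t n = 0;
    int ch;

    if (ctl->line == NULL) {                 /* first call: allocate */
        ctl->line = malloc(128);
        if (ctl->line == NULL)
            return 0;
        ctl->bufsize = 128;
    }
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (ctl->maxlen != 0 && n >= ctl->maxlen) {
            ungetc(ch, fp);                  /* rest comes on the next call */
            break;
        }
        if (n + 1 >= ctl->bufsize) {         /* keep room for the '\0' */
            size_t newsize = ctl->bufsize ? ctl->bufsize * 2 : 128;
            char *tmp = realloc(ctl->line, newsize);
            if (tmp == NULL) {
                ungetc(ch, fp);
                ctl->line[n] = '\0';
                ctl->length = n;
                return 0;
            }
            ctl->line = tmp;
            ctl->bufsize = newsize;
        }
        ctl->line[n++] = (char)ch;
    }
    ctl->line[n] = '\0';
    ctl->length = n;
    return !(ch == EOF && n == 0);
}
```

The caller writes `struct gfl_ctl ctl = {0};`, loops on `gfl_read(fp, &ctl)`, reads `ctl.line` and `ctl.length` inside the loop, and frees `ctl.line` once at the end.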

>With a purpose-built struct
>type handy by, that overwhelming mass of flags might
>become a bunch of bit-fields. Bit-fields are, IMHO, a
>mixed blessing, but in a case like this they'd avoid the
>need to pollute the name space with all those GFL_xxx
>macros.

That's probably a good idea. I used the GFL_XXX hack to avoid
infringing on other namespaces. Still, "probably not used" is
not the same thing as "not used".

>
> Still, I can't shake the feeling that you're trying
>to get one function to do too many tasks. Instead of
>using a bazillion flags to govern different modes of
>operation, might you consider a suite of related functions
>to handle the important variations? In effect, you'd be
>moving a few flags out of the struct and into the function
>name. Observe the exec*() family of POSIX interfaces, for
>example: they all do "the same thing" and they probably
>all devolve on the same internal implementation, but using
>different names for different (although related) operations
>is a useful simplification. Try to imagine what it would
>be like if all of printf(), fprintf(), sprintf(), vprintf(),
>vfprintf(), vsprintf(), and vsnprintf() were packaged
>into one function name with a control block to choose
>different operation modes ... Ugh!

It could scarcely be less ugly than the *print* collection.

>
> As you may gather, I am a big fan of small, simple
>interfaces. Some people feel my adoration of simplicity
>is unhealthily intense; my own line-getter has been
>criticized for violating the "... but not simpler" part
>of Einstein's dictum. (The critic was someone I think
>highly of, so I gave his criticism careful thought before
>deciding to ignore it.) I mention all this to make my
>biases clear: the KISS principle is very important to me,
>but may not hold quite so much sway over others.

Here we may have to disagree a bit. The trouble with a KISS
approach in this case is that error and efficiency go by the
wayside. The simple way is

char * getalife(FILE *);

Now this means that you are automatically committed to separately
allocating space for each line. This is a tradeoff; what is
being traded is calling sequence simplicity for execution time.
A KISS advocate is saying in effect that the execution time is
not worth worrying about. I take the view that one should have a
choice. There is a second small issue with the KISS dictum - one
often ends up having to do extra work anyway - in this case
freeing the allocated space for each line.

The compact prototype also throws away the knowledge of the
length of the line. If users want that information they have to
call strlen, i.e., redo the work that had already been done.

The compact prototype leaves no room for a bound on the line
size. I believe an unnamed party has made that very point. I
happen to agree. I tend to take the view that malloc failures
mean that there is something fundamentally wrong with a program.

Another thing that is wrong with the KISS prototype is that there
is no place for an error return. I/O errors can be detected by
the user, but that again is extra code on the user side. However
there is no good way to pass back the knowledge that malloc has
failed.

Falconer's version isn't much better. IIRC it looks like:

int nolifehere(FILE *, char **);

where I assume that the return is 0 if a line was read properly
and a code value if it fails in some way. This adds error
returns for a modest price in complexity. It has the same built
in automatic inefficiencies. It also steps on one of C's little
awkwardnesses, namely the convention that 0 is false and non-zero
is true. IMNSHO this is an ancient mistake; generally speaking
there usually is only one way to succeed and many ways to fail.
It would have been better to do it the other way around. However it
is a convention cast in three-billion-year-old granite so there
is nothing to be done about it.

The real problem with the KISS dictum in this case is that there
is no way to implement it and still reuse buffer space. The
problem is the itsagoodlife function needs state that is
persistent from call to call, and that the space for that state
must be supplied by the user. What it comes down to is that you
can have a getalife function with a simple interface but not a
itsagoodlife function with a simple interface.

Of course one could have a getalife wrapper around a itsagoodlife
implementation; I have no problem with that.
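
A sketch of that layering, with the caveat that no signature for itsagoodlife appears above: the `struct linebuf` state, the `LB_*` codes and the prototype are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed persistent state, supplied by the caller and reused across calls. */
struct linebuf {
    char  *buf;
    size_t cap;
    size_t len;
};

enum { LB_OK = 0, LB_EOF = 1, LB_NOMEM = 2 };

/* Stateful core: reuses (and grows) lb->buf on every call. */
int itsagoodlife(FILE *fp, struct linebuf *lb)
{
    int ch;

    lb->len = 0;
    for (;;) {
        if (lb->len + 1 >= lb->cap) {        /* room for one char plus '\0' */
            size_t ncap = lb->cap ? lb->cap * 2 : 64;
            char *tmp = realloc(lb->buf, ncap);
            if (tmp == NULL)
                return LB_NOMEM;
            lb->buf = tmp;
            lb->cap = ncap;
        }
        ch = getc(fp);
        if (ch == EOF || ch == '\n')
            break;
        lb->buf[lb->len++] = (char)ch;
    }
    lb->buf[lb->len] = '\0';
    return (ch == EOF && lb->len == 0) ? LB_EOF : LB_OK;
}

/* Simple wrapper: a fresh buffer per line, which the caller must free. */
char *getalife(FILE *fp)
{
    struct linebuf lb = {0};

    if (itsagoodlife(fp, &lb) != LB_OK) {
        free(lb.buf);
        return NULL;
    }
    return lb.buf;
}
```

The wrapper pays for an allocation per line and loses the length and the error code, which is exactly the tradeoff described above; the core keeps all three.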

I expect I will do yet another spec and post it. Ugh. Still
more rewriting.

Richard Harter
http://home.tiac.net/~cri, http://www.varinoma.com
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die

Eric Sosman

 11-11-2007
Richard Harter wrote:
> On Fri, 09 Nov 2007 18:30:39 -0500, Eric Sosman wrote:
>>
>> If you like the control block approach, though, why
>> not go whole hog and put the line_ptr in the struct, too?
>> You *could* even park the fptr there, but it could ease
>> things a little if a struct initialized to all zeroes
>> meant something sensible.

>
> If we do the "all zeroes means something sensible" then you don't
> want line_ptr in there. If you do the user code would need
> something like
>
> line = *(funky.line_ptr);
>
> You could put line in there and let the user refer to funky.line.
> That's a workable alternative but I didn't think it would be
> popular. Perhaps the answer is that there are no alternatives
> that would be popular.

I was thinking along the lines of "If the line_ptr element
is NULL, fgetline() allocates and thereafter manages a suitable
buffer." Cf. setvbuf().

>> As you may gather, I am a big fan of small, simple
>> interfaces. [...] I mention all this to make my
>> biases clear: the KISS principle is very important to me,
>> but may not hold quite so much sway over others.

>
> Here we may have to disagree a bit. The trouble with a KISS
> approach in this case is that error and efficiency go by the
> wayside. The simple way is
>
> char * getalife(FILE *);

Oddly enough, that's the signature of my line-getter.

> Now this means that you are automatically committed to separately
> allocating space for each line.

Well, no. Mine reuses (and perhaps expands) the buffer at
each call. You only get to keep one line at a time "internal"
to the line-getter, which doesn't seem to be a problem: My usual
pattern is not to save the lines verbatim, but to parse them and
extract data. If you *do* want to save the lines verbatim, you
can't re-use the same buffer in any case (although in this case
mine would incur a string copy yours might be able to avoid).

> The compact prototype also throws away the knowledge of the
> length of the line. If users want that information they have to
> call strlen, i.e., redo the work that had already been done.

True. (Shrug.) I have seldom found the line length an
interesting datum.

> The compact prototype leaves no room for a bound on the line
> size.

True. (Shrug, perhaps with a touch less assuredness.) If
you know an upper limit on line length, fgets() is available.

> Another thing that is wrong with the KISS prototype is that there
> is no place for an error return. I/O errors can be detected by
> the user, but that again is extra code on the user side. However
> there is no good way to pass back the knowledge that malloc has
> failed.

"No place for an error return?" Mine returns NULL for end-of-
input, I/O error, or malloc() failure, which can be disambiguated
(if desired) by calling feof() and ferror(). Usually the program
only cares about normal-vs.-other, and the code looks like

    while ((line = getline(stream)) != NULL) {
        ...
    }
    if (! feof(stream)) {
        die_horribly();
    }

Note that *some* kind of test is necessary in any event,
since the underlying getc() lumps end-of-input and error into
a single EOF return value.
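
The actual line-getter is not shown in this thread. As a sketch of how a function with that signature can still reuse its buffer, assuming a single static buffer shared by all streams (the name `read_line` is made up to avoid clashing with any existing getline):

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a pointer to an internal buffer that the next call overwrites,
   or NULL on end of input, I/O error, or allocation failure. */
char *read_line(FILE *stream)
{
    static char  *buf = NULL;   /* reused, and grown when needed */
    static size_t cap = 0;
    size_t len = 0;
    int ch;

    while ((ch = getc(stream)) != EOF) {
        if (len + 2 > cap) {                 /* room for char plus '\0' */
            size_t ncap = cap ? cap * 2 : 80;
            char *tmp = realloc(buf, ncap);
            if (tmp == NULL)
                return NULL;                 /* neither feof nor ferror is set */
            buf = tmp;
            cap = ncap;
        }
        if (ch == '\n') {
            buf[len] = '\0';
            return buf;
        }
        buf[len++] = (char)ch;
    }
    if (len == 0)
        return NULL;                         /* feof() or ferror() says which */
    buf[len] = '\0';                         /* final line without a newline */
    return buf;
}
```

With this shape the caller's `feof()` test after the loop catches both I/O errors and allocation failure, since neither sets the end-of-file indicator. The cost is that the function is not reentrant and a caller who wants to keep a line must copy it.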

> The real problem with the KISS dictum in this case is that there
> is no way to implement it and still reuse buffer space.

Not true of mine.

I acknowledge that my "simplicity trumps all" approach is not
The Answer for all situations, nor (obviously) for all programmers.
My misgiving about the design you are working on is that it seems
to be trying to be The Answer, and I doubt that's a reasonable goal.

--
Eric Sosman

CBFalconer

 11-12-2007
Richard Harter wrote:
>

.... snip ...
>
> where I assume that the return is 0 if a line was read properly
> and a code value if it fails in some way. This adds error
> returns for a modest price in complexity. It has the same built
> in automatic inefficiencies. It also steps on one of C's little
> awkwardnesses, namely the convention that 0 is false and non-zero
> is true. IMNSHO this is an ancient mistake; generally speaking
> there usually is only one way to succeed and many ways to fail.
> It would have been better to do it the other way around. However it
> is a convention cast in three-billion-year-old granite so there
> is nothing to be done about it.

No, the thing is that there is only one signal needed for 'OK', but
it is quite possible to return various error forms. The 0 == OK
matches this. It is a general practice in the standard library,
and avoids returning a separate error value.
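
A small illustration of that convention, using invented names (`read_status`, `read_char`, `status_text`) rather than anything from ggets:

```c
#include <stdio.h>

/* One success value (zero), several distinct failure values. */
enum read_status { RS_OK = 0, RS_EOF, RS_IOERR };

/* Reads one character into *out; a nonzero return says why it could not. */
enum read_status read_char(FILE *fp, int *out)
{
    int ch = getc(fp);

    if (ch != EOF) {
        *out = ch;
        return RS_OK;
    }
    return ferror(fp) ? RS_IOERR : RS_EOF;
}

/* Maps a status code to a message for diagnostics. */
const char *status_text(enum read_status s)
{
    switch (s) {
    case RS_OK:    return "ok";
    case RS_EOF:   return "end of input";
    case RS_IOERR: return "I/O error";
    }
    return "unknown";
}
```

The caller tests `if (read_char(fp, &c) != RS_OK)` and the same returned value both signals failure and identifies it.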

--
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>


David Thompson

 11-12-2007
On Thu, 01 Nov 2007 16:46:32 GMT, Richard Harter wrote:

> On Thu, 11 Oct 2007 23:05:29 GMT, Richard Harter wrote:
> ><snip> we are throwing away information that (usually) has to be
> >recomputed, namely the length of the line and, as we shall see,
> >status information. If we want to add the information to our
> >prototype there are several ways to do it - it can be
> >passed back through the calling sequence or it can be returned by
> >the function. (We could also pass it to a global - ugh, don't do
> >it.) My vote is to have it all returned by the function which
> >means we create a struct (record) <snip>

> Here I went wrong. Having a struct (record) to hold information
> is plausible, but it may be overkill. Be that as it may, in a C
> implementation this prototype doesn't work too well. First of all
> the user has to go through the struct to access the line and the
> status field. More importantly the normal usage in C would be a
> while loop, e.g.,
>
> struct fgetline_info fg;
> ...
> while (fg = fgetline(FILE *fptr, size_t maxsize)) {...}
>
> This doesn't work - the idiom requires a test on something that
> can be "zero". If we convert the prototype and fg to a pointer
> the code still doesn't work. Now the problem is that when the read
> is successful we don't need the status field; when the read fails
> we don't have the status field.
>

You can do
while( (fg = fgetline (...)) . status == 0 ){ ... }
/* or != 0, or > 0, or whatever your semantics is */

This is not usual practice (in C), and I'd want to get some experience
with it before deciding whether I PREFER it, but it definitely works.

FWIW it could be argued that it's analogous to the much more idiomatic
use in LISP of a multiple-valued return that silently collapses to the
first value if the caller doesn't ask for the rest.
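
To make the struct-returning form concrete, here is a sketch with a deliberately simplified record: a fixed-size line array filled by fgets() instead of a dynamically sized one. `fgetline_sketch`, `count_lines` and `FG_MAXLINE` are illustrative names only.

```c
#include <stdio.h>
#include <string.h>

#define FG_MAXLINE 256

struct fgetline_info {
    int    status;             /* 0 = line read, nonzero = end of input or error */
    size_t length;             /* length without the trailing newline */
    char   line[FG_MAXLINE];
};

/* Returns the whole record by value. */
struct fgetline_info fgetline_sketch(FILE *fptr)
{
    struct fgetline_info fg = { 1, 0, "" };

    if (fgets(fg.line, sizeof fg.line, fptr) != NULL) {
        fg.length = strcspn(fg.line, "\n");
        fg.line[fg.length] = '\0';           /* strip the newline, if any */
        fg.status = 0;
    }
    return fg;
}

/* The loop idiom: assign the struct, then test one member of the result. */
size_t count_lines(FILE *fptr)
{
    struct fgetline_info fg;
    size_t n = 0;

    while ((fg = fgetline_sketch(fptr)).status == 0)
        n++;
    return n;
}
```

In this simplified version a line longer than FG_MAXLINE - 1 characters comes back in several pieces and is counted more than once.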

<snip MANY other points>
- formerly david.thompson1 || achar(64) || worldnet.att.net

Chris Torek

 11-12-2007
In article <1194651042.178648@news1nwk>,
Eric Sosman wrote:
[massive snippage]
> It reawakens memories of the days when interfaces
>used "control blocks," ...

> As you may gather, I am a big fan of small, simple
>interfaces. Some people feel my adoration of simplicity
>is unhealthily intense; ...

There is a middle path here. Consider:

char *get_a_line(FILE *file, struct whatever *control_block);

where the "control_block" parameter is optional, and may be
given as NULL. The "simple interface user" then writes:

    while ((line = get_a_line(fp, NULL)) != NULL)
        ... work with the line ...

and ignores all distinctions between "complete" and "incomplete"
lines, "ordinary EOF" and "read error", and so on. The "complex
interface with control block" user creates and populates the control
block, and does:

    while (get_a_line(fp, &cb), cb.status == OK)
        ... etc ...

or whatever is appropriate. (Note: the above assumes that the
return value used in the "simple interface" case is duplicated
somewhere in the control block, in this case.)
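
One possible shape for such a function, as a sketch only: `struct gal_ctl` stands in for "struct whatever", and its members, the `GAL_*` codes and the single static buffer are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

enum gal_status { GAL_OK = 0, GAL_EOF, GAL_IOERR, GAL_NOMEM };

/* Hypothetical control block; pass NULL to ignore all of it. */
struct gal_ctl {
    enum gal_status status;
    size_t length;     /* characters in the line, without '\n' */
    int    complete;   /* nonzero if the line ended with '\n' */
    char  *line;       /* duplicate of the return value */
};

char *get_a_line(FILE *file, struct gal_ctl *cb)
{
    static char  *buf = NULL;  /* internal buffer, reused across calls */
    static size_t cap = 0;
    enum gal_status st = GAL_OK;
    size_t len = 0;
    int complete = 0;
    int ch;
    char *result;

    for (;;) {
        if (len + 1 >= cap) {                /* room for char plus '\0' */
            size_t ncap = cap ? cap * 2 : 80;
            char *tmp = realloc(buf, ncap);
            if (tmp == NULL) { st = GAL_NOMEM; break; }
            buf = tmp;
            cap = ncap;
        }
        ch = getc(file);
        if (ch == '\n') { complete = 1; break; }
        if (ch == EOF) {
            if (len == 0)
                st = ferror(file) ? GAL_IOERR : GAL_EOF;
            break;                           /* partial last line stays GAL_OK */
        }
        buf[len++] = (char)ch;
    }
    if (buf != NULL)
        buf[len] = '\0';
    result = (st == GAL_OK) ? buf : NULL;
    if (cb != NULL) {                        /* details only for those who ask */
        cb->status = st;
        cb->length = len;
        cb->complete = complete;
        cb->line = result;
    }
    return result;
}
```

The simple caller never declares a control block at all, while the careful caller can tell an incomplete final line, ordinary end of input, a read error and an allocation failure apart.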
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
Reading email is like searching for food in the garbage, thanks to spammers.

Charlie Gordon

 11-12-2007
"CBFalconer" wrote in message:
> Eric Sosman wrote:
>>

> ... snip ...
>>
>> As you may gather, I am a big fan of small, simple interfaces.
>> Some people feel my adoration of simplicity is unhealthily
>> intense; my own line-getter has been criticized for violating
>> the "... but not simpler" part of Einstein's dictum. (The
>> critic was someone I think highly of, so I gave his criticism
>> careful thought before deciding to ignore it.) I mention all
>> this to make my biases clear: the KISS principle is very
>> important to me, but may not hold quite so much sway over others.

>
> Here is the heart of my ggets function, which more or less adheres
> to your principles. The entire package, with testing code, demos,
> docs, etc. is available at:
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "ggets.h"
>
> #define INITSIZE 112 /* power of 2 minus 16, helps malloc */
> #define DELTASIZE (INITSIZE + 16)
>
> enum {OK = 0, NOMEM};

The return value is poor: testing for EOF is indirect.
Returning the length of the string could be useful too.

> int fggets(char* *ln, FILE *f)
> {
>     int cursize, ch, ix;

Why not size_t for the size and the index ?

>     char *buffer, *temp;
>
>     *ln = NULL; /* default */
>     if (NULL == (buffer = malloc(INITSIZE))) return NOMEM;
>     cursize = INITSIZE;
>
>     ix = 0;
>     while ((EOF != (ch = getc(f))) && ('\n' != ch)) {
>         if (ix >= (cursize - 1)) { /* extend buffer */
>             cursize += DELTASIZE;
>             if (NULL == (temp = realloc(buffer, (size_t)cursize))) {
>                 /* ran out of memory, return partial line */
>                 buffer[ix] = '\0';
>                 *ln = buffer;

You should push the character back to the stream with ungetc(ch, f), or it
will be lost.

>                 return NOMEM;
>             }
>             buffer = temp;
>         }
>         buffer[ix++] = ch;
>     }
>     if ((EOF == ch) && (0 == ix)) {
>         free(buffer);
>         return EOF;
>     }
>
>     buffer[ix] = '\0';
>     if (NULL == (temp = realloc(buffer, (size_t)ix + 1))) {
>         *ln = buffer; /* without reducing it */
>     }
>     else *ln = temp;
>     return OK;
> } /* fggets */

This function effectively calls malloc *and* realloc at least once per line
successfully read: that's simple but quite inefficient.

--
Chqrlie.

Richard Heathfield

 11-12-2007
Charlie Gordon said:

> "CBFalconer" wrote in message:

<snip>

>> } /* fggets */

>
> This function effectively calls malloc *and* realloc at least once per
> line successfully read: that's simple but quite inefficient.

Yeah, he's been told over and over again about that.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999