Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Taking a stab at getline

Reply
Thread Tools

Taking a stab at getline

 
 
Eric Sosman
Guest
Posts: n/a
 
      02-07-2013
On 2/7/2013 11:44 AM, Noob wrote:
> Eric Sosman wrote:
>> On 2/7/2013 7:42 AM, Noob wrote:
>>> [...]
>>> s = realloc(s, max);
>>> if (s == NULL) exit(EXIT_FAILURE);

>>
>> Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
>> bucket -- but here you're kicking the entire program's bucket,
>> without even knowing what the bucket may hold. Ugh. Ugh. Ugh.

>
> Thing is, since I chose NULL to mean EOF, there was no "room"
> left for me to signal OOM. Unless I change the prototype...
> I'll give it a shot in v2.


One possibility is to keep your "narrow" interface and
let the caller figure it out:

char *line = get_line(stream);
if (line == NULL) {
if (feof(stream))
fputs("End of file\n", stderr);
else if ferror(stream))
fputs("I/O error\n", stderr);
else
fputs("Out of memory\n", stderr);
}

This is not as squeaky-clean as some other approaches (for example,
you could return a struct containing a status code and a pointer),
but it still allows the caller to discover everything needed.

If you decide to return NULL (or other failure indication) on
an I/O error, it might be a good idea to preserve the errno value
in case the failing getc() set it. The issue is to prevent other
functions from clobbering it while cleaning up: copy it to an int
of your own, call free() or whatever, then restore it before
returning.

>>> return realloc(s, len+1);

>>
>> ... possibly returning NULL after successfully reading a line,
>> plus leaking the memory where the line resides ... You could
>> retain the original pointer and return it if realloc() fails, or
>> you could make BUFLEN smaller and just not worry about the extra
>> few bytes.

>
> Wait what?! Reallocating to a smaller size (i.e. truncating
> excessive space) could fail?


Sure. Consider a memory manager that uses different "arenas"
for different allocation sizes: Shrinking an allocation might move
it from one arena to another. Or consider a memory manager that
*always* reallocates, maybe filling the old area with 0xDEADBEEF
to aid in debugging.

> At this point, len < max, thus len+1 <= max
>
> And I thought realloc frees its argument if it doesn't return
> what was passed in?


No: If realloc() fails, it leaves the original allocation
untouched.

--
Eric Sosman
d
 
Reply With Quote
 
 
 
 
Noob
Guest
Posts: n/a
 
      02-07-2013
James Kuyper wrote:

> You're treating End Of File, End Of Line, and an I/O error in exactly
> the same fashion. In some contexts that might be acceptable, but I would
> ordinarily want to distinguish them. As long as you're restricted to
> reporting errors by returning NULL, you can't distinguish them. That's
> why I generally prefer using the return value of my function to report
> an error code; I'd take a pointer to a pointer as an argument to the
> function, and return the pointer the allocated memory through that argument.


getc returning -1 means either EOF or I/O error, right?
At this point, would the following assertion hold:

!feof(stream) == !!ferror(stream)

If we're dealing with an I/O error, how can we learn more about
the nature of the error? Is errno set at some point?

> When this routine is done, the only record of the length of the
> allocated memory is the null character at the end of the allocation. I
> prefer storing the length along with the allocation. To avoid having to
> allocate two separate objects, I use a flexible array member:
>
> struct allocation {
> size_t length;
> allocated_type data[];
> };
>
> In this case, allocated_type is char. Allocate the entire structure, and
> then store it's length in the structure. Reallocate as needed, storing
> the new length in the structure only after the reallocation is successful.
>
> On the rare occasions when I've done anything like this, I've usually
> occasionally desired to re-use an existing allocation. Therefore, the
> routine should take a pointer to an allocation as input, and malloc() a
> new one only if that pointer is null.


The getline implementation in glibc works like you describe.
Using a flexible array is an interesting suggestion.

Regards.

 
Reply With Quote
 
 
 
 
Eric Sosman
Guest
Posts: n/a
 
      02-07-2013
On 2/7/2013 12:16 PM, Noob wrote:
> James Kuyper wrote:
>
>> You're treating End Of File, End Of Line, and an I/O error in exactly
>> the same fashion. In some contexts that might be acceptable, but I would
>> ordinarily want to distinguish them. As long as you're restricted to
>> reporting errors by returning NULL, you can't distinguish them. That's
>> why I generally prefer using the return value of my function to report
>> an error code; I'd take a pointer to a pointer as an argument to the
>> function, and return the pointer the allocated memory through that argument.

>
> getc returning -1 means either EOF or I/O error, right?


Almost, but not quite. A return of EOF (often -1, but
possibly some other negative value) means end-of-input or
I/O error. On some "exotic" systems it might even mean a
character was successfully read; see below.

> At this point, would the following assertion hold:
>
> !feof(stream) == !!ferror(stream)


On an "exotic" system where UCHAR_MAX > INT_MAX, it is
possible that a perfectly ordinary unsigned char could be equal
to EOF after conversion to int; if both feof() and ferror() are
false you've successfully read that character.

Even on a "vanilla" system I think both feof() and ferror()
could be true, if you've kept on calling getc() after an earlier
failure indication. The error and end-of-file indicators are
"sticky:" once they're set, they remain set until you clear
them (the actions that clear them are different for the two
indicators). So if you breeze past an I/O error and then
encounter end-of-input, I think both indicators could be set.

> If we're dealing with an I/O error, how can we learn more about
> the nature of the error? Is errno set at some point?


Some implementations may set it; none are obliged to. This
is part of the price one pays for portable I/O: The library
shields you from the ugly details of the underlying system, but
at the same time prevents you from peeking at them. "Shielding"
and "hiding" are closely related ...

--
Eric Sosman
d
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      02-07-2013
Noob <root@127.0.0.1> writes:
> James Kuyper wrote:
>> You're treating End Of File, End Of Line, and an I/O error in exactly
>> the same fashion. In some contexts that might be acceptable, but I would
>> ordinarily want to distinguish them. As long as you're restricted to
>> reporting errors by returning NULL, you can't distinguish them. That's
>> why I generally prefer using the return value of my function to report
>> an error code; I'd take a pointer to a pointer as an argument to the
>> function, and return the pointer the allocated memory through that argument.

>
> getc returning -1 means either EOF or I/O error, right?


Terminology problem.

EOF is a macro defined in <stdio.h>. It expands to an integer
constant expression of type int with a negative value. It's a value
that can be returned by certain I/O functions, usually to indicate
that no more input is available. Its value is typically -1, but
you shouldn't assume that.

If getc() returns the value of EOF, it means that it tried and failed
to read a character from stdin. This can be either because of an
end-of-file condition (which will cause feof(stdin) to return a true
result), or by an error condition (which will cause ferror(stdin)
to return a true result).

EOF stands for "end of file", but don't confuse EOF (either the
macro or the int value it denotes) with an end-of-file condition.

As the standard says (N1570 7.21.7.5):

int getc(FILE *stream);

The getc function returns the next character from the input
stream pointed to by stream. If the stream is at end-of-file,
the end-of-file indicator for the stream is set and getc returns
EOF. If a read error occurs, the error indicator for the stream
is set and getc returns EOF.

(It's a bit easier to read with the highlighting in the PDF.)

> At this point, would the following assertion hold:
>
> !feof(stream) == !!ferror(stream)


It should, yes.

> If we're dealing with an I/O error, how can we learn more about
> the nature of the error? Is errno set at some point?


Maybe. The standard doesn't require anything in <stdio.h> to set errno,
but POSIX does.

[...]

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      02-07-2013
Noob <root@127.0.0.1> writes:

> Here's my attempt at writing a "get_line" implementation, which
> reads an entire line from a file stream, dynamically allocating
> the space needed to store said line.


A couple of things that have not yet been said...

> Since this is for my own personal use, I didn't bother handling
> out-of-memory conditions elegantly. I just kick the bucket.
> ("We all kick the bucket in the end, in the end.")
>
> EOF is signaled by returning NULL.
>
> I'd like to hear suggestions/criticism.
>
> #include <stdio.h>
> #include <stdlib.h>
> #define BUFLEN 4000


Tiny point: I don't like the name, since it is not the length of a
buffer -- it's a length increment.

More substantively, it's often better to grow the allocation by a
constant factor rather than by a constant increment. However, since
realloc is probably cheap compared to I/O it may not matter here, but
it's worth considering.

> char *get_line(FILE *stream)
> {
> char *s = NULL; size_t len = 0; int c;
>
> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
> if (c == EOF) return NULL;
> ungetc(c, stream);
>
> while ( 1 )
> {
> size_t max = len + BUFLEN;
> s = realloc(s, max);
> if (s == NULL) exit(EXIT_FAILURE);


If you take Eric's suggestion of returning NULL (leaving the caller to
figure our the reason for the failure) you should adjust this to free
the allocated buffer before returning. If a huge line has caused an out
of memory condition, the last thing you want to do is return without
freeing what you can.

> while (len < max)
> {
> c = fgetc(stream);
> if (c == EOF || c == '\n')
> {
> s[len] = '\0';
> return realloc(s, len+1);
> }
> s[len++] = c;
> }
> }
> }
>
> Regards.


--
Ben.
 
Reply With Quote
 
Shao Miller
Guest
Posts: n/a
 
      02-08-2013
On 2/7/2013 07:42, Noob wrote:
>
> Here's my attempt at writing a "get_line" implementation, which
> reads an entire line from a file stream, dynamically allocating
> the space needed to store said line.
>


Based on your other posts in this thread, I gather your next version
won't skip lines, or it'll be documented and the name adjusted.

> Since this is for my own personal use, I didn't bother handling
> out-of-memory conditions elegantly. I just kick the bucket.
> ("We all kick the bucket in the end, in the end.")
>
> EOF is signaled by returning NULL.
>


An alternative is to return an 'enum' value which a caller can 'switch'
with, or opaque values which a header can define macros for.

> I'd like to hear suggestions/criticism.
>
> #include <stdio.h>
> #include <stdlib.h>
> #define BUFLEN 4000


As Ben mentioned, it might be better called 'BUFINC', or some such. Or
you could use 'BUFSIZ' from stdio.h. Or you could allow the caller to
make the choice either through a parameter, encapsulating the choice in
some kind of state, or even with a callback function that allows the
caller to specify the method of growth. (You could have a default
choice or callback, in case the caller isn't interested.)

> char *get_line(FILE *stream)
> {
> char *s = NULL; size_t len = 0; int c;
>
> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
> if (c == EOF) return NULL;
> ungetc(c, stream);
>
> while ( 1 )
> {
> size_t max = len + BUFLEN;
> s = realloc(s, max);
> if (s == NULL) exit(EXIT_FAILURE);


Although you said above that error-handling wasn't particularly
graceful, this call to 'exit' seems a bit harsh.

> while (len < max)
> {
> c = fgetc(stream);
> if (c == EOF || c == '\n')
> {
> s[len] = '\0';
> return realloc(s, len+1);
> }
> s[len++] = c;
> }
> }
> }
>


Since a stream might be "line buffered", I wonder if you could gain any
sort of advantage by using 'fscanf'... The following quick C89 code
(could have bugs!) doesn't try to grow any buffer, but uses 'fscanf':

#include <stdio.h>

int fetch_a_line(char * buf, unsigned int bufsz, FILE * stream) {
char format_buf[
/* '%' */
1 +
/* 32 decimal digits */
32 +
/* "[^\n]%n" */
6 +
/* null terminator */
1
];
int matched;
int bytes;
int c;

sprintf(format_buf, "%%%u[^\n]%%n", bufsz);
matched = fscanf(stream, format_buf, buf, &bytes);
c = getc(stream);
if (c != '\n')
ungetc(c, stream);
if (matched > 0)
return bytes;
return matched;
}

int main(void) {
char test[512];
int bytes;

while (1) {
bytes = fetch_a_line(test, sizeof test - 1, stdin);
if (bytes == EOF)
break;
printf("Read line of %d characters: %s\n", bytes, test);
}
printf("Done.\n");
return 0;
}

--
- Shao Miller
--
"Thank you for the kind words; those are the kind of words I like to hear.

Cheerily," -- Richard Harter
 
Reply With Quote
 
Noob
Guest
Posts: n/a
 
      02-08-2013
pete wrote:

> I deactivated that feature
> and modified a get_line_test program which I already had,
> so that it would work with your function.


What was your conclusion of your test?

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      02-09-2013
Philip Lantz <> writes:
> Keith Thompson wrote:
>> Noob writes:
>> > getc returning -1 means either EOF or I/O error, right?

>>
>> Terminology problem.
>>
>> EOF is a macro defined in <stdio.h>.

>
> Yes, but "EOF" also means "end of file" in ordinary everyday
> computerese, which is clearly the way he was using the word in the
> sentence above.


Yes, but given the existence of the EOF macro, using EOF *in a C
context* to refer to an end-of-file condition is potentially confusing.

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
Rosario1903
Guest
Posts: n/a
 
      02-09-2013
On Thu, 07 Feb 2013 13:42:12 +0100, Noob <root@127.0.0.1> wrote:

>Hello group,
>
>Here's my attempt at writing a "get_line" implementation, which
>reads an entire line from a file stream, dynamically allocating
>the space needed to store said line.
>
>Since this is for my own personal use, I didn't bother handling
>out-of-memory conditions elegantly. I just kick the bucket.
>("We all kick the bucket in the end, in the end.")
>
>EOF is signaled by returning NULL.
>
>I'd like to hear suggestions/criticism.


// ritorna un puntatore da liberarsi tramite free()
// ritorna 0 per errore
char* __export GetLineI(char* testo)
{unsigned s, i;
int c;
char *p,*t;
s=1024; p=malloc(s+4);
if(p==0) return 0;
printf("%s", testo); fflush(stdout);
for(i=0; (c=getchar())!=EOF ; )
{if(i>=s)
{s=s+1024;
t=realloc(p, s+4);
if(t==0||(int)s<0)
{free(p); return 0;}
p=t;
}
p[i]=c; ++i;
if(c=='\n')
break;
}
p[i]=0;
return p;
}


 
Reply With Quote
 
Shao Miller
Guest
Posts: n/a
 
      02-09-2013
On 2/9/2013 13:02, Rosario1903 wrote:
> On Thu, 07 Feb 2013 13:42:12 +0100, Noob <root@127.0.0.1> wrote:
>
>> Hello group,
>>
>> Here's my attempt at writing a "get_line" implementation, which
>> reads an entire line from a file stream, dynamically allocating
>> the space needed to store said line.
>>
>> Since this is for my own personal use, I didn't bother handling
>> out-of-memory conditions elegantly. I just kick the bucket.
>> ("We all kick the bucket in the end, in the end.")
>>
>> EOF is signaled by returning NULL.
>>
>> I'd like to hear suggestions/criticism.

>
> [some code]


Here's an 'indent -kr' version of Rosario1903's code:

// ritorna un puntatore da liberarsi tramite free()
// ritorna 0 per errore
char *__export GetLineI(char *testo)
{
unsigned s, i;
int c;
char *p, *t;
s = 1024;
p = malloc(s + 4);
if (p == 0)
return 0;
printf("%s", testo);
fflush(stdout);
for (i = 0; (c = getchar()) != EOF {
if (i >= s) {
s = s + 1024;
t = realloc(p, s + 4);
if (t == 0 || (int) s < 0) {
free(p);
return 0;
}
p = t;
}
p[i] = c;
++i;
if (c == '\n')
break;
}
p[i] = 0;
return p;
}


--
- Shao Miller
--
"Thank you for the kind words; those are the kind of words I like to hear.

Cheerily," -- Richard Harter
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
LINQ and Entity Framework - Worth a stab? Cirene ASP .Net 6 09-08-2008 01:42 PM
First stab at a weakref test suite Daniel Berger Ruby 6 12-22-2007 07:07 PM
Need camera advice sd memory, aa bat. image stab Bob Digital Photography 3 09-15-2006 08:16 PM
Difference in module_eval taking block vs. taking string (1.8 bug?) Jim Cain Ruby 1 07-18-2003 02:01 AM
ifstream getline() problem John C++ 10 07-14-2003 04:47 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57