Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Taking a stab at getline

Reply
Thread Tools

Taking a stab at getline

 
 
Noob
Guest
Posts: n/a
 
      02-07-2013
Hello group,

Here's my attempt at writing a "get_line" implementation, which
reads an entire line from a file stream, dynamically allocating
the space needed to store said line.

Since this is for my own personal use, I didn't bother handling
out-of-memory conditions elegantly. I just kick the bucket.
("We all kick the bucket in the end, in the end.")

EOF is signaled by returning NULL.

I'd like to hear suggestions/criticism.

#include <stdio.h>
#include <stdlib.h>
#define BUFLEN 4000
char *get_line(FILE *stream)
{
char *s = NULL; size_t len = 0; int c;

do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
if (c == EOF) return NULL;
ungetc(c, stream);

while ( 1 )
{
size_t max = len + BUFLEN;
s = realloc(s, max);
if (s == NULL) exit(EXIT_FAILURE);
while (len < max)
{
c = fgetc(stream);
if (c == EOF || c == '\n')
{
s[len] = '\0';
return realloc(s, len+1);
}
s[len++] = c;
}
}
}

Regards.
 
Reply With Quote
 
 
 
 
BartC
Guest
Posts: n/a
 
      02-07-2013
"Noob" <root@127.0.0.1> wrote in message news:kf07e7$v0d$(E-Mail Removed)...

> I'd like to hear suggestions/criticism.


No user docs given. As I understand how it works:

o The function allocates a new dynamic buffer for every line read; the
caller is responsible for freeing this.

o The buffer allocated will always be at least 4000 bytes? (Significant if
the caller wants to store these pointers in an array)

o A return value of NULL means end-of-file (out-of-memory signalled by
aborting)

o *Blank lines are ignored*. (This is different from a normal getline();
blank lines can be significant. However blank lines containing only white
space *are* returned.)

o The return value points to a null-terminated string (which always contain
at least one character), when it's not NULL.

> while ( 1 )
> {
> size_t max = len + BUFLEN;
> s = realloc(s, max);


> if (s == NULL) exit(EXIT_FAILURE);
> while (len < max)
> {
> c = fgetc(stream);
> if (c == EOF || c == '\n')
> {
> s[len] = '\0';
> return realloc(s, len+1);


Why the extra byte allocated here?

--
Bartc

 
Reply With Quote
 
 
 
 
BartC
Guest
Posts: n/a
 
      02-07-2013
"BartC" <(E-Mail Removed)> wrote in message news:0oOQs.3694$(E-Mail Removed)7...
> "Noob" <root@127.0.0.1> wrote in message
> news:kf07e7$v0d$(E-Mail Removed)...
>


> o The buffer allocated will always be at least 4000 bytes? (Significant if
> the caller wants to store these pointers in an array)


>> return realloc(s, len+1);

>
> Why the extra byte allocated here?


I got it just as I pressed Send! (I rarely use realloc.)

--
Bartc

 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      02-07-2013
On 2/7/2013 7:42 AM, Noob wrote:
> Hello group,
>
> Here's my attempt at writing a "get_line" implementation, which
> reads an entire line from a file stream, dynamically allocating
> the space needed to store said line.
>
> Since this is for my own personal use, I didn't bother handling
> out-of-memory conditions elegantly. I just kick the bucket.
> ("We all kick the bucket in the end, in the end.")
>
> EOF is signaled by returning NULL.
>
> I'd like to hear suggestions/criticism.
>
> #include <stdio.h>
> #include <stdlib.h>
> #define BUFLEN 4000
> char *get_line(FILE *stream)
> {
> char *s = NULL; size_t len = 0; int c;
>
> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */


It's a matter of preference, but I wouldn't choose to do this.
Whether an empty line is meaningful or not isn't something for a
low-level utility function to decide.

> if (c == EOF) return NULL;
> ungetc(c, stream);
>
> while ( 1 )
> {
> size_t max = len + BUFLEN;


There's a faint chance this sum could wrap around, but you're
probably safe in ignoring it: You're likely to run out of memory
before exceeding what a size_t can handle. Still, an "industrial-
strength" function would be on guard against the possibility.

> s = realloc(s, max);
> if (s == NULL) exit(EXIT_FAILURE);


Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
bucket -- but here you're kicking the entire program's bucket,
without even knowing what the bucket may hold. Ugh. Ugh. Ugh.

> while (len < max)
> {
> c = fgetc(stream);


Why fgetc() instead of getc()? (Same question above, too.)

> if (c == EOF || c == '\n')
> {
> s[len] = '\0';
> return realloc(s, len+1);


... possibly returning NULL after successfully reading a line,
plus leaking the memory where the line resides ... You could
retain the original pointer and return it if realloc() fails, or
you could make BUFLEN smaller and just not worry about the extra
few bytes.

> }
> s[len++] = c;
> }
> }
> }


--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)d
 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      02-07-2013
On 02/07/2013 07:42 AM, Noob wrote:
> Hello group,
>
> Here's my attempt at writing a "get_line" implementation, which
> reads an entire line from a file stream, dynamically allocating
> the space needed to store said line.
>
> Since this is for my own personal use, I didn't bother handling
> out-of-memory conditions elegantly. I just kick the bucket.
> ("We all kick the bucket in the end, in the end.")
>
> EOF is signaled by returning NULL.
>
> I'd like to hear suggestions/criticism.
>
> #include <stdio.h>
> #include <stdlib.h>
> #define BUFLEN 4000
> char *get_line(FILE *stream)
> {
> char *s = NULL; size_t len = 0; int c;
>
> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */
> if (c == EOF) return NULL;
> ungetc(c, stream);
>
> while ( 1 )
> {
> size_t max = len + BUFLEN;
> s = realloc(s, max);
> if (s == NULL) exit(EXIT_FAILURE);
> while (len < max)
> {
> c = fgetc(stream);
> if (c == EOF || c == '\n')
> {


You're treating End Of File, End Of Line, and an I/O error in exactly
the same fashion. In some contexts that might be acceptable, but I would
ordinarily want to distinguish them. As long as you're restricted to
reporting errors by returning NULL, you can't distinguish them. That's
why I generally prefer using the return value of my function to report
an error code; I'd take a pointer to a pointer as an argument to the
function, and return the pointer the allocated memory through that argument.

> s[len] = '\0';
> return realloc(s, len+1);
> }
> s[len++] = c;
> }
> }
> }


When this routine is done, the only record of the length of the
allocated memory is the null character at the end of the allocation. I
prefer storing the length along with the allocation. To avoid having to
allocate two separate objects, I use a flexible array member:

struct allocation {
size_t length;
allocated_type data[];
};

In this case, allocated_type is char. Allocate the entire structure, and
then store it's length in the structure. Reallocate as needed, storing
the new length in the structure only after the reallocation is successful.

On the rare occasions when I've done anything like this, I've usually
occasionally desired to re-use an existing allocation. Therefore, the
routine should take a pointer to an allocation as input, and malloc() a
new one only if that pointer is null.
--
James Kuyper
 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      02-07-2013
On 02/07/2013 08:55 AM, BartC wrote:
....
> o The buffer allocated will always be at least 4000 bytes? (Significant if
> the caller wants to store these pointers in an array)


No, it gets reallocated below 4000 if less space is needed.

....
> o *Blank lines are ignored*. (This is different from a normal getline();
> blank lines can be significant. However blank lines containing only white
> space *are* returned.)


It ignores leading blank lines. That's a specialized usage, A general
purpose function shouldn't work that way. But blank lines after the
first blank line are not ignored; they cause the return of a pointer to
null-terminated string containing no characters, which seems reasonable.
--
James Kuyper
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      02-07-2013
On 2/7/2013 10:49 AM, James Kuyper wrote:
> On 02/07/2013 08:55 AM, BartC wrote:
> ...
>> o The buffer allocated will always be at least 4000 bytes? (Significant if
>> the caller wants to store these pointers in an array)

>
> No, it gets reallocated below 4000 if less space is needed.
>
> ...
>> o *Blank lines are ignored*. (This is different from a normal getline();
>> blank lines can be significant. However blank lines containing only white
>> space *are* returned.)

>
> It ignores leading blank lines. That's a specialized usage, A general
> purpose function shouldn't work that way. But blank lines after the
> first blank line are not ignored; they cause the return of a pointer to
> null-terminated string containing no characters, which seems reasonable.


It looks like *all* empty lines are skipped, since the
same skippage occurs on each call to the function. Or did
I miss something?

--
Eric Sosman
(E-Mail Removed)d
 
Reply With Quote
 
James Kuyper
Guest
Posts: n/a
 
      02-07-2013
On 02/07/2013 11:07 AM, Eric Sosman wrote:
> On 2/7/2013 10:49 AM, James Kuyper wrote:

....
>> It ignores leading blank lines. That's a specialized usage, A general
>> purpose function shouldn't work that way. But blank lines after the
>> first blank line are not ignored; they cause the return of a pointer to
>> null-terminated string containing no characters, which seems reasonable.

>
> It looks like *all* empty lines are skipped, since the
> same skippage occurs on each call to the function. Or did
> I miss something?


No, I just made a stupid mistake while reading the code. It's one of
several such mistakes, but I caught all (I hope!) of the others before
posting my message.
--
James Kuyper
 
Reply With Quote
 
Noob
Guest
Posts: n/a
 
      02-07-2013
Eric Sosman wrote:
> On 2/7/2013 7:42 AM, Noob wrote:
>> Hello group,
>>
>> Here's my attempt at writing a "get_line" implementation, which
>> reads an entire line from a file stream, dynamically allocating
>> the space needed to store said line.
>>
>> Since this is for my own personal use, I didn't bother handling
>> out-of-memory conditions elegantly. I just kick the bucket.
>> ("We all kick the bucket in the end, in the end.")
>>
>> EOF is signaled by returning NULL.
>>
>> I'd like to hear suggestions/criticism.
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #define BUFLEN 4000
>> char *get_line(FILE *stream)
>> {
>> char *s = NULL; size_t len = 0; int c;
>>
>> do c = fgetc(stream); while (c == '\n'); /* ignore leading empty lines */

>
> It's a matter of preference, but I wouldn't choose to do this.
> Whether an empty line is meaningful or not isn't something for a
> low-level utility function to decide.


You're probably right. I'll change it in v2.

>> if (c == EOF) return NULL;
>> ungetc(c, stream);
>>
>> while ( 1 )
>> {
>> size_t max = len + BUFLEN;

>
> There's a faint chance this sum could wrap around, but you're
> probably safe in ignoring it: You're likely to run out of memory
> before exceeding what a size_t can handle. Still, an "industrial-
> strength" function would be on guard against the possibility.


I understand the point you are making.

>> s = realloc(s, max);
>> if (s == NULL) exit(EXIT_FAILURE);

>
> Ugh. Ugh. Ugh. Yes, I saw what you wrote about kicking the
> bucket -- but here you're kicking the entire program's bucket,
> without even knowing what the bucket may hold. Ugh. Ugh. Ugh.


Thing is, since I chose NULL to mean EOF, there was no "room"
left for me to signal OOM. Unless I change the prototype...
I'll give it a shot in v2.

>> while (len < max)
>> {
>> c = fgetc(stream);

>
> Why fgetc() instead of getc()? (Same question above, too.)


At some point in the early stages, I was thinking that it might
be "dangerous" to use the macro version. Obviously, there's no
problem when 'stream' is a function argument.

>> if (c == EOF || c == '\n')
>> {
>> s[len] = '\0';
>> return realloc(s, len+1);

>
> ... possibly returning NULL after successfully reading a line,
> plus leaking the memory where the line resides ... You could
> retain the original pointer and return it if realloc() fails, or
> you could make BUFLEN smaller and just not worry about the extra
> few bytes.


Wait what?! Reallocating to a smaller size (i.e. truncating
excessive space) could fail?

At this point, len < max, thus len+1 <= max

And I thought realloc frees its argument if it doesn't return
what was passed in?

Regards.

 
Reply With Quote
 
Noob
Guest
Posts: n/a
 
      02-07-2013
BartC wrote:

> Noob wrote:
>
>> I'd like to hear suggestions/criticism.

>
> No user docs given. As I understand how it works:


I need to add a comment explaining what the caller is supposed
to do with the return value.

> o The function allocates a new dynamic buffer for every line read; the
> caller is responsible for freeing this.


Correct.

> o The buffer allocated will always be at least 4000 bytes? (Significant if
> the caller wants to store these pointers in an array)


No, the buffer is just big enough to store the string.

> o A return value of NULL means end-of-file (out-of-memory signaled by
> aborting)


Right.

> o *Blank lines are ignored*. (This is different from a normal getline();
> blank lines can be significant. However blank lines containing only white
> space *are* returned.)


Right. I'll make the behavior more consistent.

> o The return value points to a null-terminated string (which always contain
> at least one character), when it's not NULL.


Correct.

>> while ( 1 )
>> {
>> size_t max = len + BUFLEN;
>> s = realloc(s, max);

>
>> if (s == NULL) exit(EXIT_FAILURE);
>> while (len < max)
>> {
>> c = fgetc(stream);
>> if (c == EOF || c == '\n')
>> {
>> s[len] = '\0';
>> return realloc(s, len+1);

>
> Why the extra byte allocated here?


For the terminator

Regards.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
LINQ and Entity Framework - Worth a stab? Cirene ASP .Net 6 09-08-2008 01:42 PM
First stab at a weakref test suite Daniel Berger Ruby 6 12-22-2007 07:07 PM
Need camera advice sd memory, aa bat. image stab Bob Digital Photography 3 09-15-2006 08:16 PM
Difference in module_eval taking block vs. taking string (1.8 bug?) Jim Cain Ruby 1 07-18-2003 02:01 AM
ifstream getline() problem John C++ 10 07-14-2003 04:47 PM



Advertisments