Velocity Reviews


Gand Alf 02-10-2011 07:48 PM

getline problem
 
Hello

I have been writing a getline-type function. It should read an arbitrary
length line from a file and return it in a buffer.

Here is my code


#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void
GL (char **buf, size_t * sz, FILE * fp)
{
size_t off = 0;
char *new;
for(;;)
{
if(*buf)
{
fgets (*buf + off, *sz - off, fp);
if(strchr (*buf + off, '\n'))
break;
off += *sz - 1;
}
else
*sz = 1;
new = realloc (*buf, *sz <<= 1);
if(!new)
{
free (*buf);
*buf = *sz = 0;
return;
}
*buf = new;
}
}

int
main (void)
{
char *p = 0;
size_t sz = 0;
while(!feof (stdin))
{
GL (&p, &sz, stdin);
printf ("%d\n", strlen (p));
}
}


However when I run this with the following test file as input:
==begin input==
hello
world
everyone

bye
==end input==
the results are as follows:

3
6
9
1
4
4

instead I would expect to get:
5
5
8
0
3

Can anyone see what the problem is?

Thanks

James Waldby 02-10-2011 09:03 PM

Re: getline problem
 
On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:
> I have been writing a getline-type function. It should read an
> arbitrary length line from a file and return it in a buffer.

[snip #includes]
> void
> GL (char **buf, size_t * sz, FILE * fp) {
> size_t off = 0;
> char *new;
> for(;;)

What, you can't spell 'while' ?

> {
> if(*buf)
> {
> fgets (*buf + off, *sz - off, fp);
> if(strchr (*buf + off, '\n'))
> break;
> off += *sz - 1;

^ You have an extra character there.
> }

[snip realloc & most of main]
Note, your main starts off properly enough with
"int main (void)", but at the end has no "return 0".

> However when I run this with the following test file as input: ==begin
> input==
> hello
> world
> everyone
>
> bye
> ==end input==
> the results are as follows:
>
> 3
> 6
> 9
> 1
> 4
> 4
>
> instead I would expect to get:
> 5
> 5
> 8
> 0
> 3
>
> Can anyone see what the problem is?


(a) Wrong expectation -- Per man fgets, "If a newline is read,
it is stored into the buffer", and (eg) strlen("world\n") is 6.

(b) Mishandled EOF test -- Again per man, "fgets() returns ...
NULL ... when end of file occurs while no characters have been
read", but you've discarded the return value of fgets(), so you
can't detect EOF, can't put a \0 at **buf, etc.
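
To illustrate both points, here is a minimal sketch that tests the return value of fgets() directly instead of looping on feof(). The helper name chunk_lengths and the fixed 64-byte buffer are assumptions made for the example, not part of the original code. Since feof() only becomes true after a read has already failed, a feof-controlled loop runs one extra time, which would also explain the repeated 4 at the end of the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores the length of each chunk fgets() returns
   in lens[] (at most max entries) and returns how many chunks were read. */
size_t chunk_lengths(FILE *fp, size_t *lens, size_t max)
{
    char buf[64];
    size_t n = 0;

    assert(fp != NULL && lens != NULL);

    /* Test fgets() itself: it returns NULL at end of file (or on error),
       so the loop stops before a stale buffer can be measured. */
    while (n < max && fgets(buf, sizeof buf, fp) != NULL)
        lens[n++] = strlen(buf);    /* counts the '\n' if one was read */

    return n;
}
```

With the test input from the first post, this should report lengths of 6, 6, 9, 1 and 4, i.e. each expected value plus one for the stored newline.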

--
jiw

Keith Thompson 02-10-2011 10:55 PM

Re: getline problem
 
pete <pfiland@mindspring.com> writes:
> James Waldby wrote:
>>
>> On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:

>
>> > for(;;)

>> What, you can't spell 'while' ?

>
> for(;;) is the special construct for infinite loops.


It's *a* construct for infinite loops.

> On some compilers, while(1) generates a warning.


while (1) and for (;;) are equally valid. Compilers can warn about
anything they like.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Michael Press 02-11-2011 06:36 AM

Re: getline problem
 
In article <ij1jrl$i7p$1@news.eternal-september.org>,
James Waldby <not@valid.invalid> wrote:

> On Thu, 10 Feb 2011 19:48:18 +0000, Gand Alf wrote:


[...]

> > for(;;)

> What, you can't spell 'while' ?


I forgot around the 999th time I changed
`while(1)' to `for(<initialize>; <test>; <iterate>)'.

--
Michael Press

Myth__Buster 02-11-2011 04:43 PM

Re: getline problem
 
@ OP

<problematic snippet>

if(strchr (*buf + off, '\n'))
break;

</problematic snippet>

Well, there is indeed an extra character counted as part of the string
before control passes back to main: the newline that fgets stores. So
we just replace it with NUL, which hands main the required
NUL-terminated C string without the newline.

<corrected snippet>

char *ptrToNewLineChar = strchr (*buf + off, '\n');

if( ptrToNewLineChar != NULL )
{
*ptrToNewLineChar = '\0';
break;
}

</corrected snippet>

In addition, if fgets reads fewer characters than the number requested,
the offset (variable off) needs to be updated accordingly. I would
suggest using strlen() for this: since the read buffer (variable buf)
holds a C string, strlen() gives the exact number of characters read
once the newline has been chopped off as shown above.
The corresponding implementation is below.

<problematic snippet>

off += *sz - 1;

</problematic snippet>

<corrected snippet>

off += strlen(*buf + off);

</corrected snippet>
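
Putting the suggestions from this thread together, a corrected GL might look roughly like the sketch below. Several details are assumptions on my part rather than anything the OP wrote: the int return value (1 when a line was read, 0 at EOF or on allocation failure), the initial size of 16, and the choice to strip the newline.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only. Reads one line of any length into *buf, growing it as
   needed. Returns 1 if a line was read, 0 at EOF or if realloc fails. */
int GL(char **buf, size_t *sz, FILE *fp)
{
    size_t off = 0;
    char *tmp, *nl;

    assert(buf != NULL && sz != NULL && fp != NULL);

    if (*buf == NULL || *sz < 2) {          /* first call: get a buffer */
        tmp = realloc(*buf, 16);
        if (tmp == NULL)
            goto fail;
        *buf = tmp;
        *sz = 16;
    }
    for (;;) {
        /* Assumes the free space fits in an int, as fgets requires. */
        if (fgets(*buf + off, (int)(*sz - off), fp) == NULL) {
            (*buf)[off] = '\0';             /* nothing new was stored */
            return off > 0;                 /* last line had no '\n' */
        }
        nl = strchr(*buf + off, '\n');
        if (nl != NULL) {
            *nl = '\0';                     /* chop the newline */
            return 1;
        }
        off += strlen(*buf + off);          /* count what was really read */
        if (off + 1 >= *sz) {               /* buffer full: double it */
            tmp = realloc(*buf, *sz * 2);
            if (tmp == NULL)
                goto fail;
            *buf = tmp;
            *sz *= 2;
        }
    }
fail:
    free(*buf);
    *buf = NULL;
    *sz = 0;
    return 0;
}
```

main would then loop with while (GL (&p, &sz, stdin)) rather than testing feof, and print the length with %zu (or cast it), since strlen returns a size_t.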


Cheers.


James Harris 02-11-2011 06:03 PM

Re: getline problem
 
On Feb 10, 7:48 pm, Gand Alf <nos...@nospam.com> wrote:
> Hello
>
> I have been writing a getline-type function. It should read an arbitrary
> length line from a file and return it in a buffer.


....

> Can anyone see what the problem is?


Just an FYI. fgets is OK for limited use, but it has some issues which
make it unsuitable unless you can guarantee the input to it. See

http://codewiki.wikispaces.com/xbuf.c

and look at the section What's wrong with fgets?

James

Malcolm McLean 02-13-2011 12:25 PM

Re: getline problem
 
On Feb 11, 8:03 pm, James Harris <james.harri...@googlemail.com>
wrote:
>
>
> and look at the section What's wrong with fgets?
>

This doesn't mention the real problem, which is that fgets() is just
too difficult to use correctly if a program must process all input
with absolutely no errors.

You have to call strchr() to check for the newline. If it is
absent, a partial read has occurred. So you need to take action to
ensure that the characters remaining in the stream are not treated as
a whole line. All very fiddly.
However, in most applications you can just assume that a partial read
will generate a parse error on the next line.
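
For what it's worth, the fiddly part can be sketched as below for a fixed-size buffer. The helper name read_line_fixed and its return codes are made up for the example. It treats a missing newline as a partial read only when more characters of the same line really remain in the stream, and then discards them so they are not mistaken for a line of their own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity cap >= 2).
   Returns 1 for a complete line, 2 if the line was too long and its
   excess was discarded, 0 at end of file or on a read error. */
int read_line_fixed(char *buf, int cap, FILE *fp)
{
    char *nl;
    int c;

    assert(buf != NULL && cap >= 2 && fp != NULL);

    if (fgets(buf, cap, fp) == NULL)
        return 0;

    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';                 /* whole line, newline removed */
        return 1;
    }

    /* No newline stored: look at what follows in the stream. */
    c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;                   /* last line, or an exact fit */

    /* Partial read: skip the rest of this over-long line. */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return 2;
}
```

Whether to discard the excess, report an error, or grow the buffer is a design decision; this sketch simply picks the first.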





James Harris 02-13-2011 07:56 PM

Re: getline problem
 
On Feb 13, 12:25 pm, Malcolm McLean <malcolm.mcle...@btinternet.com>
wrote:
> On Feb 11, 8:03 pm, James Harris <james.harri...@googlemail.com>
> wrote:
>
> > and look at the section What's wrong with fgets?

>
> This doesn't mention the real problem, which is that fgets() is just
> too difficult to use correctly if a program must process all input
> with absolutely no errors.
>
> You have to call strchr() to check for the newline. If it is
> absent, a partial read has occurred. So you need to take action to
> ensure that the characters remaining in the stream are not treated as
> a whole line. All very fiddly.
> However, in most applications you can just assume that a partial read
> will generate a parse error on the next line.


It's worse than I thought!

Incidentally, it occurred to me that another, more modern way to read
a line from a file may be to mmap the file and use memchr or similar
to scan it. Maybe the fastest option? As a slight downside, the code
might need to account for very large files that exceed address space.
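
A rough sketch of the idea, assuming a POSIX system (mmap is not part of standard C) and a regular file small enough to map in one piece. The function name count_lines_mmap is invented for the example; it only counts lines, but the memchr loop is where each line would be handed to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines of the regular file behind fp,
   including a final line with no '\n'. Returns -1 on failure. */
long count_lines_mmap(FILE *fp)
{
    struct stat st;
    const char *data, *p, *end;
    long lines = 0;
    int fd;

    assert(fp != NULL);
    fd = fileno(fp);
    if (fd < 0 || fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    if (st.st_size == 0)
        return 0;                   /* mmap rejects a zero length */

    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;

    p = data;
    end = data + st.st_size;
    while (p < end) {
        /* The line is the span from p up to nl (or up to end). */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (nl == NULL)
            break;                  /* final line had no newline */
        p = nl + 1;
    }

    munmap((void *)data, (size_t)st.st_size);
    return lines;
}
```

Note that the mapped lines are not NUL-terminated, so code using them has to carry a length around rather than rely on strlen.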

James

Eric Sosman 02-13-2011 09:00 PM

[OT] Re: getline problem
 
On 2/13/2011 2:56 PM, James Harris wrote:
> [...]
> Incidentally, it occurred to me that another, more modern way to read
> a line from a file may be to mmap the file and use memchr or similar
> to scan it. Maybe the fastest option? As a slight downside, the code
> might need to account for very large files that exceed address space.


<off-topic reason="requires extensions">

That's one downside. Another is that the system's line-ending
conventions won't be translated to '\n' for you; you'll need to
interpret them yourself. Still another downside is that not all
input sources are mmap-able: Try it on your keyboard, for instance,
or on a pipe or socket.

</off-topic>

--
Eric Sosman
esosman@ieee-dot-org.invalid

James Harris 02-13-2011 09:14 PM

Re: getline problem
 
On Feb 13, 9:00 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 2/13/2011 2:56 PM, James Harris wrote:
>
> > [...]
> > Incidentally, it occurred to me that another, more modern way to read
> > a line from a file may be to mmap the file and use memchr or similar
> > to scan it. Maybe the fastest option? As a slight downside, the code
> > might need to account for very large files that exceed address space.

>
> <off-topic reason="requires extensions">
>
>     That's one downside. Another is that the system's line-ending
> conventions won't be translated to '\n' for you; you'll need to
> interpret them yourself.


True.

> Still another downside is that not all
> input sources are mmap-able: Try it on your keyboard, for instance,
> or on a pipe or socket.


Sure. I did say "file".

James

