Reading long lines from a file

 
 
Vlad Dogaru
      08-14-2007
Hello,

I suspect this comes up quite often, but I haven't found an exact
solution in the FAQ. I have to read and parse a file with arbitrarily
long lines and have come up with the following plan:

1. start with a statically allocated buffer and a pointer of equal size
2. read into the buffer using fgets and append to the pointer
3. if buffer does not contain '\n', reallocate buffer and jump to 2
4. return the pointer

Do you see anything wrong with this? If so, how can I improve it?

Thanks in advance,
Vlad Dogaru

--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
 
Richard Heathfield
      08-14-2007
Vlad Dogaru said:

> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?


To start with, you can't reallocate a statically allocated buffer! Nor
can you have a pointer of equal size to a buffer except by sizing the
buffer to be the same size as a pointer. Nor can you append to a
pointer.

Once we get those impossibilities out of the way, we can dispense with
the unnecessary fgets call - your input is already buffered, so why
buffer it again through fgets?

Here's the plan:

Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
at this allocation with P. Set U to 0. Have a temporary pointer T
kicking about the place.

While you can read a character successfully that isn't a newline:
    If U == C - 1
        You're about to run out of space, so get some more
        T = realloc(P, C * 2)
        If that didn't work, you might want to try lower multipliers
        (1.5, 1.25 maybe) or even use add instead of multiply - and
        warn the caller that you're running low on RAM.
        Eventually, either you give up (in which case tell the user
        you failed), or you succeed, in which case set P = T
        Increase C to describe the new allocation amount accurately
    Endif

    If all is well
        P[U++] = the character you read
    Endif
Endwhile
If all is well
    P[U] = '\0'
Endif
P now contains the line.
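
For readers who want to see the plan above in C, here is a minimal sketch. It is not the implementation linked below; the name read_line, the starting capacity of 16, and the choice to give up on the first failed realloc (rather than retrying with smaller multipliers) are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp into a freshly allocated,
   NUL-terminated buffer (the newline is not stored). Returns NULL at end
   of file with nothing read, or on allocation failure. The caller frees
   the result. */
char *read_line(FILE *fp)
{
    size_t cap = 16;        /* C: current allocation size (assumed start) */
    size_t used = 0;        /* U: characters stored so far */
    char *p = malloc(cap);  /* P: the line buffer */
    int ch;

    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == cap - 1) {          /* keep one byte for '\0' */
            size_t newcap = cap * 2;
            /* T: never overwrite p until realloc has succeeded */
            char *t = (newcap > cap) ? realloc(p, newcap) : NULL;
            if (t == NULL) {            /* simplest policy: give up */
                free(p);
                return NULL;
            }
            p = t;
            cap = newcap;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {       /* nothing left to read */
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

A caller would loop with `while ((s = read_line(fp)) != NULL) { ...; free(s); }`. Note that this sketch cannot distinguish end of file from an allocation failure by the return value alone; a fuller version would report an error code as well.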

For a discussion of long-line issues, an implementation of a full line
capture function, and links to other such implementations, see
http://www.cpax.org.uk/prg/writings/fgetdata.php

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
pete
      08-14-2007
Vlad Dogaru wrote:
>
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this?


Possibly with the phrase "statically allocated".
There are three kinds of storage duration:
1 automatic
2 static
3 allocated

Only allocated memory can be reallocated.
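
A small sketch of the distinction, assuming nothing beyond the standard library; the names static_buf and grow_demo are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

static char static_buf[100] = "abc";    /* static storage duration */

/* Hypothetical demo: returns a grown heap copy of "abc", or NULL on
   allocation failure. The caller frees the result. */
char *grow_demo(void)
{
    char auto_buf[100];                 /* automatic storage duration */
    char *heap = malloc(4);             /* allocated storage duration */
    char *bigger;

    if (heap == NULL)
        return NULL;
    strcpy(auto_buf, static_buf);
    memcpy(heap, auto_buf, 4);          /* "abc" plus its '\0' */

    /* realloc(static_buf, 200) or realloc(auto_buf, 200) would be
       undefined behaviour: neither pointer came from malloc, calloc,
       or realloc. Only heap may be passed here. */
    bigger = realloc(heap, 200);
    if (bigger == NULL) {
        free(heap);                     /* the old block is still valid */
        return NULL;
    }
    return bigger;
}
```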

> If so, how can I improve it?


A few of the regulars here
have written their own getline functions:
http://www.cpax.org.uk/prg/writings/...ta.php#related

--
pete
 
 
Vlad Dogaru
      08-14-2007
Richard Heathfield wrote:
> Vlad Dogaru said:
>
>> Hello,
>>
>> I suspect this comes up quite often, but I haven't found an exact
>> solution in the FAQ. I have to read and parse a file with arbitrarily
>> long lines and have come up with the following plan:
>>
>> 1. start with a statically allocated buffer and a pointer of equal size
>> 2. read into the buffer using fgets and append to the pointer
>> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
>> 4. return the pointer
>>
>> Do you see anything wrong with this? If so, how can I improve it?

>
> To start with, you can't reallocate a statically allocated buffer! Nor
> can you have a pointer of equal size to a buffer except by sizing the
> buffer to be the same size as a pointer. Nor can you append to a
> pointer.
>
> Once we get those impossibilities out of the way, we can dispense with
> the unnecessary fgets call - your input is already buffered, so why
> buffer it again through fgets?



If anything, my lack of English skills has contributed to the
misunderstanding. I was talking about:
char b[100], *p;
Reading into b with fgets, then reallocating p as necessary to do a
strcat(p, b).

But your solution is much more elegant and now I see why fgets is
unnecessary.

>
> Here's the plan:
>
> Allocate C (greater than 1) bytes of storage space DYNAMICALLY - point
> at this allocation with P. Set U to 0. Have a temporary pointer T
> kicking about the place.
>
> While you can read a character successfully that isn't a newline:
>     If U == C - 1
>         You're about to run out of space, so get some more
>         T = realloc(P, C * 2)
>         If that didn't work, you might want to try lower multipliers
>         (1.5, 1.25 maybe) or even use add instead of multiply - and
>         warn the caller that you're running low on RAM.
>         Eventually, either you give up (in which case tell the user
>         you failed), or you succeed, in which case set P = T
>         Increase C to describe the new allocation amount accurately
>     Endif
>
>     If all is well
>         P[U++] = the character you read
>     Endif
> Endwhile
> If all is well
>     P[U] = '\0'
> Endif
> P now contains the line.
>
> For a discussion of long-line issues, an implementation of a full line
> capture function, and links to other such implementations, see
> http://www.cpax.org.uk/prg/writings/fgetdata.php


Thank you for the clarification and the link. I will look into it and I
am confident that I can write a similar function.

Vlad
--
Number one reason to date an engineer:
The world does revolve around us; we pick the coordinate system.
 
 
David Mathog
      08-14-2007
Vlad Dogaru wrote:
> Hello,
>
> I suspect this comes up quite often, but I haven't found an exact
> solution in the FAQ. I have to read and parse a file with arbitrarily
> long lines and have come up with the following plan:
>
> 1. start with a statically allocated buffer and a pointer of equal size
> 2. read into the buffer using fgets and append to the pointer
> 3. if buffer does not contain '\n', reallocate buffer and jump to 2
> 4. return the pointer
>
> Do you see anything wrong with this? If so, how can I improve it?


This may not apply to your particular case, but in some instances I have
encountered with "arbitrarily long lines" one can just read a character
at a time, examine it, perform some action, and then continue. This
removes the need for a huge buffer, which in the worst case, might not
even fit into the computer's memory. Obviously this won't work if any
modification to the front of the line depends on a value near the end of
the line.
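
As one illustration of the character-at-a-time approach, the sketch below finds the length of the longest line in a file without storing any line. The function name longest_line and the task itself are assumptions chosen for the example, not something from the original question.

```c
#include <stdio.h>

/* Hypothetical example: length of the longest line in fp, not counting
   the newline, using constant memory regardless of line length. */
unsigned long longest_line(FILE *fp)
{
    unsigned long cur = 0, best = 0;
    int ch;

    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            if (cur > best)
                best = cur;
            cur = 0;                /* start counting the next line */
        } else {
            cur++;
        }
    }
    if (cur > best)                 /* last line may lack a newline */
        best = cur;
    return best;
}
```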

If you do go with the expanding buffer method be sure that you do
NOT use strcat() to append each new chunk of text. Doing so will result
in each such addition scanning from the front of the buffer for the
terminal '\0' in the string. I've seen this bug many, many times.
It can cause a huge performance hit. Instead, keep track of the
length of the string in the buffer and just copy the new string directly
to the appropriate position, then adjust the length variable, and repeat.
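
A minimal sketch of that advice, assuming a 100-byte fgets chunk as in the original question; the name read_line_chunks is hypothetical. Each chunk is copied to line + len with memcpy, so nothing rescans the text already stored.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line (including the '\n', if present)
   by appending fgets chunks at a tracked offset. Returns NULL at end of
   file with nothing read, or on allocation failure. Caller frees. */
char *read_line_chunks(FILE *fp)
{
    char chunk[100];
    char *line = NULL;
    size_t len = 0;                 /* length of the string in line */

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        char *t = realloc(line, len + n + 1);
        if (t == NULL) {
            free(line);
            return NULL;
        }
        line = t;
        memcpy(line + len, chunk, n + 1);   /* copies the '\0' too */
        len += n;                           /* no strcat, no rescan */
        if (n > 0 && line[len - 1] == '\n')
            break;                          /* the line is complete */
    }
    return line;
}
```

This version calls realloc once per chunk to keep the sketch short; doubling the capacity instead would reduce the number of reallocations for very long lines.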

Regards,

David Mathog

 
 
Flash Gordon
      08-14-2007
Vlad Dogaru wrote, On 14/08/07 11:46:
> Richard Heathfield wrote:


<snip>

>> To start with, you can't reallocate a statically allocated buffer! Nor
>> can you have a pointer of equal size to a buffer except by sizing the
>> buffer to be the same size as a pointer. Nor can you append to a pointer.
>>
>> Once we get those impossibilities out of the way, we can dispense with
>> the unnecessary fgets call - your input is already buffered, so why
>> buffer it again through fgets?

>
> If anything, my lack of English skills has contributed to the
> misunderstanding. I was talking about:
> char b[100], *p;
> Reading into b with fgets, then reallocating p as necessary to do a
> strcat(p, b).


Since we do not know what p points to we cannot say whether you are
allowed to realloc what it points to or not. You can only pass realloc
a null pointer or a pointer previously returned by malloc, calloc or
realloc that has not yet been freed.

Also beware of denial-of-service attacks where a user deliberately
creates a file with a line 5GB long.

<snip>
--
Flash Gordon
 
 
Peter J. Holzer
      08-20-2007
On 2007-08-14 17:43, Flash Gordon <(E-Mail Removed)> wrote:
> Vlad Dogaru wrote, On 14/08/07 11:46:
>> Richard Heathfield wrote:
>>> To start with, you can't reallocate a statically allocated buffer! Nor
>>> can you have a pointer of equal size to a buffer except by sizing the
>>> buffer to be the same size as a pointer. Nor can you append to a pointer.

[...]
>> If anything, my lack of English skills has contributed to the
>> misunderstanding. I was talking about:
>> char b[100], *p;
>> Reading into b with fgets, then reallocating p as necessary to do a
>> strcat(p, b).

>
> Since we do not know what p points to we cannot say whether you are
> allowed to realloc what it points to or not.


We cannot *know*, but I think it is reasonable to assume from the
description that he uses malloc to get the initial value for p. You
don't always have to assume the stupidest possible version if
something isn't specified exactly.

> Also beware of denial-of-service attacks where a user deliberately
> creates a file with a line 5GB long.


ACK. But that's probably not something which should be hard-coded into
the application. After all, the program might run on a machine with 64
GB RAM where 5 GB of memory usage is quite acceptable. You could use a
configurable limit or rely on OS features to limit memory consumption
(e.g. ulimit on unixoid systems).
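
One way a configurable limit might look, as a sketch only: the caller passes the maximum line length (read from a config file or command line, say) rather than the reader hard-coding it. The name read_line_limited and its too_long out-parameter are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: growing-buffer line reader that refuses lines
   longer than max_len characters. Returns NULL at end of file, on
   allocation failure, or when the limit is hit; *too_long is set to 1
   only in the last case. The caller frees the result. */
char *read_line_limited(FILE *fp, size_t max_len, int *too_long)
{
    size_t cap = 16, used = 0;
    char *p = malloc(cap);
    int ch;

    *too_long = 0;
    if (p == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (used == max_len) {          /* caller-supplied limit reached */
            *too_long = 1;
            free(p);
            return NULL;
        }
        if (used == cap - 1) {          /* keep one byte for '\0' */
            size_t newcap = cap * 2;
            char *t = (newcap > cap) ? realloc(p, newcap) : NULL;
            if (t == NULL) {
                free(p);
                return NULL;
            }
            p = t;
            cap = newcap;
        }
        p[used++] = (char)ch;
    }

    if (ch == EOF && used == 0) {
        free(p);
        return NULL;
    }
    p[used] = '\0';
    return p;
}
```

When the limit is hit, the rest of the over-long line is left unread in the stream; whether to skip it or abort is a policy decision for the caller.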

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | http://www.velocityreviews.com/forums/(E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
 
 
Spiros Bousbouras
      08-20-2007
On Aug 20, 1:57 pm, "Peter J. Holzer" <(E-Mail Removed)> wrote:
> On 2007-08-14 17:43, Flash Gordon <(E-Mail Removed)> wrote:
>
> > Vlad Dogaru wrote, On 14/08/07 11:46:
> >> Richard Heathfield wrote:
> >>> To start with, you can't reallocate a statically allocated buffer! Nor
> >>> can you have a pointer of equal size to a buffer except by sizing the
> >>> buffer to be the same size as a pointer. Nor can you append to a pointer.

> [...]
> >> If anything, my lack of English skills has contributed to the
> >> misunderstanding. I was talking about:
> >> char b[100], *p;
> >> Reading into b with fgets, then reallocating p as necessary to do a
> >> strcat(p, b).

>
> > Since we do not know what p points to we cannot say whether you are
> > allowed to realloc what it points to or not.

>
> We cannot *know*, but I think it is reasonable to assume from the
> description that he uses malloc to get the initial value for p. You
> don't always have to assume the stupidest possible version if
> something isn't specified exactly.


Reading Flash Gordon's post I don't see him assuming anything.
He was simply aiming to cover all possibilities and I'm all for
that; we do aim to be accurate around here.

 