Analogue of ReadLine() in C

 
 
BartC
      10-20-2011
"China Blue Corn Chips" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)-september.org...
> In article <(E-Mail Removed)>,
> Seebs <(E-Mail Removed)> wrote:
>
>> On 2011-10-20, China Blue Corn Chips <(E-Mail Removed)> wrote:


>> > Buy a bigger swap disk.

>>
>> This is crappy advice, because it's completely unrelated. There are

>
> The maximum line size is the file size. If the VM that can map in the entire
> file, it can allocate space for one line.


Someone doing 'ReadLine' is probably interested in line-oriented files.

A file containing a single multi-GB line is likely not line-oriented!
Or is not using the expected newline sequence.

It would be acceptable then to put a cap on the longest length of one line.
The user then needs to use a function such as 'Readfile' instead.

--
Bartc


 
Malcolm McLean
      10-20-2011
On Oct 20, 3:31 pm, "BartC" <(E-Mail Removed)> wrote:
>
> Someone doing 'ReadLine' is probably interested in line-oriented files.
>
> A file containing a single multi-GB line is likely not line-oriented!
> Or is not using the expected newline sequence.
>

The problem is that it is impossible to define a reasonable line
length.

For instance a program I've just written generates csv files. Each
line represents a "case", each column a "factor". The factors are DNA
sequences which may or may not be present in the genes under
consideration, the cases are the genes. If I look at 5-letter DNA
sequences, that's 4^5 or 1024 factors. Each one has a real value
associated with it, so about five characters for three decimal places
of precision. That means that lines are weighing in at about 6000
characters.


--
Fuzzy logic trees (build a tree from factors and cases, as described).
http://www.malcolmmclean.site11.com/www
 
Kenny McCormack
      10-20-2011
In article <(E-Mail Removed)>,
Malcolm McLean <(E-Mail Removed)> wrote:
....
>sequences, that's 4^5 or 1024 factors. Each one has a real value
>associated with it, so about five characters for three decimal places
>of precision. That means that lines are weighing in at about 6000
>characters.


6000 is nothing. 10K is nothing. Even 1M is really nothing on today's
hardware.

This is one of those issues where in order to make it sound like a serious
problem, people have to drag it out to an extreme (talking about 1G).

--

First of all, I do not appreciate your playing stupid here at all.

- Thomas 'PointedEars' Lahn -

 
Ben Bacarisse
      10-20-2011
Malcolm McLean <(E-Mail Removed)> writes:

> On Oct 20, 3:31 pm, "BartC" <(E-Mail Removed)> wrote:
>>
>> Someone doing 'ReadLine' is probably interested in line-oriented files.
>>
>> A file containing a single multi-GB line is likely not line-oriented!
>> Or is not using the expected newline sequence.
>>

> The problem is that it is impossible to define a reasonable line
> length.
>
> For instance a program I've just written generates csv files. Each
> line represents a "case", each column a "factor". The factors are DNA
> sequences which may or may not be present in the genes under
> consideration, the cases are the genes. If I look at 5-letter DNA
> sequences, that's 4^5 or 1024 factors. Each one has a real value
> associated with it, so about five characters for three decimal places
> of precision. That means that lines are weighing in at about 6000
> characters.


You say it's impossible, yet you've managed to do it! Maybe you meant
that there can't be a limit imposed by the function. I agree that this
is more of a problem, but I don't see why there is any need to do that.
I'd want a general-purpose read_line function to have a maximum size
parameter, and it should probably allow the buffer to be re-used since
that is such a common pattern in line-oriented programs.
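
A minimal sketch of such a read_line, for illustration only. The name read_line, the rl_status codes and the choice to discard the excess of an over-long line are all assumptions made for this sketch; none of them comes from a standard library or from code posted in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch; not a standard API. */
enum rl_status { RL_OK, RL_EOF, RL_TOO_LONG, RL_NOMEM, RL_IOERR };

/*
 * Read one line from fp into *buf, whose current capacity is *cap.
 * The buffer grows on demand, but a line may hold at most max_len
 * characters; the excess of a longer line is read and discarded.
 * The newline is consumed and not stored. *len receives the number of
 * characters stored. Start with *buf == NULL and *cap == 0, then pass
 * the same buf/cap to every call so the allocation is reused.
 */
static enum rl_status read_line(FILE *fp, char **buf, size_t *cap,
                                size_t max_len, size_t *len)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && cap != NULL && len != NULL);
    *len = 0;

    if (*buf == NULL || *cap == 0) {          /* first use: allocate */
        char *tmp = realloc(*buf, 64);
        if (tmp == NULL)
            return RL_NOMEM;
        *buf = tmp;
        *cap = 64;
    }

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (n == max_len) {                   /* over the caller's cap */
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;                             /* skip the rest of the line */
            (*buf)[n] = '\0';
            *len = n;
            return RL_TOO_LONG;
        }
        if (n + 1 >= *cap) {                  /* keep room for '\0' */
            char *tmp = (*cap > (size_t)-1 / 2) ? NULL
                                                : realloc(*buf, *cap * 2);
            if (tmp == NULL) {
                (*buf)[n] = '\0';             /* old buffer is still valid */
                *len = n;
                return RL_NOMEM;
            }
            *buf = tmp;
            *cap *= 2;
        }
        (*buf)[n++] = (char)c;
    }

    (*buf)[n] = '\0';
    *len = n;
    if (c == EOF && ferror(fp))
        return RL_IOERR;
    if (c == EOF && n == 0)
        return RL_EOF;
    return RL_OK;
}
```

A caller would loop while the result is RL_OK or RL_TOO_LONG, and free() the buffer once after the loop. A final line with no trailing newline is still returned as RL_OK; RL_EOF is reported only when nothing at all was read.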

--
Ben.
 
Markus Wichmann
      10-20-2011
On 19.10.2011 21:26, Максим Фомин wrote:
> I need a function similar to ReadLine() in any other language.
> The program below fails[1] when more data than LAG_SIZE is passed
>


Reading linewise through a file is as simple as using fgets(). If the
last character in the returned string is a newline character, you got a
whole line.

This makes it possible to adapt to use cases. In your case, I would
simply do something like:

#include <stdio.h>
#include <string.h>
static char buffer[256];

static void loop()
{
    do {
        if (fgets(buffer, sizeof buffer, stdin) < 0) {
            /* read error. Might be eof or real error. */
            if (ferror(stdin)) perror("read error");
            return;
        }
    } while (strncmp(buffer, "exit", sizeof "exit" - 1));
}

int main() { loop(); }

No dynamic allocation where it isn't necessary. It is not really
necessary in I/O, because there is never a buffer big enough for that,
so don't program as though there were!

HTH,
Markus
 
Ben Bacarisse
      10-20-2011
Markus Wichmann <(E-Mail Removed)> writes:

> On 19.10.2011 21:26, Максим Фомин wrote:
>> I need a function similar to ReadLine() in any other language.
>> The program below fails[1] when more data than LAG_SIZE is passed
>>

>
> Reading linewise through a file is as simple as using fgets(). If the
> last character in the returned string is a newline character, you got a
> whole line.
>
> This makes it possible to adapt to use cases. In your case, I would
> simply do something like:
>
> #include <stdio.h>
> #include <string.h>
> static char buffer[256];
>
> static void loop()
> {
> do {
> if (fgets(buffer, sizeof buffer, stdin) < 0) {


I think you intended to write == 0 here.

> /* read error. Might be eof or real error. */
> if (ferror(stdin)) perror("read error");
> return;
> }
> } while (strncmp(buffer, "exit", sizeof "exit" - 1));
> }
>
> int main() { loop(); }


I am not sure how you decided that this suits the OP's use case. I
think the original posted code was just a cut down example, so it's not
obvious that silently splitting lines that are longer than 255
characters suits the OP.

> No dynamic allocation where it isn't necessary. It is not really
> necessary in I/O, because there is never a buffer big enough for that,
> so don't program as though there were!


I think you'd be annoyed if, say, awk behaved like this. Given a big
enough buffer, your shell might get away with it, but I expect it
doesn't try.

--
Ben.
 
Seebs
      10-20-2011
On 2011-10-20, China Blue Corn Chips <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed)>,
> Seebs <(E-Mail Removed)> wrote:

[all in reference to:]
>> >> >char *p = 0; int m = 0, n = 0, c;
>> >> >while (c=fgetc(file), c!='\n' && c!=EOF) {
>> >> > if (n>=m) {m = 2*n+1; p = realloc(p, m);}
>> >> > p[n++] = c;


>> >> What if !realloc(p,m)?


>> > Buy a bigger swap disk.


>> This is crappy advice, because it's completely unrelated. There are


> The maximum line size is the file size. If the VM that can map in the entire
> file, it can allocate space for one line.


Except, as I pointed out, this isn't true.

> If you want to run on smaller
> memories, adapt the code accordingly. I'm not interested in doing so.


That's fine. But telling someone *else* to "buy a bigger swap disk" as
a solution to a problem that cannot be fixed by buying a bigger swap disk
remains crappy advice.

> The total time and space is O(n). Changing the multiplier leaves you with O(n).
> Starting with size>0 reduces the time and space by O(k), however
> O(n)-O(k) = O(n).


Except that:
1. The thing it optimizes is actually O(log(n)).
2. k is, in this case, quite likely to be the entirety of the actual data.

> You're worrying about optimising at the wrong level which is a
> waste of time unless you've done measurement showing this is a hot loop.


There's such a thing as premature optimization, but that doesn't mean we
should go out of our way to do things in obviously stupid ways. On a real
system, with real data, your solution will take many times more real
execution time than one which changes nothing except starting with n=64.

And while it's a bad thing to go around making things fancier than you need
to for no measured benefit, it's not a bad thing to make reasonable choices
for starting values.
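
To put rough numbers on the starting-value point, here is a small simulation of the growth rule in the quoted loop (m = 2*n+1). It only counts allocator calls; it measures no time and allocates nothing. The function name count_alloc_calls is made up for this sketch, and a nonzero starting capacity is assumed to cost one up-front malloc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the allocator calls made by the quoted loop for a line of
 * line_len characters, given a starting capacity. A nonzero starting
 * capacity costs one up-front malloc; after that the loop grows the
 * buffer with m = 2*n + 1 exactly as in the quoted code.
 */
static unsigned count_alloc_calls(size_t line_len, size_t initial)
{
    size_t m = initial, n;
    unsigned calls = (initial > 0) ? 1u : 0u;

    assert(line_len <= ((size_t)-1 - 1) / 2);   /* 2*n + 1 cannot overflow */
    for (n = 0; n < line_len; n++) {
        if (n >= m) {       /* same test as the quoted code */
            m = 2 * n + 1;
            calls++;
        }
    }
    return calls;
}
```

For a 40-character line this gives 6 calls when starting from zero and 1 call when starting from 64. For a 100-character line it gives 7 and 2. What each call actually costs depends on the allocator, which this sketch says nothing about.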

> Also
> realloc(0,n) is the same as malloc(n) so one initial malloc is no different than
> a realloc on the first loop; but it requires more code.


Right. I didn't suggest a fancier algorithm; I suggested starting n at some
other value.

> I used to write code that had to fit in 65K bytes. I don't any more. You can if
> you want.


This is a ridiculous bit of hyperbole. I'm not talking about crazy or
unreasonable requirements; I'm talking about something which, for data sizes
most people deal with on a daily basis, would blow up on most consumer
machines today, when an even slightly cleverer algorithm wouldn't.

>> Depending, I'd say either start with a "likely" length or start with
>> something around a page size, and reallocate up a little more conservatively.


> If the resize doesn't increase geometrically, the run time switches to O(n^2).
> Otherwise it is O(n) + whatever code complexity you want to pretend you can beat
> O(n).


O(n) notation only gets you so far. There is more to writing good code than
that.

I guess what offends me about this isn't that it's lazy, but that it's not
even lazy. It wouldn't be any more effort to do this in a way that would
work dramatically better. If it were *any* more effort, I could agree that
maybe it wasn't worth putting in that effort, but when it's *no effort at
all*, that's ridiculous.

-s
--
Copyright 2011, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
 
Ian Collins
      10-20-2011
On 10/21/11 02:31 AM, BartC wrote:
> "China Blue Corn Chips"<(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)-september.org...
>> In article<(E-Mail Removed)>,
>> Seebs<(E-Mail Removed)> wrote:
>>
>>> On 2011-10-20, China Blue Corn Chips<(E-Mail Removed)> wrote:

>
>>>> Buy a bigger swap disk.
>>>
>>> This is crappy advice, because it's completely unrelated. There are

>>
>> The maximum line size is the file size. If the VM that can map in the entire
>> file, it can allocate space for one line.

>
> Someone doing 'ReadLine' is probably interested in line-oriented files.
>
> A file containing a single multi-GB line is likely not line-oriented!
> Or is not using the expected newline sequence.
>
> It would be acceptable then to put a cap on the longest length of one line.
> The user then needs to use a function such as 'Readfile' instead.


Or consume the "line" bit by bit.
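
A rough sketch of the bit-by-bit approach, assuming the task can be done piecewise. The example (a hypothetical longest_line function) finds the length of the longest line with a 32-byte buffer, using the fgets() convention that a chunk ending in '\n' completes a line. It relies on strlen(), so it assumes the input contains no embedded NUL bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the length (excluding the newline) of the longest line in fp,
 * without ever holding a whole line in memory.
 */
static size_t longest_line(FILE *fp)
{
    char chunk[32];
    size_t cur = 0, best = 0;

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);
        if (len > 0 && chunk[len - 1] == '\n') {
            cur += len - 1;          /* final piece of this line */
            if (cur > best)
                best = cur;
            cur = 0;
        } else {
            cur += len;              /* partial line: more to come */
        }
    }
    if (cur > best)                  /* last line had no newline */
        best = cur;
    return best;
}
```

The same loop shape works for any processing that can be done a piece at a time, such as copying, counting or searching for a short token, provided the code copes with a token that straddles two chunks.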

--
Ian Collins
 
Michael Angelo Ravera
      10-20-2011
You will see some petty squabbling.

The reason that your code fails is that you can't generally realloc what you haven't [m|c|0]alloc'd.

I agree with others that you should first allocate something like the longest common line (whatever you deem that to be). I might pick 256 or 512 or 409 or 86 or 133. If wastage for multiple consecutive lines is a big problem, you will want to do some sort of your own memory management for your "Lines".

You could allocate something extremely large and then copy it into a buffer that is just barely big enough.

Most libraries do not allocate exactly what you tell them to. If you know about the memory management method, you can allocate something reasonable initially and avoid doing the copy when you know (or believe) that it won't really give back any of the memory.

So, if your libraries always allocate in multiples of 16, you could make your initial allocation be something like 48 or 80 or 128.

It would, of course, be nice to have your function learn how big your lines were likely to be by keeping internal statistics or by an argument as a hint.

Ironically, any function that returns an allocated buffer will likely be least efficient in space use with small lines and least efficient in computational speed with large ones. If you don't mind running out of memory at 3% of your nominal capacity when you are processing a file with a bunch of short lines, or taking 4 times as long as nominal when you have a very few very large lines, the one-size-fits-all approach for a library would work fine.

If you can't make your function smart, you can create two or three different variations so as to optimize whatever metric you'd like. Having one function that reads chat room squawks (which are typically less than 32 characters), another that reads C code (which has an average line length around 40 or 50 characters and a max that is defined), and another that reads paragraphs from a novel (where dialogue paragraphs might be short, but descriptions [especially in Russian] can go on for several pages) with different optimization criteria would seem like a good idea.

For the ReadNovelLine function, it would seem reasonable to have an I/O argument that indicated what kind of a thing was expected and what you actually got. If you wanted to get cute, you could even have a secret circular array of the last so many line lengths. Do you want to do it or to do it intelligently?
 
James Kuyper
      10-20-2011
On 10/20/2011 04:32 PM, Michael Angelo Ravera wrote:
....
> The reason that your code fails is that you can't generally realloc
> what you haven't [m|c|0]alloc'd.


What's the 0 for? 0alloc'd?

You are also allowed to realloc() a null pointer, or a pointer returned
by a previous successful call to realloc() that has not yet been free()d
or successfully realloc()d again. As far as I can see, those are the
only two things this code ever realloc()s. Have I missed something?
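
A small sketch of the null-pointer case, using a hypothetical append_char helper (the name and the growth rule 2*m+16 are assumptions for illustration). The buffer starts as a null pointer, so the first realloc() behaves like malloc(), and every later call is passed a pointer that came from a previous successful realloc().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Append c to the string *p (capacity *m, length *n), keeping it
 * NUL-terminated. *p may be NULL with *m == 0 and *n == 0 on the first
 * call. Returns 0 on success, -1 if realloc fails, in which case *p is
 * left untouched and still valid.
 */
static int append_char(char **p, size_t *m, size_t *n, char c)
{
    assert(p != NULL && m != NULL && n != NULL);
    if (*n + 1 >= *m) {                   /* need room for c and '\0' */
        size_t new_m = 2 * *m + 16;
        char *tmp = realloc(*p, new_m);   /* realloc(NULL, k) acts like malloc(k) */
        if (tmp == NULL)
            return -1;
        *p = tmp;
        *m = new_m;
    }
    (*p)[(*n)++] = c;
    (*p)[*n] = '\0';
    return 0;
}
```

Assigning the result to a temporary first means a failed realloc() does not lose the original pointer, which is also one answer to the earlier "What if !realloc(p,m)?" question.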
 