complexity for tellg()

 
 
toton
      02-20-2007
Hi,
I am reading a big file and need to record the current file position
as I go, so that I can store the positions for later direct access.
However, tellg looks like a very costly function! Its code suggests
it should just return the current buffer position, and thus should be a
very low-cost function.
To explain:
{
boost::progress_timer t;
std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::string line;
while(in){
int pos = in.tellg();
std::getline(in,line);
}
}
This code takes 0.58 sec on my computer with the in.tellg() line commented
out, while with that line enabled it takes 120.8 sec (varies a little).

Can anyone explain the reason and suggest a possible workaround?
I am using MS Visual Studio 7.1 and the standard library provided with
it.

 
Alf P. Steinbach
      02-20-2007
* toton:
> Hi,
> I am reading a big file , and need to have a flag for current file
> position so that I can store the positions for later direct access.
> However it looks tellg is a very costly function ! But it's code says
> it should just return the current buffer position , thus should be a
> very low cost function.
> To explain,
> {
> boost::progress_timer t;
> std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
> std::string line;
> while(in){
> int pos = in.tellg();
> std::getline(in,line);
> }
> }
> This code takes 0.58 sec in my computer, while if I uncomment the line
> in.tellg() , it takes 120.8 sec (varies a little )
>
> can anyone say the reason & the possible workout ?
> I amusing MS Visual Studio 7.1 and the std library provided by visual
> studio 7.1


Most likely the cause is conversion of CRLF to LF, which you've
specified by (1) opening the file in text mode and (2) compiling with a
Windows compiler.

One cure could then be to open the file in binary mode, and handle
newlines as appropriate (or not).

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
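
A minimal sketch of the binary-mode approach suggested in this reply. It is not code from the thread: the names IndexedLine and read_lines_binary are made up for illustration. It records tellg() before each getline and strips a trailing '\r' by hand. Whether this removes the slowdown on a particular library is an assumption that should be checked by measurement.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// One line of text plus the byte offset at which it starts in the file.
// (Hypothetical helper type, not part of any library.)
struct IndexedLine {
    std::streamoff offset;
    std::string text;
};

// Reads every line of `path` in binary mode, so no CRLF translation takes
// place and tellg() reports plain byte offsets. A trailing '\r' left over
// from a Windows-style "\r\n" terminator is stripped by hand.
std::vector<IndexedLine> read_lines_binary(const std::string& path) {
    std::vector<IndexedLine> lines;
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::string line;
    while (true) {
        const std::streampos pos = in.tellg();  // position before the read
        if (!std::getline(in, line)) break;     // stop at end of file or error
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);        // drop the CR of a CRLF pair
        }
        lines.push_back(IndexedLine{static_cast<std::streamoff>(pos), line});
    }
    return lines;
}
```

For a file whose bytes are "a\r\nbb\nccc", this yields the offsets 0, 3 and 6 with the texts a, bb and ccc.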
 
John Harrison
      02-20-2007
toton wrote:
> Hi,
> I am reading a big file , and need to have a flag for current file
> position so that I can store the positions for later direct access.
> However it looks tellg is a very costly function ! But it's code says
> it should just return the current buffer position , thus should be a
> very low cost function.
> To explain,
> {
> boost::progress_timer t;
> std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
> std::string line;
> while(in){
> int pos = in.tellg();
> std::getline(in,line);
> }
> }
> This code takes 0.58 sec in my computer, while if I uncomment the line
> in.tellg() , it takes 120.8 sec (varies a little )
>
> can anyone say the reason & the possible workout ?
> I amusing MS Visual Studio 7.1 and the std library provided by visual
> studio 7.1
>


The reason is that tellg performs a seek to the current position. This
flushes the input buffer, dramatically slowing down your program.

Looks as through the defintion is streambuf (which is used by all
streams) is such that the only way to find the current position is to
perform a seek to the current position.

john
 
John Harrison
      02-20-2007
>
> Looks as through the defintion is streambuf (which is used by all
> streams) is such that the only way to find the current position is to
> perform a seek to the current position.
>


Let me try that again

Looks as though the definition of streambuf (which is used by all
streams) is such that the only way to find the current position is to
perform a seek to the current position.

john
 
Carlo Capelli
      02-20-2007
Of course you can approach the problem by computing the position yourself, if
you know the size of the input read.
Not elegant, but it works for simple cases...

std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
size_t pos = 0; // offset of the next line to be read
std::string line;
while(std::getline(in,line)){
// int pos = in.tellg();
pos += line.length() + 2; // account for line terminator (assumes \r\n)...
}

Bye Carlo
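
The counting idea above can be sketched a little more defensively. This is an illustration under stated assumptions, not the poster's code: the function names index_line_starts and line_at are hypothetical, and the file is opened in binary mode so that every byte on disk counts exactly once, whatever the terminator.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Returns the byte offset of the start of every line in `path`, computed by
// counting bytes rather than by calling tellg(). Binary mode guarantees that
// each '\n' on disk is exactly one character in the stream, and that any
// '\r' before it is delivered as part of the line and counted too.
std::vector<std::streamoff> index_line_starts(const std::string& path) {
    std::vector<std::streamoff> starts;
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::streamoff pos = 0;
    std::string line;
    while (std::getline(in, line)) {
        starts.push_back(pos);
        // getline consumed the text plus one '\n', unless it stopped at end
        // of file, in which case there was no terminator to count.
        pos += static_cast<std::streamoff>(line.size()) + (in.eof() ? 0 : 1);
    }
    return starts;
}

// Seeks to a recorded offset and reads the line found there, dropping a
// trailing '\r' if the file uses "\r\n" terminators.
std::string line_at(const std::string& path, std::streamoff offset) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    in.seekg(offset, std::ios::beg);
    std::string line;
    std::getline(in, line);
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    return line;
}
```

Because the offsets are plain byte counts, they can be stored and later passed to line_at for direct access without calling tellg() in the read loop.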

"toton" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...
> Hi,
> I am reading a big file , and need to have a flag for current file
> position so that I can store the positions for later direct access.
> However it looks tellg is a very costly function ! But it's code says
> it should just return the current buffer position , thus should be a
> very low cost function.
> To explain,
> {
> boost::progress_timer t;
> std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
> std::string line;
> while(in){
> int pos = in.tellg();
> std::getline(in,line);
> }
> }
> This code takes 0.58 sec in my computer, while if I uncomment the line
> in.tellg() , it takes 120.8 sec (varies a little )
>
> can anyone say the reason & the possible workout ?
> I amusing MS Visual Studio 7.1 and the std library provided by visual
> studio 7.1
>



 
John Harrison
      02-20-2007
John Harrison wrote:
>>
>> Looks as through the defintion is streambuf (which is used by all
>> streams) is such that the only way to find the current position is to
>> perform a seek to the current position.
>>

>
> Let me try that again
>
> Looks as though the definition of streambuf (which is used by all
> streams) is such that the only way to find the current position is to
> perform a seek to the current position.
>
> john


Let me really try this again; I shouldn't speculate on things I have no
real knowledge of.

I would imagine that the *likely* reason is that calling tellg in your
particular circumstances causes the input buffer to be flushed.
Certainly the slowdown you are observing would be consistent with that.

However, the only way to know for sure would be a careful examination of
the library code, or use of a debugger to step into the library code.
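
Short of stepping through the library, the effect can at least be measured in isolation. The following is a hedged sketch of such a measurement, assuming a C++11 compiler for std::chrono; ScanResult and scan_lines are hypothetical names, and the sketch says nothing about why a given library is slow, only whether it is.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>

// Outcome of one pass over a file. (Hypothetical helper type.)
struct ScanResult {
    std::size_t lines;          // number of lines read
    std::streamoff last_offset; // last value reported by tellg(), or 0
    double seconds;             // wall-clock time for the pass
};

// Reads `path` line by line in text mode, optionally calling tellg() before
// each getline, and reports how long the pass took.
ScanResult scan_lines(const std::string& path, bool call_tellg) {
    ScanResult result = {0, 0, 0.0};
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    std::ifstream in(path.c_str());
    std::string line;
    while (true) {
        if (call_tellg) {
            result.last_offset = in.tellg();  // the call under suspicion
        }
        if (!std::getline(in, line)) break;
        ++result.lines;
    }
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

Running scan_lines on the same file with call_tellg set to false and then to true, and comparing the two seconds values, shows whether the per-line tellg() call accounts for the difference on the library in use.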

john
 
toton
      02-20-2007
On Feb 20, 11:46 am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
> * toton:
>
>
>
> > Hi,
> > I am reading a big file , and need to have a flag for current file
> > position so that I can store the positions for later direct access.
> > However it looks tellg is a very costly function ! But it's code says
> > it should just return the current buffer position , thus should be a
> > very low cost function.
> > To explain,
> > {
> > boost::progress_timer t;
> > std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
> > std::string line;
> > while(in){
> > int pos = in.tellg();
> > std::getline(in,line);
> > }
> > }
> > This code takes 0.58 sec in my computer, while if I uncomment the line
> > in.tellg() , it takes 120.8 sec (varies a little )

>
> > can anyone say the reason & the possible workout ?
> > I amusing MS Visual Studio 7.1 and the std library provided by visual
> > studio 7.1

>
> Most likely the cause is conversion of CRLF to LF, which you've
> specified by (1) opening the file in text mode and (2) compiling with a
> Windows compiler.
>
> One cure could then be to open the file in binary mode, and handle
> newlines as appropriate (or not).
>
> --
> A: Because it messes up the order in which people normally read text.
> Q: Why is it such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing on usenet and in e-mail?


There are plenty of bad things related to newlines...
seekg and tellg don't match when the newline char is \n and the file is
opened in text mode.
For the unix file,
std::string line;
while(in){
int pos = in.tellg();
std::getline(in,line);
std::cout<<pos<<" "<<line<<std::endl;

if(line==".PEN_DOWN"){
in.seekg(pos);
break;
}
}
std::getline(in,line);///This doesn't print .PEN_DOWN !
std::cout<<line<<std::endl;
Now if I open it in binary mode, this problem is solved.
But that creates another set of problems:
for a unix file it is now fine, but for a windows file \r is left at
the end of each line, as the newline char is \n. So I need to remove \r
from the line if it is present.

I wonder what getline will return in the case of a mac file, where the
newline terminator is \r only. Will it return the whole file as a single
line?
Is there any std API support to take care of all these things, and yet
keep seekg & tellg consistent?

Thanks
abir

 
P.J. Plauger
      02-20-2007
"toton" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...

> On Feb 20, 11:46 am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
>> * toton:
>>
>>
>>
>> > Hi,
>> > I am reading a big file , and need to have a flag for current file
>> > position so that I can store the positions for later direct access.
>> > However it looks tellg is a very costly function ! But it's code says
>> > it should just return the current buffer position , thus should be a
>> > very low cost function.
>> > To explain,
>> > {
>> > boost::progress_timer t;
>> > std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
>> > std::string line;
>> > while(in){
>> > int pos = in.tellg();
>> > std::getline(in,line);
>> > }
>> > }
>> > This code takes 0.58 sec in my computer, while if I uncomment the line
>> > in.tellg() , it takes 120.8 sec (varies a little )

>>
>> > can anyone say the reason & the possible workout ?
>> > I amusing MS Visual Studio 7.1 and the std library provided by visual
>> > studio 7.1

>>
>> Most likely the cause is conversion of CRLF to LF, which you've
>> specified by (1) opening the file in text mode and (2) compiling with a
>> Windows compiler.
>>
>> One cure could then be to open the file in binary mode, and handle
>> newlines as appropriate (or not).
>>
>> --
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is it such a bad thing?
>> A: Top-posting.
>> Q: What is the most annoying thing on usenet and in e-mail?

>
> There are enough bad things related to new line ...
> seekg and tellg doesn't match when newline char is \n , and file is
> opened in text mode.


That shouldn't be, if you're just using seekg to return to a place
earlier memorized by tellg.
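
A minimal sketch of that round trip, written for illustration rather than taken from the thread. The function name reread_marker_line is hypothetical. Two choices here are assumptions on my part: the position is kept as a std::streampos instead of being squeezed into an int, and the stream state is cleared before seeking in case the read reached end of file.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Scans `path` for a line equal to `marker`, remembering the position
// reported by tellg() before each read. When the marker is found, seeks back
// to the remembered position and reads the line a second time.
// Returns the re-read line, or an empty string if the marker never appears.
std::string reread_marker_line(const std::string& path,
                               const std::string& marker,
                               std::ios::openmode mode = std::ios::in) {
    std::ifstream in(path.c_str(), mode);
    std::string line;
    while (true) {
        const std::streampos pos = in.tellg();  // opaque position, not an int
        if (!std::getline(in, line)) return std::string();
        if (line == marker) {
            in.clear();     // drop eofbit in case the marker was the last line
            in.seekg(pos);  // return to the position tellg() handed out
            std::getline(in, line);
            return line;
        }
    }
}
```

Called as reread_marker_line(path, ".PEN_DOWN") on a native text file, the expectation described above is that the returned line is ".PEN_DOWN" again. Passing std::ios::in | std::ios::binary as the third argument gives the binary-mode variant for comparison.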

> For the unix file,
> std::string line;
> while(in){
> int pos = in.tellg();
> std::getline(in,line);
> std::cout<<pos<<" "<<line<<std::endl;
>
> if(line==".PEN_DOWN"){
> in.seekg(pos);
> break;
> }
> }
> std::getline(in,line);///This doesn't print .PEN_DOWN !
> std::cout<<line<<std::endl;
> Now if I open it in binary mode, Then this problem is solved.
> But it creates another set of problems,
> for unix file now it is fine, but for windows file \r is attached at
> the end of line, as newline char is \n. So I need to remove \r from
> the line if it is present.


If you wrote the file in binary mode, the \r characters wouldn't
be appended in the first place. It is important that you read and
write consistently, at least if you don't want to deal with local
conventions for reading and writing text files.

> I wonder, what will getline will return in case of a mac file where
> newline terminator is \r only. Will it return the total file as single
> line ?


If you write in text mode and read in binary mode, that could happen,
yes.

> Is there any std api support to take care of all these things, and yet
> to make seekg & tellg consistent ?


Yes, it's called the Standard C++ library, if you use it right.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


 
Pete Becker
      02-20-2007
toton wrote:
>
> There are enough bad things related to new line ...
> seekg and tellg doesn't match when newline char is \n , and file is
> opened in text mode.


Sure it does. See below.

> For the unix file,
> std::string line;
> while(in){
> int pos = in.tellg();
> std::getline(in,line);
> std::cout<<pos<<" "<<line<<std::endl;
>
> if(line==".PEN_DOWN"){
> in.seekg(pos);
> break;
> }
> }
> std::getline(in,line);///This doesn't print .PEN_DOWN !
> std::cout<<line<<std::endl;
> Now if I open it in binary mode, Then this problem is solved.
> But it creates another set of problems,
> for unix file now it is fine, but for windows file \r is attached at
> the end of line, as newline char is \n. So I need to remove \r from
> the line if it is present.
>
> I wonder, what will getline will return in case of a mac file where
> newline terminator is \r only. Will it return the total file as single
> line ?
> Is there any std api support to take care of all these things, and yet
> to make seekg & tellg consistent ?
>


Be careful: you're mixing two different things. In C++ source code, '\n'
is the character that's used to mark the end of a line, and '\r' is the
character that is used to mark a carriage return. That has only a
historical connection with the ASCII line feed character whose value is
0x0A and the ASCII carriage return character whose value is 0x0D.

For text files, if you know the conventions that your operating system
uses, you can talk about the details of how line ends are represented in
the text file. But from a high level language perspective, that's
irrelevant detail: it's up to the I/O library to translate things, so
that when you write the character '\n' it does whatever is appropriate
to mark the end of a line using the OS's conventions. Similarly, when
you read a text file, the I/O library translates whatever the OS uses to
mark the end of a line into a single '\n' character.

The problem you're running into is that you're apparently not using
native text files, since you're talking about unix files, mac files, and
Windows. The I/O library isn't prepared to deal with all of them. When
you move text files from one system to another, use a utility like ftp
that understands line ending conventions and does the appropriate
translations. Don't expect Unix I/O libraries to understand Windows file
conventions, or vice versa.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
toton
      02-20-2007
On Feb 20, 7:23 pm, "P.J. Plauger" <(E-Mail Removed)> wrote:
> "toton" <(E-Mail Removed)> wrote in message
>
> news:(E-Mail Removed) ups.com...
>
>
>
> > On Feb 20, 11:46 am, "Alf P. Steinbach" <(E-Mail Removed)> wrote:
> >> * toton:

>
> >> > Hi,
> >> > I am reading a big file , and need to have a flag for current file
> >> > position so that I can store the positions for later direct access.
> >> > However it looks tellg is a very costly function ! But it's code says
> >> > it should just return the current buffer position , thus should be a
> >> > very low cost function.
> >> > To explain,
> >> > {
> >> > boost::progress_timer t;
> >> > std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
> >> > std::string line;
> >> > while(in){
> >> > int pos = in.tellg();
> >> > std::getline(in,line);
> >> > }
> >> > }
> >> > This code takes 0.58 sec in my computer, while if I uncomment the line
> >> > in.tellg() , it takes 120.8 sec (varies a little )

>
> >> > can anyone say the reason & the possible workout ?
> >> > I amusing MS Visual Studio 7.1 and the std library provided by visual
> >> > studio 7.1

>
> >> Most likely the cause is conversion of CRLF to LF, which you've
> >> specified by (1) opening the file in text mode and (2) compiling with a
> >> Windows compiler.

>
> >> One cure could then be to open the file in binary mode, and handle
> >> newlines as appropriate (or not).

>
> >> --
> >> A: Because it messes up the order in which people normally read text.
> >> Q: Why is it such a bad thing?
> >> A: Top-posting.
> >> Q: What is the most annoying thing on usenet and in e-mail?

>
> > There are enough bad things related to new line ...
> > seekg and tellg doesn't match when newline char is \n , and file is
> > opened in text mode.

>
> That shouldn't be, if you're just using seekg to return to a place
> earlier memorized by tellg.
>
>
>
> > For the unix file,
> > std::string line;
> > while(in){
> > int pos = in.tellg();
> > std::getline(in,line);
> > std::cout<<pos<<" "<<line<<std::endl;

>
> > if(line==".PEN_DOWN"){
> > in.seekg(pos);
> > break;
> > }
> > }
> > std::getline(in,line);///This doesn't print .PEN_DOWN !
> > std::cout<<line<<std::endl;
> > Now if I open it in binary mode, Then this problem is solved.
> > But it creates another set of problems,
> > for unix file now it is fine, but for windows file \r is attached at
> > the end of line, as newline char is \n. So I need to remove \r from
> > the line if it is present.

>
> If you wrote the file in binary mode, the \r characters wouldn't
> be appended in the first place. It is important that you read and
> write consistently, at least if you don't want to deal with local
> conventions for reading and writing text files.
>
> > I wonder, what will getline will return in case of a mac file where
> > newline terminator is \r only. Will it return the total file as single
> > line ?

>
> If you write in text mode and read in binary mode, that could happen,
> yes.
>
> > Is there any std api support to take care of all these things, and yet
> > to make seekg & tellg consistent ?

>
> Yes, it's called the Standard C++ library, if you use it right.
>
> P.J. Plauger
> Dinkumware, Ltd.
> http://www.dinkumware.com


Maybe I have not expressed the problem clearly.
1) I am not writing the file, I am only reading it. It is a text
file, but nothing is fixed: the line terminator may be \n, \r\n or
\r. It all depends on who saved the file and with which editor.
So this is a question about parsing...
The file looks something like this:
.X_DIM 20701
.Y_DIM 27000
.X_POINTS_PER_MM 100
.Y_POINTS_PER_MM 100
.POINTS_PER_SECOND 200
.COMMENT YES_PRES_ORG 0
.COMMENT YES_PRES_EXT 1023
.DT 3975234
.PEN_DOWN
.COMMENT .PEN_WIDTH 1
.COMMENT .PEN_WIDTH_ORG 1
.COMMENT .PEN_COLOR 0x0

Now I need to remember a past position using tellg(), and go back to that
position using seekg().
The cases are:
1) The file is opened in text mode and contains \n as terminator.
seekg doesn't place the file pointer at the position saved by tellg (as
shown in my previous program). It works as expected when the newline is
\r\n.
2) The file is opened in binary mode. When the file contains \n as line
terminator, seekg & tellg work as expected. When the file contains \r\n
as terminator, the returned string contains \r, which needs to be
removed.
3) This one I haven't tested. Several mac files have \r as the newline
char. What will std::getline(stream,str) return? The whole file or
just the line?

Thus my question is: how do I check which newline char to use, so
that I can parse all of the files properly?
Note that the files are not written by me; I just read them.
All the tests were done with MSVC 7.1; gcc might give just the
opposite result (I will check it quickly).
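
One possible way to handle all three terminators, offered as a sketch rather than as anything the standard library provides: read through the stream buffer and treat "\n", "\r\n" and a lone "\r" as line ends. The function name getline_any_eol is hypothetical, and the sketch assumes the file was opened in binary mode so that no translation has already been applied.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Reads one line from `in` into `line`, accepting "\n", "\r\n" or a lone
// "\r" as the terminator. The terminator is consumed but not stored.
// Sets failbit only when nothing at all could be read.
std::istream& getline_any_eol(std::istream& in, std::string& line) {
    line.clear();
    std::istream::sentry guard(in, true);  // true: do not skip whitespace
    if (!guard) return in;
    std::streambuf* buf = in.rdbuf();
    const int eof = std::char_traits<char>::eof();
    while (true) {
        const int c = buf->sbumpc();
        if (c == eof) {
            // End of input: report failure only if no characters were read.
            in.setstate(line.empty() ? (std::ios::eofbit | std::ios::failbit)
                                     : std::ios::eofbit);
            return in;
        }
        if (c == '\n') return in;           // unix terminator
        if (c == '\r') {
            if (buf->sgetc() == '\n') {
                buf->sbumpc();              // swallow the LF of a CRLF pair
            }
            return in;                      // windows or old mac terminator
        }
        line += static_cast<char>(c);
    }
}
```

It is used like std::getline, for example while (getline_any_eol(in, line)) { ... }. On a stream opened in binary mode, the expectation is that a tellg() taken before the call still refers to the start of the line that was read, since no translation is involved.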

 