Velocity Reviews - Computer Hardware Reviews


complexity for tellg()

 
 
P.J. Plauger
02-20-2007
"toton" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) oups.com...

> .....
> Maybe I am unable to express the problem clearly.
> 1) I am not writing the file; I am only reading it. It is a text
> file, but nothing is fixed: the line terminator may be \n, \r\n or
> \r. It all depends on who saved the file, and with which editor.


Then it *is* fixed, but not by you. If, as Pete Becker said, the
file was written as text on one system and read on another, the
lines might not be terminated as the reading system expects. And if
you read the file as binary, you have to know what line terminators
look like.

> So this is the question for parsing...
> The file looks something like this:
> .X_DIM 20701
> .Y_DIM 27000
> .X_POINTS_PER_MM 100
> .Y_POINTS_PER_MM 100
> .POINTS_PER_SECOND 200
> .COMMENT YES_PRES_ORG 0
> .COMMENT YES_PRES_EXT 1023
> .DT 3975234
> .PEN_DOWN
> .COMMENT .PEN_WIDTH 1
> .COMMENT .PEN_WIDTH_ORG 1
> .COMMENT .PEN_COLOR 0x0
>
> Now I need to remember a past position using tellg(), and go to that
> position using seekg().
> The cases are:
> 1) The file is opened in text mode and contains \n as the terminator.
> seekg doesn't place the file pointer at the proper position saved by
> tellg (as given in my previous program). It works as expected when the
> newline is \r\n.


You're violating the Windows notion of a text file, so it's possible
you're confusing the underlying C library, which the Standard C++ library
uses for basic file operations. Convert the file to Windows format
and seekg/tellg should work fine.
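A rough sketch of that conversion, assuming the whole file fits in memory: read it in binary mode and turn every \n that is not already preceded by \r into \r\n. The helper names `to_crlf` and `convert_file_to_crlf` are made up for this illustration; they are not library functions, and lone \r (old Mac) line ends are left alone.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: turn every "\n" that is not already preceded by "\r"
// into "\r\n". Existing "\r\n" pairs are left untouched; lone "\r" is kept.
std::string to_crlf(const std::string& in)
{
    std::string out;
    out.reserve(in.size() + in.size() / 16);
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out += '\r';
        out += in[i];
    }
    return out;
}

// Hypothetical helper: rewrite a file in place with Windows line endings.
// Both streams are binary, so the library performs no translation of its own.
bool convert_file_to_crlf(const std::string& path)
{
    std::ifstream src(path.c_str(), std::ios::binary);
    if (!src)
        return false;
    std::string data((std::istreambuf_iterator<char>(src)),
                     std::istreambuf_iterator<char>());
    src.close();

    std::ofstream dst(path.c_str(), std::ios::binary | std::ios::trunc);
    dst << to_crlf(data);
    dst.close();          // close() sets failbit if flushing fails
    return !dst.fail();
}
```

Once the file has native line ends, it can be reopened in text mode and read with plain std::getline.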

> 2) The file is opened in binary mode and contains \n as the line
> terminator.
> seekg & tellg work as expected.


Right. No surprise.

> When the file contains \r\n as the terminator, the returned string
> contains \r, which needs to be removed.


Yep. You're now violating the C/C++ conventions for text streams
internal to a program. The \r is considered part of the text line,
not part of the line terminator.

> 3) This one I haven't tested. Several Mac files have \r as the newline
> char. What will std::getline(stream, str) return? The whole page, or
> only the line?


The whole works, unless you specify \r as the line terminator.
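A small illustration of both halves of that answer, using the standard three-argument overload std::getline(stream, str, delim). The helper `split_on` and the sample text are made up for this sketch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` into lines using `delim` as the only terminator, via the
// standard three-argument std::getline(stream, string, delim).
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line, delim))
        lines.push_back(line);
    return lines;
}
```

With the default '\n' delimiter, the \r-terminated text ".X_DIM 20701\r.Y_DIM 27000\r" comes back as a single string; with '\r' as the delimiter it splits into the two lines.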

> So my question is: how do I check which newline char to use, so
> that I can parse all of the files properly?


Well, you have to know what they are, don't you? Or at least all the
possible options. One approach is to read the file as binary and be
prepared for any of \n, \n\r, \r, or \r\n as line terminators. It's
kinda hard to use getline directly that way, but you can write your
own version.

> It should be noted that the files are not written by me; I just read them.
> And all the tests were done with MSVC 7.1; gcc might give just the
> opposite result (I will check it quickly).


I think "opposite" is an oversimplification. It's just different.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


 
toton
02-20-2007
On Feb 20, 7:31 pm, Pete Becker <(E-Mail Removed)> wrote:
> toton wrote:
>
> > There are enough bad things related to newlines...
> > seekg and tellg don't match when the newline char is \n and the file is
> > opened in text mode.

>
> Sure it does. See below.
>
>
>
> > For the unix file,
> > std::string line;
> > while (in) {
> >     std::streampos pos = in.tellg();
> >     std::getline(in, line);
> >     std::cout << pos << " " << line << std::endl;
> >     if (line == ".PEN_DOWN") {
> >         in.seekg(pos);
> >         break;
> >     }
> > }
> > std::getline(in, line); // This doesn't print .PEN_DOWN!
> > std::cout << line << std::endl;
> > Now if I open it in binary mode, this problem is solved.
> > But it creates another set of problems:
> > for a Unix file it is now fine, but for a Windows file \r is attached at
> > the end of the line, as the newline char is \n. So I need to remove \r
> > from the line if it is present.

>
> > I wonder what getline will return in the case of a Mac file where the
> > newline terminator is \r only. Will it return the whole file as a single
> > line?
> > Is there any std API support to take care of all these things, and still
> > keep seekg & tellg consistent?

>
> Be careful: you're mixing two different things. In C++ source code, '\n'
> is the character that's used to mark the end of a line, and '\r' is the
> character that is used to mark a carriage return. That has only a
> historical connection with the ASCII newline (line feed) character, whose
> value is 0x0A, and the ASCII carriage return character, whose value is 0x0D.
>
> For text files, if you know the conventions that your operating system
> uses, you can talk about the details of how line ends are represented in
> the text file. But from a high level language perspective, that's
> irrelevant detail: it's up to the I/O library to translate things, so
> that when you write the character '\n' it does whatever is appropriate
> to mark the end of a line using the OS's conventions. Similarly, when
> you read a text file, the I/O library translates whatever the OS uses to
> mark the end of a line into a single '\n' character.
>
> The problem you're running into is that you're apparently not using
> native text files, since you're talking about unix files, mac files, and
> Windows. The I/O library isn't prepared to deal with all of them. When
> you move text files from one system to another, use a utility like ftp
> that understands line ending conventions and does the appropriate
> translations. Don't expect Unix I/O libraries to understand Windows file
> conventions, or vice versa.
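When no such translating tool was involved, a program can do the translation itself. A minimal sketch, assuming the file fits in memory and has already been read in binary mode: map \r\n and lone \r to \n, then parse the result with ordinary std::getline from a std::istringstream. `normalize_newlines` is a hypothetical name. Positions from tellg on that string stream are offsets into the normalized text, not into the original file.

```cpp
#include <string>

// Hypothetical helper: map "\r\n" and lone "\r" to "\n"; "\n" is unchanged.
std::string normalize_newlines(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;  // skip the '\n' of a "\r\n" pair
        } else {
            out += in[i];
        }
    }
    return out;
}
```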


Sure, you got the point. I am using Unix-encoded files on a Windows
machine. They came from an HTTP download as a zipped folder, and thus
the translation isn't handled. So I need to handle them in binary mode.
Moreover, the file format doesn't specify what the newline is (I hope
they make it \n someday).
So at present I need to handle both of them (i.e. from Unix => Windows &
Unix, from Windows => Unix & Windows). At present I am doing this in
binary mode, discarding the \r if any. However, I wonder how to handle
it if some Mac file with \r comes along!
I just want to know how it can be done, since the std library doesn't
handle this.

abir
> --
>
> -- Pete
> Roundhouse Consulting, Ltd. (www.versatilecoding.com)
> Author of "The Standard C++ Library Extensions: a Tutorial and
> Reference." (www.petebecker.com/tr1book)



 
Pete Becker
02-20-2007
toton wrote:
>
> Sure, you got the point. I am using Unix-encoded files on a Windows
> machine. They came from an HTTP download as a zipped folder, and thus
> the translation isn't handled.


Most versions of unzip have a -a option that translates line terminators.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 