Re: bugs at iter file() ?

 
 
Terry Reedy
      07-15-2004

"dw" <(E-Mail Removed)> wrote in message news:(E-Mail Removed)...
> Python 2.3.4, winxp:
>
> I have a large text file that unknowingly contains ascii
> character 1A, or chr(26). And doing this:
>     for line in file(sys.argv[1]):
>         print line
> would stop iteration at the specific line containing ascii
> char 1A, without raising an exception or warning, although
> there were still remaining lines which had not been
> iterated.


To add to what Tim said: From the viewpoint of Windows in its default mode,
there are no remaining lines. ^Z is the end of file and anything after
that is accidental junk filling out the remainder of the disk block.
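A minimal sketch of the usual workaround, assuming the goal is simply to see every line regardless of any ^Z convention: open the file in binary mode so that nothing between the disk and the program interprets byte 0x1A. It is written for current Python 3; with Python 2.3 as in the original post the equivalent call would be file(path, "rb"). The helper name read_all_lines and the demo data are made up for illustration.

```python
import os
import tempfile


def read_all_lines(path):
    """Return every line of the file as bytes, with line endings stripped.

    Binary mode ("rb") is used so that a chr(26) byte is treated as data.
    """
    with open(path, "rb") as handle:
        return [line.rstrip(b"\r\n") for line in handle]


# Demo data (hypothetical): three CRLF-terminated lines, with a ^Z
# embedded in the middle of the second one.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"first\r\nsecond\x1astill second\r\nthird\r\n")

lines = read_all_lines(demo_path)
os.remove(demo_path)

for line in lines:
    print(line)
```

The trade-off is that the program then has to deal with CRLF line endings and byte-to-text decoding itself, which is what the rstrip call hints at.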

tjr



 
Michael Geary
      07-15-2004
Terry Reedy wrote:
> To add to what Tim said: From the viewpoint of Windows in
> its default mode, there are no remaining lines. ^Z is the end
> of file and anything after that is accidental junk filling out the
> remainder of the disk block.


Just to clarify one point... Windows itself does not have "text" or "binary"
files, and it does not treat ^Z in a file in any special way. There are no
special characters in files. A file is simply an array of arbitrary bytes
with an exact length.
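To illustrate the "array of arbitrary bytes" view, here is a small sketch that reads a file's raw bytes and reports where any ^Z bytes sit. The function name ctrl_z_offsets and the sample data are assumptions made for the example, and it reads the whole file into memory, which is fine for a quick check but not for very large files.

```python
import os
import tempfile

CTRL_Z = 0x1A  # chr(26)


def ctrl_z_offsets(path):
    """Return the byte offsets of every ^Z found in the file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [offset for offset, byte in enumerate(data) if byte == CTRL_Z]


# Hypothetical sample: six bytes, two of which are ^Z.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"ab\x1acd\x1a")

offsets = ctrl_z_offsets(sample_path)
total_size = os.path.getsize(sample_path)
os.remove(sample_path)

print(offsets, total_size)
```

The reported size counts every byte, including those after the first ^Z, which matches the point that the length of a file is exact and independent of its contents.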

For example, if you use Notepad to open a file with embedded ^Z characters,
you will see those characters in the text (typically as right arrows,
depending on the font). The file won't be truncated at the first ^Z.

It's the C runtime that makes the distinction between text and binary files
and treats ^Z specially. When you use fopen(filename,"rt") you get the
special behavior of translating CRLF pairs to LF characters and stopping at
the first ^Z.
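The two translations described above can be modelled roughly in a few lines. This is only an approximation of the behavior as described in this post, not the C runtime's actual implementation, and the function name emulate_text_mode_read is invented for the example.

```python
CTRL_Z = b"\x1a"


def emulate_text_mode_read(raw):
    """Approximate what a text-mode read would return for these raw bytes.

    1. Everything from the first ^Z onward is dropped.
    2. Each CRLF pair is translated to a single LF.
    """
    visible, _, _ = raw.partition(CTRL_Z)
    return visible.replace(b"\r\n", b"\n")


raw_bytes = b"abc\r\ndef\x1aghi\r\n"
translated = emulate_text_mode_read(raw_bytes)
print(translated)
```

Feeding the same bytes through a binary-mode read would return them unchanged, which is the difference the original poster ran into.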

Of course, from the point of view of a Python program, it hardly matters
whether it's Windows or the C runtime that is doing this. I just wanted to
clarify where this special behavior is taking place--it's nothing
fundamental to the operating system at all.

-Mike


 
Terry Reedy
      07-16-2004

"Michael Geary" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Terry Reedy wrote:
> > To add to what Tim said: From the viewpoint of Windows in
> > its default mode, there are no remaining lines. ^Z is the end
> > of file and anything after that is accidental junk filling out the
> > remainder of the disk block.

>
> Just to clarify one point... Windows itself does not have "text" or "binary"
> files, and it does not treat ^Z in a file in any special way. There are no
> special characters in files.


Sorry, but ^Z has meant end-of-file I presume from the first version of
DOS, which I suspect copied the usage from something previous. Example
(Microsoft Basic manual, 1989): "When input is redirected [from terminal to
a file], GW Basic continues to read from this source until a CTRL-Z is
detected." Perhaps the usage has dimmed in non-DOS-based Windows, so that
I should have said more carefully "from the viewpoint of DOS and perhaps
DOS-based Windows and partially in modern non-DOS-based Windows ...".
Still, in Windows XP, open a Command Prompt window and enter

disk:\path> copy con: temp
abd^Zdef

where ^Z is control-Z and you get a file with 3, not 7 characters.

The Windows version of the Python interactive interpreter exits on ^Z
because that is, or at least was, standard behavior for interactive non-GUI
DOS/Windows programs.

> For example, if you use Notepad to open a file with embedded ^Z characters,
> you will see those characters in the text (typically as right arrows,
> depending on the font). The file won't be truncated at the first ^Z.


This surprises me a bit. Which version of Windows? Try type'ing the same
file ('type filename') in an XP Home command prompt. Even now, it should
be truncated (just tested this).

07/15/2004 11:07 PM 7 temb
....
C:\Documents and Settings\Terry>type temb
abc

I created temb as abc^Zdef with Python file.write (^Z=\032).
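For anyone who wants to repeat the test, a sketch along these lines recreates a 7-byte file of that shape. It is written for Python 3 and uses a scratch directory rather than the home directory shown above; the variable names are assumptions. It only covers creating and checking the file, since the 'type' step has to be run by hand in a command prompt.

```python
import os
import tempfile

# Hypothetical scratch location; the listing above used a file named "temb".
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "temb")

# Write in binary mode so the bytes land on disk exactly as given.
with open(path, "wb") as out:
    out.write(b"abc\032def")  # \032 is octal for 26, i.e. ^Z

size_on_disk = os.path.getsize(path)

# Read back in binary mode to confirm nothing was dropped.
with open(path, "rb") as handle:
    contents = handle.read()

os.remove(path)
os.rmdir(scratch_dir)

print(size_on_disk, contents)
```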

> It's the C runtime that makes the distinction between text and binary files
> and treats ^Z specially.


The Microsoft Windows C runtime treats ^Z specially because that is, or at
least was, the OS convention, possibly since before there was a C compiler
for DOS.

Terry J. Reedy



 
Wolfgang Strobl
      08-07-2004
"Terry Reedy" <(E-Mail Removed)>:

>"Michael Geary" <(E-Mail Removed)> wrote in message
>news:(E-Mail Removed)...
>> Terry Reedy wrote:
>> > To add to what Tim said: From the viewpoint of Windows in
>> > its default mode, there are no remaining lines. ^Z is the end
>> > of file and anything after that is accidental junk filling out the
>> > remainder of the disk block.
>>
>> Just to clarify one point... Windows itself does not have "text" or "binary"
>> files, and it does not treat ^Z in a file in any special way. There are no
>> special characters in files.


>Sorry, but ^Z has meant end-of-file I presume from the first version of
>DOS, which I suspect copied the usage from something previous.


Sorry, but Michael got it right. Windows itself does not have 'text' or
'binary' files or open modes. Have a look at CreateFile in the Platform
SDK. You won't find anything like _TEXT or _BINARY there.

^Z is a carryover from CP/M to DOS, which, like crlf<->lf translation,
got some support in various libraries, for obvious reasons. It's not
part of the Win32 API.


>Example
>(Microsoft Basic manual, 1989): "When input is redirected [from terminal to
>a file], GW Basic continues to read from this source until a CTRL-Z is
>detected."


So what? BASICA is an application, just like bash or sendmail.


>Perhaps the usage has dimmed in non-DOS-based Windows, so that
>I should have said more carefully "from the viewpoint of DOS and perhaps
>DOS-based Windows and partially in modern non-DOS-based Windows ...".
>Still, in Windows XP, open a Command Prompt window and enter
>
>disk:\path> copy con: temp
>abd^Zdef
>
>where ^Z is control-Z and you get a file with 3, not 7 characters.
>
>The Windows version of the Python interactive interpreter exits on ^Z
>because that is, or at least was, standard behavior for interactive non-gui
>DOS/Windows programs


You mean like terminating a program using a single dot on a line is, or
at least was, standard behaviour for interactive non-gui UNIX/Linux
applications?


--
Thank you for observing all safety precautions
 