Aki Niimura 01-14-2005 06:00 AM

Pickled text file causing ValueError (dos/unix issue)
 
Hello everyone,

I started to use pickle to store the latest user settings for the tool
I wrote. It writes out a pickled text file when it terminates and it
restores the settings when it starts.

It worked very nicely.

However, I got a ValueError when I started the tool on Unix after
previously using it on Windows.

  File "/usr/local/lib/python2.3/pickle.py", line 980, in load_string
    raise ValueError, "insecure string pickle"
ValueError: insecure string pickle

If I do 'dos2unix <my.cfg> <my.cfg>' to convert the file, then everything
becomes fine.

I found this in the Python release notes:
"pickle: Now raises ValueError when an invalid pickle that contains a
non-string repr where a string repr was expected. This behavior matches
cPickle."

I guess DOS text format is creating this problem.
My question is "Is there any elegant way to deal with this?".

I certainly can catch ValueError and run 'dos2unix' explicitly.
But I don't like such a crude solution.
Any suggestions would be highly appreciated.

Best regards,
Aki Niimura
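
A minimal sketch of the failure described above, assuming Python 3 (the thread itself concerns Python 2.3, where the symptom was the ValueError shown). The `settings` dictionary is made up for illustration. The sketch emulates a Windows text-mode write by turning every "\n" in a protocol 0 pickle into "\r\n", then undoes the damage the way dos2unix would. On a modern interpreter the damaged pickle may either raise or quietly load with altered data, so the code allows for both.

```python
import pickle

# Hypothetical user settings, standing in for whatever the tool stores.
settings = {"theme": "dark", "recent": ["a.txt", "b.txt"]}

# Protocol 0 is the old "text" protocol: opcodes terminated by "\n".
clean = pickle.dumps(settings, protocol=0)

# Emulate what a Windows text-mode write does to those bytes.
damaged = clean.replace(b"\n", b"\r\n")

try:
    # The stray "\r" bytes may end up inside the loaded strings.
    corrupted = pickle.loads(damaged) != settings
except Exception:
    # Or, depending on the Python version, loading may fail outright.
    corrupted = True

# Equivalent of running dos2unix: drop the extra carriage returns.
repaired = damaged.replace(b"\r\n", b"\n")
restored = pickle.loads(repaired)
```

After the repair step, `restored` compares equal to `settings` again.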


Paul Rubin 01-14-2005 06:19 AM

Re: Pickled text file causing ValueError (dos/unix issue)
 
Open the file on Windows for writing in "wb" mode; the b is for binary.

Tim Peters 01-14-2005 02:12 PM

Re: Pickled text file causing ValueError (dos/unix issue)
 
[Aki Niimura]
> I started to use pickle to store the latest user settings for the tool
> I wrote. It writes out a pickled text file when it terminates and it
> restores the settings when it starts.

....
> I guess DOS text format is creating this problem.


Yes.

> My question is "Is there any elegant way to deal with this?".


Yes: regardless of platform, always open files used for pickles in
binary mode. That is, pass "rb" to open() when reading a pickle file,
and "wb" to open() when writing a pickle file. Then your pickle files
will work unchanged on all platforms. The same is true of files
containing binary data of any kind (and despite that pickle protocol 0
was called "text mode" for years, it's still binary data).
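
The advice above can be sketched as follows, assuming Python 3. The helper names `save_settings` and `load_settings`, the file name, and the sample settings are all hypothetical; the only point is the "wb" and "rb" modes passed to open().

```python
import os
import pickle
import tempfile


def save_settings(path, settings):
    # "wb": binary mode, so no newline translation on any platform.
    with open(path, "wb") as f:
        pickle.dump(settings, f, protocol=0)


def load_settings(path):
    # "rb": read the bytes back exactly as they were written.
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file (hypothetical name and contents).
cfg_path = os.path.join(tempfile.mkdtemp(), "my.cfg")
save_settings(cfg_path, {"width": 800, "title": "demo"})
loaded = load_settings(cfg_path)
```

A file written this way contains no carriage returns even with protocol 0, so it should load the same way on Windows and Unix.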

Irmen de Jong 01-14-2005 05:55 PM

Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError (dos/unix issue))
 
Tim Peters wrote:

> Yes: regardless of platform, always open files used for pickles in
> binary mode. That is, pass "rb" to open() when reading a pickle file,
> and "wb" to open() when writing a pickle file. Then your pickle files
> will work unchanged on all platforms. The same is true of files
> containing binary data of any kind (and despite that pickle protocol 0
> was called "text mode" for years, it's still binary data).


I've been wondering why there even is the choice between binary mode
and text mode. Why can't we just do away with the 'text mode' ?
What does it do, anyways? At least, if it does something, I'm sure
that it isn't something that can't be done in Python itself if
really required to do so...

--Irmen


Tim Peters 01-14-2005 06:16 PM

Re: Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError (dos/unix issue))
 
[Irmen de Jong]
> I've been wondering why there even is the choice between binary mode
> and text mode. Why can't we just do away with the 'text mode' ?
> What does it do, anyways? At least, if it does something, I'm sure
> that it isn't something that can't be done in Python itself if
> really required to do so...


It's not Python's decision, it's the operating system's. Whether
there's an actual difference between text mode and binary mode is up
to the operating system, and, if there is an actual difference, every
detail about what the difference(s) consist of is also up to the
operating system. That differences may exist is reflected in the C
standard, and the rules for text-mode files are more restrictive than
most people would believe.

On Unixish systems, there's no difference. On Windows boxes, there
are conceptually small differences with huge consequences, and the
distinction appears to be kept just for backward-compatibility
reasons. On some other systems, text and binary files are entirely
different kinds of beasts.

If Python didn't offer text mode then it would be clumsy at best to
use Python to write ordinary human-readable text files in the format
that native software on Windows, and Mac Classic, and VAX (and ...)
expects (and the native format for text mode differs across all of
them). If Python didn't offer binary mode then it wouldn't be
possible to use Python to process data in binary files on Windows and
Mac Classic and VAX (and ...). If Python used its own
platform-independent file format, then it would end up creating files
that other programs wouldn't be able to deal with.

Live with it <wink>.
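
To see the text-mode translation without a Windows box, here is a small sketch that assumes Python 3, where open() accepts a `newline` argument. Passing newline="\r\n" asks for Windows-style line ends on any platform; the file name is made up. Reading the same file back in binary mode and in text mode shows the two views of the same bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "note.txt")  # hypothetical file

# Text-mode write: each "\n" is stored as "\r\n", as Windows software expects.
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("first\nsecond\n")

# Binary-mode read: the bytes exactly as they sit on disk.
with open(path, "rb") as f:
    raw = f.read()

# Text-mode read with default settings: "\r\n" is folded back to "\n".
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
```

Here `raw` holds the carriage returns while `text` does not, which is harmless for prose and destructive for binary data.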

Serge Orlov 01-14-2005 07:46 PM

Re: Why 'r' mode anyway? (was: Re: Pickled text file causing ValueError (dos/unix issue))
 
Irmen de Jong wrote:
> Tim Peters wrote:
>
> > Yes: regardless of platform, always open files used for pickles in
> > binary mode. That is, pass "rb" to open() when reading a pickle file,
> > and "wb" to open() when writing a pickle file. Then your pickle files
> > will work unchanged on all platforms. The same is true of files
> > containing binary data of any kind (and despite that pickle protocol 0
> > was called "text mode" for years, it's still binary data).
>
> I've been wondering why there even is the choice between binary mode
> and text mode. Why can't we just do away with the 'text mode' ?


We can't because characters and bytes are not the same things. But I
believe what you're really complaining about is that "t" mode sometimes
mysteriously corrupts data if processed by the code that expects binary
files. In Python 3.0 it will be fixed because file.read will have to
return different objects: bytes for "b" mode, str for "t" mode. It
would be great if the file type were split into binfile and textfile,
removing the need for cryptic "b" and "t" modes, but I'm afraid that's too
much of a change even for Python 3.0.

Serge.
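
For readers on Python 3, the split described above can be observed directly: a file opened in "rb" mode yields bytes, and one opened in text mode yields str. A minimal sketch, with a made-up file name and sample content:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file

# Seven raw bytes: "caf", a two-byte UTF-8 e-acute, then "\r\n".
with open(path, "wb") as f:
    f.write(b"caf\xc3\xa9\r\n")

with open(path, "rb") as f:
    as_bytes = f.read()   # bytes: untouched, still 7 items long

with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()    # str: decoded, and "\r\n" folded to "\n"
```

Because the two modes return different types, code that expects binary data fails loudly on a str rather than silently receiving translated content.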


Irmen de Jong 01-14-2005 08:13 PM

Re: Why 'r' mode anyway?
 
Tim Peters wrote:
> That differences may exist is reflected in the C
> standard, and the rules for text-mode files are more restrictive than
> most people would believe.


Apparently. Because I know only about the Unix <-> Windows difference
(Windows converts \r\n <--> \n when using 'r' mode, right?).
So it's in the line endings.

Is there more obscure stuff going on on the other systems you
mentioned (Mac OS, VAX) ?

(Does that mean that the bug in SimpleHTTPServer that my patch
839496 addressed also occurred on those systems? Or that
the patch may be incorrect after all??)

While your argument about why Python doesn't use its own
platform-independent file format is sound of course, I find it often a
nuisance that platform-specific things trickle through into Python itself and
ultimately in the programs you write. I sometimes feel that some
parts of Python expose the underlying C/os implementation
a bit too much. Python never claimed write once run anywhere (as
that other language does) but it would have been nice nevertheless ;-)
In practice it's just not possible I guess.

Thanks,
--Irmen

John Machin 01-14-2005 08:32 PM

Re: Pickled text file causing ValueError (dos/unix issue)
 
On Fri, 14 Jan 2005 09:12:49 -0500, Tim Peters <tim.peters@gmail.com>
wrote:

>[Aki Niimura]
>> I started to use pickle to store the latest user settings for the tool
>> I wrote. It writes out a pickled text file when it terminates and it
>> restores the settings when it starts.

>...
>> I guess DOS text format is creating this problem.

>
>Yes.
>
>> My question is "Is there any elegant way to deal with this?".

>
>Yes: regardless of platform, always open files used for pickles in
>binary mode. That is, pass "rb" to open() when reading a pickle file,
>and "wb" to open() when writing a pickle file. Then your pickle files
>will work unchanged on all platforms. The same is true of files
>containing binary data of any kind (and despite that pickle protocol 0
>was called "text mode" for years, it's still binary data).


Tim, the manual as of version 2.4 does _not_ mention the need to use
'b' on OSes where it makes a difference, not even in the examples at
the end of the chapter. Further, it still refers to protocol 0 as
'text' in several places. There is also a reference to protocol 0
files being viewable in a text editor.

In other words, enough to lead even the most careful Reader of TFM up
the garden path :-)

Cheers,
John

Tim Peters 01-14-2005 08:56 PM

Re: Pickled text file causing ValueError (dos/unix issue)
 
[Tim Peters]
>>Yes: regardless of platform, always open files used for pickles
>> in binary mode. ...


[John Machin]
> Tim, the manual as of version 2.4 does _not_ mention the need
> to use 'b' on OSes where it makes a difference, not even in the
> examples at the end of the chapter. Further, it still refers to
> protocol 0 as 'text' in several places. There is also a reference to
> protocol 0 files being viewable in a text editor.
>
> In other words, enough to lead even the most careful Reader of
> TFM up the garden path :-)


Take the next step: submit a patch with corrected text. I'm not paid
to work on the Python docs either <0.5 wink>. (BTW, protocol 0 files
are viewable in a text editor regardless, although the line ends may
"look funny")

Tim Peters 01-15-2005 12:13 AM

Re: Why 'r' mode anyway?
 
[Tim Peters]
>> That differences may exist is reflected in the C
>> standard, and the rules for text-mode files are more restrictive
>> than most people would believe.


[Irmen de Jong]
> Apparently. Because I know only about the Unix <-> Windows
> difference (windows converts \r\n <--> \n when using 'r' mode,
> right). So it's in the line endings.


That's one difference. The worse difference is that, in text mode on
Windows, the first instance of chr(26) in a file is taken as meaning
"that's the end of the file", no matter how many bytes may follow it.
That's fine by the C standard, because everything about a text-mode
file containing a chr(26) character is undefined.
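
The two effects described above can be modelled in a few lines. This is only a rough emulation written for illustration, not the actual Windows C runtime code; `emulate_windows_text_read` is a hypothetical helper, and real implementations have more corner cases.

```python
def emulate_windows_text_read(data: bytes) -> bytes:
    """Rough model of a Windows text-mode read (hypothetical helper)."""
    end = data.find(b"\x1a")  # chr(26), i.e. Ctrl-Z
    if end != -1:
        data = data[:end]     # everything from the first Ctrl-Z on is dropped
    return data.replace(b"\r\n", b"\n")  # line ends are folded to "\n"


# Eight bytes of made-up binary data with a chr(26) in the third position.
payload = bytes([1, 2, 26, 3, 4, 13, 10, 5])
seen = emulate_windows_text_read(payload)
```

Under this model only the first two bytes of `payload` survive, which is why binary data read in text mode can appear truncated.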

> Is there more obscure stuff going on on the other systems you
> mentioned (Mac OS, VAX) ?


I think on Mac Classic it was *just* line end differences. Native VAX
has many file formats. "Record-based" file formats used to be very
popular. There the OS saves meta-information in the file, such as
each record contains an offset to the start of the next record, and
may even contain an index structure to support random access to
records quickly (for example, "a line" may be a record, and "read the
last line" may go quickly). Read that in binary mode, and you'll be
reading up the bits in the index and offsets too, etc. IIRC, Unix was
actually quite novel at the time in insisting that all files were just
raw byte streams to the OS.
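
As a toy illustration of a record-based layout (an invented format, not any actual VAX one), suppose each record is stored as a 2-byte length followed by the data. The helper names below are hypothetical. Looking at the raw bytes shows the bookkeeping mixed in with the content, which is what a binary-mode read of such a file would hand back.

```python
import struct


def pack_records(lines):
    """Toy format: 2-byte big-endian length, then the record data."""
    return b"".join(struct.pack(">H", len(line)) + line for line in lines)


def unpack_records(blob):
    """Walk the length prefixes to recover the individual records."""
    records, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        records.append(blob[pos:pos + length])
        pos += length
    return records


blob = pack_records([b"hello", b"world"])
records = unpack_records(blob)
```

The raw `blob` contains the length bytes between the words, while `records` holds only the two lines.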

> (Does that mean that the bug in SimpleHTTPServer that my patch
> 839496 addressed also occurred on those systems? Or that
> the patch may be incorrect after all??)


Don't know, and (sorry) no time to dig.

> While your argument about why Python doesn't use its own
> platform-independent file format is sound of course, I find it often
> a nuisance that platform-specific things trickle through into Python
> itself and ultimately in the programs you write. I sometimes feel
> that some parts of Python expose the underlying C/os
> implementation a bit too much. Python never claimed write once
> run anywhere (as that other language does) but it would have
> been nice nevertheless ;-)
> In practice it's just not possible I guess.


It would be difficult at best. Python hides a lot of platform crap,
but generally where it's reasonably easy to hide. It's not easy to
hide native file conventions, partly because Python wouldn't play well
with *other* platform software if it did.

Remember that Guido worked on ABC before Python, and Python is in
(small) part a reaction against the extremes of ABC. ABC was 100%
platform-independent. You could read and write files from ABC.
However, the only files you could read from ABC were files that were
written by ABC -- and files written by ABC were essentially unusable
by other software. Socket semantics were also 100% portable in ABC:
it didn't have sockets, nor any way to extend the language to add
them. Etc -- ABC was a self-contained universe. "Plays well with
others" was a strong motivator for Python's design, and that often
means playing by others' rules.

