Re: Python 3 encoding question: Read a filename from stdin,subsequently open that filename

 
 
Peter Otten
      11-30-2010
Dan Stromberg wrote:

> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them. These programs sort of do
> the *ix xargs thing, without requiring xargs.
>
> In Python 2, these work well. Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.


I think you're wrong. The filenames' encoding as they are read from stdin
must be the same as the encoding used by the file system. If the file system
expects UTF-8 and you feed it ISO-8859-1 you'll run into errors.

You always have to know either

(a) both the file system's and stdin's actual encoding, or
(b) that both encodings are the same.

If byte strings work, you are either in situation (b) or just lucky. I'd
guess the latter.
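
To make the mismatch concrete, here is a small sketch (the filename is made up for illustration) showing that the same name becomes different bytes under ISO-8859-1 and UTF-8. Bytes produced under one encoding will not match a name stored under the other:

```python
# Hypothetical filename containing a Spanish n-tilde (U+00F1).
name = "ma\u00f1ana.txt"

latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'ma\xf1ana.txt'      (one byte for the n-tilde)
print(utf8_bytes)    # b'ma\xc3\xb1ana.txt'  (two bytes for the n-tilde)

# Reading ISO-8859-1 bytes as UTF-8 fails outright ...
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True

# ... while reading UTF-8 bytes as ISO-8859-1 "succeeds" but yields a
# different name, which then does not match the file on disk.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake == name)  # False
```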

> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set. Things are fine with strictly ASCII
> filenames. With high-bit-set characters, even if I change stdin's
> encoding with:
>
> import io
> STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')


I suppose you can handle (b) with

STDIN = sys.stdin.buffer

or

STDIN = io.TextIOWrapper(sys.stdin.buffer,
                         encoding=sys.getfilesystemencoding())

in Python 3. I'd prefer the latter because it makes your assumptions
explicit. (Disclaimer: I'm not sure whether I'm using the io API as Guido
intended it.)
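
For what it's worth, here is a rough sketch of both options as small helpers. The function names and the sample data are my own invention, and the "surrogateescape" error handler is an addition that neither snippet above uses: it lets bytes that are invalid in the chosen encoding survive the decode so they can be encoded back unchanged. I'm assuming a POSIX system and one filename per line.

```python
import io
import sys


def filenames_as_bytes(binary_stream):
    """Yield each newline-terminated filename as raw bytes."""
    for raw in binary_stream:
        yield raw.rstrip(b"\n")


def filenames_as_text(binary_stream, encoding=None):
    """Yield each filename as str; undecodable bytes become lone surrogates."""
    if encoding is None:
        # Assumption: stdin carries names in the file system's encoding.
        encoding = sys.getfilesystemencoding()
    for raw in filenames_as_bytes(binary_stream):
        yield raw.decode(encoding, "surrogateescape")


# Stand-in for sys.stdin.buffer: one ASCII name, one ISO-8859-1 name.
sample = io.BytesIO(b"plain.txt\nma\xf1ana.txt\n")
names = list(filenames_as_text(sample, encoding="utf-8"))

# Encoding with the same handler restores the original bytes exactly.
round_tripped = [n.encode("utf-8", "surrogateescape") for n in names]
print(round_tripped)  # [b'plain.txt', b'ma\xf1ana.txt']
```

With real input you would pass sys.stdin.buffer instead of the BytesIO object and call open(name, 'rb') on each result. On POSIX, open() accepts the bytes form as is, and for the str form it applies the file system encoding with the same error handler, so the name should reach the file system byte for byte.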

> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem. I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
>
> Python 2 shows the n~ as 0xf1 in an os.listdir('.'). Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
>
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
>
> TIA!



 