Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Dan Stromberg
12-06-2010
Ultimately I switched to reading the filenames from file descriptor 0
using os.read(); this gave back bytes in 3.x, strings of single-byte
characters in 2.x - which are similar enough for my purposes, and
eliminated the filesystem encoding(s) question nicely.
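
A minimal sketch of the os.read() approach described above, written against Python 3. The function names read_all, filenames_from_fd and count_bytes are hypothetical and are not taken from the author's programs, and the sketch assumes one filename per line. The names stay as bytes from the read all the way to open(), so no decoding step is involved.

```python
import os


def read_all(fd, chunk_size=65536):
    """Read a file descriptor until EOF and return its contents as bytes."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty read signals end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def filenames_from_fd(fd=0):
    """Return the newline-separated filenames read from fd, undecoded."""
    return [name for name in read_all(fd).split(b"\n") if name]


def count_bytes(name):
    """Open a file by its raw byte name and return its length in bytes."""
    with open(name, "rb") as handle:
        return len(handle.read())
```

Something like `for name in filenames_from_fd(0): print(count_bytes(name))` would then process names piped in from another program, whatever bytes they contain.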

I rewrote readline0
(http://stromberg.dnsalias.org/cgi-bi...runk/?root=svn)
for 2.x and 3.x to facilitate reading null-terminated strings from
stdin. It's in better shape now anyway - more OOP than functional,
and with a bunch of unit tests. The module now works on CPython 2.x,
CPython 3.x and PyPy 1.4 from the same code.
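
The readline0 module itself is not reproduced here. As a rough illustration of the idea only, a class that iterates over null-terminated records from a file descriptor might look like the following; the class name NullTerminatedReader and its parameters are assumptions made for this sketch, not readline0's API.

```python
import os


class NullTerminatedReader(object):
    """Iterate over terminator-delimited byte records read from a file descriptor."""

    def __init__(self, fd=0, chunk_size=65536, terminator=b"\0"):
        self.fd = fd
        self.chunk_size = chunk_size
        self.terminator = terminator

    def __iter__(self):
        pending = b""
        while True:
            chunk = os.read(self.fd, self.chunk_size)
            if not chunk:  # end of input
                break
            pending += chunk
            parts = pending.split(self.terminator)
            # The last piece has no terminator yet; keep it for the next read.
            pending = parts.pop()
            for record in parts:
                yield record
        if pending:
            # Input ended without a final terminator; emit what is left.
            yield pending
```

Paired with something like `find . -print0`, a reader of this kind also copes with filenames that contain newlines, since only the null byte separates records.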

On Mon, Nov 29, 2010 at 9:26 PM, Dan Stromberg wrote:
> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them. These programs sort of do
> the *ix xargs thing, without requiring xargs.
>
> In Python 2, these work well. Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.
>
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set. Things are fine with strictly ASCII
> filenames. With high-bit-set characters, even if I change stdin's
> encoding with:
>
>     import io
>     import sys
>     STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
>
> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem. I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
>
> Python 2 shows the n~ as 0xf1 in an os.listdir('.'). Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
>
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
>
> TIA!
>
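
The byte mismatch in the quoted message can be reproduced without touching a filesystem. This is a sketch under one assumption that the post does not state: that Python 3 was encoding filenames as UTF-8 in that environment. Decoding 0xf1 as ISO-8859-1 gives the single character U+00F1; encoding that character back as UTF-8 when the name is handed to open() would give the two bytes 0xc3 0xb1, which no longer match the one-byte name on disk.

```python
import sys

raw_name = b"\xf1"                        # the single byte stored in the directory entry
decoded = raw_name.decode("iso-8859-1")   # one character, U+00F1

utf8_name = decoded.encode("utf-8")            # what a UTF-8 filesystem encoding produces
latin1_name = decoded.encode("iso-8859-1")     # round-trips to the original byte

print(repr(utf8_name), repr(latin1_name))
# The encoding Python uses for str filenames on this system:
print(sys.getfilesystemencoding())
```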

 