Velocity Reviews - Computer Hardware Reviews

Py3.3 unicode literal and input()

 
 
jmfauth
 
      06-18-2012
We are turning in circles. You are somehow
legitimating the reintroduction of unicode
literals, and I showed, not to say proved, that it may
be a source of problems.

Typical Python disease. Introduce a problem,
then discuss how to solve it, but surely and
definitively do not remove that problem.

As far as I know, Python 3.2 is working very
well.

jmf

 
Andrew Berg
 
      06-18-2012
On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
> jmfauth writes:
>
>> Things are very clear to me. I wrote enough interactive
>> interpreters with all available toolkits for Windows

>
>>>> r = input()

> u'a
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> SyntaxError: u'a
>
> Er, no, not really
>

You're using 2.x; this thread concerns 3.3, which, as has been repeated
several times, does not evaluate strings passed via input() like 2.x.
That code does not raise a SyntaxError in 3.x.

--
CPython 3.3.0a4 | Windows NT 6.1.7601.17803
 
John Roth
 
      06-18-2012
On Monday, June 18, 2012 9:44:17 AM UTC-6, jmfauth wrote:
> Things are very clear to me. I wrote enough interactive
> interpreters with all available toolkits for Windows
> since I have known Python (v. 1.5.6).
>
> I do not see why the semantics should differ between
> source code and an interactive interpreter,
> esp. if Python allows it!
>
> If you have to know in advance what an end user
> is supposed to type and/or check it ('str' or unicode
> literal) in order to know if the answer has to be
> evaluated or not, then it is better to reintroduce
> input() and raw_input().
>


The change between Python 2.x and 3.x was made for security reasons. The developers felt, correctly in my opinion, that the simpler operation should not carry the risk of a malicious user entering an expression that would corrupt the program.

In Python 3.x the equivalent of Python 2.x's input() function is eval(input()). It poses the same security risk: acting on unchecked user data.
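A minimal sketch of the difference, assuming the typed text has already been read into a string so the example stays testable. `py2_style_eval` and `literal_only` are hypothetical names; `ast.literal_eval` is a standard-library function shown here as one narrower option, not as something the 2.x input() did.

```python
import ast

def py2_style_eval(text):
    # Roughly what Python 2's input() did with the typed text: evaluate it
    # as an arbitrary expression. Unsafe on untrusted input.
    return eval(text)

def literal_only(text):
    # Accepts only Python literals (strings, numbers, tuples, lists, dicts,
    # sets, booleans, None); calls, names and attribute access raise ValueError.
    return ast.literal_eval(text)

print(py2_style_eval("3**0.5"))   # any expression is evaluated
print(literal_only("[1, 2, 3]"))  # [1, 2, 3]

try:
    literal_only("__import__('os').getcwd()")
except ValueError:
    print("rejected: not a literal")
```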

John Roth


> jmf


 
Dave Angel
 
      06-18-2012
On 06/18/2012 12:55 PM, Andrew Berg wrote:
> On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
>> jmfauth writes:
>>
>>> Things are very clear to me. I wrote enough interactive
>>> interpreters with all available toolkits for Windows
>>>>> r = input()

>> u'a
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> SyntaxError: u'a
>>
>> Er, no, not really
>>

> You're using 2.x; this thread concerns 3.3, which, as has been repeated
> several times, does not evaluate strings passed via input() like 2.x.
> That code does not raise a SyntaxError in 3.x.
>


And you're missing the context. jmfauth thinks we should reintroduce
the input/raw_input distinction so he could parse literal strings. So
Jussi demonstrated that the 2.x input did NOT satisfy jmfauth's dreams.



--

DaveA

 
Andrew Berg
 
      06-18-2012
On 6/18/2012 12:03 PM, Dave Angel wrote:
> And you're missing the context. jmfauth thinks we should reintroduce
> the input/raw_input distinction so he could parse literal strings. So
> Jussi demonstrated that the 2.x input did NOT satisfy jmfauth's dreams.


You're right. I missed that part of jmfauth's post.
--
CPython 3.3.0a4 | Windows NT 6.1.7601.17803
 
Jussi Piitulainen
 
      06-18-2012
Andrew Berg writes:
> On 6/18/2012 11:32 AM, Jussi Piitulainen wrote:
> > jmfauth writes:
> >
> >> Things are very clear to me. I wrote enough interactive
> >> interpreters with all available toolkits for Windows

> >
> >>>> r = input()

> > u'a
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > SyntaxError: u'a
> >
> > Er, no, not really
> >

> You're using 2.x; this thread concerns 3.3, which, as has been
> repeated several times, does not evaluate strings passed via input()
> like 2.x. That code does not raise a SyntaxError in 3.x.


I used 3.1.2, and I really meant the "not really". And the "". I
edited out the command that raised the exception.

This thread is weird. If I didn't know that things are very clear to
jmfauth, I would think that the behaviour of input() that I observe
has absolutely nothing to do with the u'' syntax in source code.
 
Terry Reedy
 
      06-18-2012
On 6/18/2012 12:39 PM, jmfauth wrote:
> We are turning in circles.


You are, not we. Please stop.

> You are somehow legitimating the reintroduction of unicode
> literals


We are not 'reintroducing' unicode literals. In Python 3, string
literals *are* unicode literals.

Other developers reintroduced a now meaningless 'u' prefix for the
purpose of helping people write 2&3 code that runs on both Python 2 and
Python 3. Read about it here http://python.org/dev/peps/pep-0414/

In Python 3.3, 'u' should *only* be used for that purpose and should be
ignored by anyone not writing or editing 2&3 code. If you are not
writing such code, ignore it.
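For illustration only (the variable names are made up): under a 3.3+ interpreter the `u` prefix is accepted and has no effect, so literals written with it for 2&3 code are ordinary `str` objects.

```python
# Under Python 3.3+ the u prefix is accepted and has no effect (PEP 414),
# so a literal written for 2&3 code is an ordinary str.
plain = 'caf\u00e9'
prefixed = u'caf\u00e9'

print(type(prefixed) is str)  # True
print(plain == prefixed)      # True
print(len(prefixed))          # 4
```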

> and I showed, not to say proved, that it may
> be a source of problems.


You are the one making it be a problem.

> Typical Python disease. Introduce a problem,
> then discuss how to solve it, but surely and
> definitively do not remove that problem.


The simultaneous reintroduction of 'ur', but with a different meaning
than in 2.7, *was* a problem and it should be removed in the next release.
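A rough way to check which prefixes a given interpreter accepts is to compile the literal as source text. This sketch assumes a current Python 3, where `ur'...'` is a syntax error; the 3.3 alpha used in this thread still accepted it. It also hints at why the meaning differed: in 2.7, `ur'...'` still processed `\uXXXX` escapes, while a Python 3 raw literal keeps the backslash. The helper name `compiles` is hypothetical.

```python
def compiles(source):
    """Return True if `source` is a valid Python expression for this interpreter."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

for literal in ("u'a'", "r'a'", "ur'a'"):
    print(literal, compiles(literal))  # ur'a' prints False on current Python 3

# A Python 3 raw literal leaves \u00e9 alone: six characters, not one.
print(len(r'\u00e9'))  # 6
```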

> As far as I know, Python 3.2 is working very
> well.


Except that many public libraries that we would like to see ported to
Python 3 have not been. The purpose of reintroducing 'u' is to encourage
more porting of Python 2 code. Period.

--
Terry Jan Reedy



 
jmfauth
 
      06-18-2012
It's a matter of perspective. I expected to finally
have a clean Python; that goal has been missed.

I have no objection. It is "your" (core devs')
project, not mine. At least you understood my point
of view.

I have been a TeX user for more than two decades. At
the release of XeTeX (a pure Unicode TeX engine), the
devs had, as with Python 2/3, to make everything
incompatible. It was a success: not a week went by
without an updated package or refreshed documentation.

Luckily for me, Xe(La)TeX is more important than
Python.

As a scientist, I find Python perfect.
From an educational point of view, I'm becoming
more and more skeptical about this language, a
moving target.

Note that I'm not complaining, only "disappointed".

jmf

 
Steven D'Aprano
 
      06-19-2012
On Mon, 18 Jun 2012 07:00:01 -0700, jmfauth wrote:

> On 18 June, 12:11, Steven D'Aprano <steve
> (E-Mail Removed)> wrote:
>> On Mon, 18 Jun 2012 02:30:50 -0700, jmfauth wrote:
>> > On 18 June, 10:28, Benjamin Kaplan <(E-Mail Removed)> wrote:
>> >> The u prefix is only there to
>> >> make it easier to port a codebase from Python 2 to Python 3. It
>> >> doesn't actually do anything.

>>
>> > It does. I showed it!

>>
>> Incorrect. You are assuming that Python 3 input evals the input like
>> Python 2 does. That is wrong. All you show is that the one-character
>> string "a" is not equal to the four-character string "u'a'", which is
>> hardly a surprise. You wouldn't expect the string "3" to equal the
>> string "int('3')" would you?
>>
>> --
>> Steven

>
>
> A string is a string, a "piece of text", period.
>
> I do not see why a unicode literal and a (well, I do not know what to
> call it) "normal class <str>" should behave differently in source code
> or as an answer to an input().


They do not. As you showed earlier, in Python 3.3 the literal strings
u'a' and 'a' have the same meaning: both create a one-character string
containing the Unicode character LATIN SMALL LETTER A.

Note carefully that the quotation marks are not part of the string. They
are delimiters. Python 3.3 allows you to create a string by using
delimiters:

' '
" "
u' '
u" "

plus triple-quoted versions of the same. The delimiter is not part of the
string. They are only there to mark the start and end of the string in
source code so that Python can tell the difference between the string "a"
and the variable named "a".

Note carefully that quotation marks can exist inside strings:

my_string = "This string has 'quotation marks'."

The " at the start and end of the string literal are delimiters, not part
of the string, but the internal ' characters *are* part of the string.
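A quick check of the two points above, assuming a Python 3.3+ interpreter: every delimiter form yields the same one-character string, and only the inner quotation marks end up in `my_string`.

```python
import unicodedata

# Every delimiter form listed above produces the same one-character string.
forms = ['a', "a", u'a', u"a", '''a''', """a"""]
print(all(s == 'a' and len(s) == 1 for s in forms))  # True
print(unicodedata.name(u'a'))                         # LATIN SMALL LETTER A

# The outer double quotes are delimiters; the inner single quotes are data.
my_string = "This string has 'quotation marks'."
print(my_string.count("'"))                           # 2
print(my_string[0], my_string[-1])                    # T .
```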

When you read data from a file, or from the keyboard using input(),
Python takes the data and returns a string. You don't need to enter
delimiters, because there is no confusion between a string (all data you
read) and other programming tokens.

For example:

py> s = input("Enter a string: ")
Enter a string: 42
py> print(s, type(s))
42 <class 'str'>

Because what I type is automatically a string, I don't need to enclose it
in quotation marks to distinguish it from the integer 42.

py> s = input("Enter a string: ")
Enter a string: This string has 'quotation marks'.
py> print(s, type(s))
This string has 'quotation marks'. <class 'str'>


What you type is exactly what you get, no more, no less.

If you type 42, you get the two character string "42" and not the int 42.

If you type [1, 2, 3], then you get the nine character string "[1, 2, 3]"
and not a list containing integers 1, 2 and 3.

If you type 3**0.5 then you get the six character string "3**0.5" and not
the float 1.7320508075688772.

If you type u'a' then you get the four character string "u'a'" and not
the single character 'a'.

There is nothing new going on here. The behaviour of input() in Python 3,
and raw_input() in Python 2, has not changed.
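The same examples can be reproduced without a keyboard. This sketch assumes it is acceptable to swap `sys.stdin` for an `io.StringIO` temporarily, which makes `input()` read from that object instead; `read_typed` is a hypothetical helper written only for this illustration.

```python
import io
import sys

def read_typed(line):
    """Return what input() gives back when `line` is typed and Enter pressed."""
    saved_stdin = sys.stdin
    sys.stdin = io.StringIO(line + "\n")  # stands in for the keyboard
    try:
        return input()  # no prompt argument, so no prompt is written
    finally:
        sys.stdin = saved_stdin

# Each result is a str holding exactly the typed characters.
for typed in ["42", "[1, 2, 3]", "3**0.5", "u'a'"]:
    s = read_typed(typed)
    print(repr(s), type(s).__name__, len(s))
```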


> Should a user write two derived functions?
>
> input_for_entering_text()
> and
> input_if_you_are_entering_a_text_as_litteral()


If you, the programmer, want to force the user to write input in Python
syntax, then yes, you have to write a function to do so. input() is very
simple: it just reads strings exactly as typed. It is up to you to
process those strings however you wish.
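One possible shape for such a function, as a sketch rather than a recommendation: it reads a line with `input()` and accepts it only if it is a single Python string literal, using `ast.literal_eval` (which does not run arbitrary code). The names `parse_string_literal` and `input_string_literal` are hypothetical. The parsing is split out so it can be tried on sample inputs without typing.

```python
import ast

def parse_string_literal(text):
    r"""Interpret `text` as a Python string literal such as u'caf\u00e9'.

    Raises ValueError if `text` is not a single literal that evaluates to str.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError) as exc:
        raise ValueError("not a Python string literal: %r" % text) from exc
    if not isinstance(value, str):
        raise ValueError("literal is not a str: %r" % text)
    return value

def input_string_literal(prompt=""):
    """Read one line with input() and interpret it as a string literal."""
    return parse_string_literal(input(prompt))

print(parse_string_literal(r"u'\u00e9l\xe9phant'"))               # éléphant
print(parse_string_literal("'a'") == parse_string_literal("u'a'"))  # True
```

With this helper, u'éléphant' and 'éléphant' typed with quotes give the same result, while unquoted text, numbers and b'...' literals are rejected with ValueError rather than silently returned.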



--
Steven
 
jmfauth
 
      06-20-2012
Python 3.3.0a4 (v3.3.0a4:7c51388a3aa7+, May 31 2012, 20:15:21) [MSC v.1600 32 bit (Intel)] on win32
>>> ---

running smidzero.py...
....smidzero has been executed
>>> ---

input(':')
:éléphant
'éléphant'
>>> ---

input(':')
:u'éléphant'
'éléphant'
>>> ---

input(':')
:u'\u00e9l\xe9phant'
'éléphant'
>>> ---

input(':')
:u'\U000000e9léphant'
'éléphant'
>>> ---

input(':')
:\U000000e9léphant
'éléphant'
>>> ---
>>> ---

# this is expected
>>> ---

input(':')
:b'éléphant'
"b'éléphant'"
>>> ---

len(input(':'))
:b'éléphant'
11

---

Good news on the ru''/ur'' front:
http://bugs.python.org/issue15096

---

Finally, I'm just wondering whether this unicode_literal
reintroduction is not a bad idea.

b'these_are_bytes'
u'this_is_a_unicode_string'

I have written all my Py2 code in a "unicode mode" since ... Py2.3 (?).

jmf
 