Velocity Reviews


Aryeh M. Friedman 01-02-2013 08:20 AM

question on java lang spec chapter 3.3 (unicode char lexing)
 
If I am writing a lexer for Java in a 100% unicode environment (it already uses unicode for all internal representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]? Assume that no code will be imported from non-unicode environments.

Aryeh M. Friedman 01-02-2013 08:24 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On Wednesday, January 2, 2013 3:20:12 AM UTC-5, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode environment (it already uses unicode for all internal representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]? Assume that no code will be imported from non-unicode environments.


Just a follow-up: this is for a Java-to-native (x86) compiler, written in Java, that I am doing for fun (no practical purpose except practice in compiler writing [not for school or work]).

Lew 01-02-2013 07:16 PM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On Wednesday, January 2, 2013 12:20:12 AM UTC-8, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]? Assume that no code
> will be imported from non-unicode environments.


What do you mean "have to deal with"?

If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
authority on what that constitutes.

Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
any responsibilities.

Nor does it obviate the need for the occasional "\uXXXX" in source.

However, I don't think the lexer deals with that. Unicode escape sequences are a precompile phenomenon. Everything is substituted before parsing starts.
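A small illustration of why the escapes cannot simply be ignored: a conforming compiler translates them before the source is split into tokens, so they are legal anywhere, not only inside literals. In the sketch below (class and field names are made up for the example), the field declared with an escaped first letter is the same identifier as abc.

```java
// Hypothetical demo class: compiles because Unicode escapes are translated
// before the source is split into tokens (JLS 3.3).
public class EscapeDemo {
    // The name below starts with an escape for the letter a (U+0061),
    // so after translation this declares a field called abc.
    static int \u0061bc = 42;

    // The literal starts with an escape for the letter A (U+0041).
    static final String GREETING = "\u0041BC";

    static int readAbc() {
        return abc; // refers to the field declared with the escape above
    }

    public static void main(String[] args) {
        System.out.println(readAbc());              // prints 42
        System.out.println(GREETING.equals("ABC")); // prints true
    }
}
```

A lexer that skipped escape translation would see an illegal backslash in the field declaration, so even a "100% unicode" shop can trip over this if anyone ever writes an escape by hand.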

--
Lew

Roedy Green 01-02-2013 07:17 PM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
<Aryeh.Friedman@gmail.com> wrote, quoted or indirectly quoted someone
who said :

> (\uXXXX)


The only places you encounter such escapes are in Java source and
possibly resource bundles.

Other types of escape you run into are like &eacute;, &#x0123;
or &#123;
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.

Arne Vajhøj 01-03-2013 12:54 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On 1/2/2013 3:20 AM, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode environment (it already uses
> unicode for all internal representation of text) and 100% of the code
> that I will be lexing is from that environment, do I still need to deal
> with unicode escapes (\uXXXX) in real life [vs. theoretically complete
> lexing]? Assume that no code will be imported from non-unicode
> environments.


It will not be a Java lexer if it does not understand
that.

And is it that much effort to implement that you would
rather create an AMF lexer instead?

I suspect that it is easy to implement.
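To give a rough idea of the effort involved, here is a minimal sketch of the JLS 3.3 translation step written as a standalone pre-pass (the class and method names are hypothetical). It follows the rule as worded in the JLS of that time: a backslash starts an escape only if an even number of contiguous raw backslashes precede it and it is followed by one or more 'u' characters and exactly four hex digits. Corner cases involving backslashes that are themselves produced by escapes are worth checking against the JLS edition you target.

```java
// Hypothetical pre-pass implementing the JLS 3.3 translation step:
// every eligible Unicode escape in the raw text is replaced by the single
// UTF-16 code unit it denotes; everything else is copied unchanged.
public final class UnicodeEscapes {

    private UnicodeEscapes() {}

    public static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int backslashRun = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            // Eligible only if preceded by an even number of raw backslashes
            // and immediately followed by 'u'.
            if (c == '\\' && backslashRun % 2 == 0
                    && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // the spec allows any number of u's
                }
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("Truncated Unicode escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("Bad hex digit in Unicode escape at index " + k);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                backslashRun = 0; // a translated char never counts as a raw backslash
                i = j + 4;
            } else {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // ASCII hex digits only; Character.digit would also accept non-ASCII digits.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

For example, given the raw text char c = '\u0041'; it returns char c = 'A'; which an escape-unaware lexer can then tokenize. The trade-off of a separate pass is that character offsets now refer to the translated text rather than the raw file.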

Arne


Arne Vajhøj 01-03-2013 12:55 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On 1/2/2013 2:16 PM, Lew wrote:
> On Wednesday, January 2, 2013 12:20:12 AM UTC-8, Aryeh M. Friedman wrote:
>> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
>> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
>> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]? Assume that no code
>> will be imported from non-unicode environments.

>
> What do you mean "have to deal with"?
>
> If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
> authority on what that constitutes.
>
> Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
> any responsibilities.
>
> Nor does it obviate the need for the occasional "\uXXXX" in source.
>
> However, I don't think the lexer deals with that. Unicode escape sequences are a precompile phenomenon. Everything is substituted before parsing starts.


Well - lexing happens before parsing so ...

Arne



Arne Vajhøj 01-03-2013 12:56 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
On 1/2/2013 2:17 PM, Roedy Green wrote:
> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
> <Aryeh.Friedman@gmail.com> wrote, quoted or indirectly quoted someone
> who said :
>
>> (\uXXXX)

>
> The only places you encounter such escapes are in Java source and
> possibly resource bundles.


Well - since he is writing a lexer for Java then ...

Arne


Lew 01-03-2013 01:21 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 
Arne Vajhøj wrote:
> Lew wrote:
>>Aryeh M. Friedman wrote:
>>> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
>>> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
>>> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]? Assume that no code
>>> will be imported from non-unicode environments.

>
>> What do you mean "have to deal with"?
>>
>> If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
>> authority on what that constitutes.

>
>> Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
>> any responsibilities.

>
>> Nor does it obviate the need for the occasional "\uXXXX" in source.

>
>> However, I don't think the lexer deals with that. Unicode escape sequences are a precompile
>> phenomenon. Everything is substituted before parsing starts.

>
> Well - lexing happens before parsing so ...


So does writing source code. What's your point?

My point is that the lexer picks up after the substitution of Unicode sequences.
However, my point is wrong, and yours is right.

http://www.docjar.com/html/api/com/s...exer.java.html
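If the translation lives inside the lexer rather than in a separate pass, one way to do it is a character source that the lexer pulls from, translating escapes on demand. The sketch below uses hypothetical names and is not javac's actual code; it also records the raw offset each character came from, which a separate pre-pass loses and which is handy for error messages.

```java
// Hypothetical lexer input source: hands out one translated char at a time,
// expanding Unicode escapes on demand (JLS 3.3) and remembering where in the
// raw text each returned char started.
public final class EscapeReader {
    private final String raw;
    private int pos = 0;          // next raw index to examine
    private int backslashRun = 0; // contiguous raw backslashes just consumed
    private int lastStart = -1;   // raw offset of the most recently returned char

    public EscapeReader(String raw) {
        this.raw = raw;
    }

    // Returns the next translated char, or -1 at end of input.
    public int next() {
        if (pos >= raw.length()) {
            return -1;
        }
        lastStart = pos;
        char c = raw.charAt(pos);
        boolean eligible = c == '\\' && backslashRun % 2 == 0
                && pos + 1 < raw.length() && raw.charAt(pos + 1) == 'u';
        if (!eligible) {
            pos++;
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            return c;
        }
        int j = pos + 1;
        while (j < raw.length() && raw.charAt(j) == 'u') {
            j++; // one or more u's are allowed
        }
        int value = 0;
        for (int k = j; k < j + 4; k++) {
            int digit = (k < raw.length()) ? hexValue(raw.charAt(k)) : -1;
            if (digit < 0) {
                throw new IllegalStateException("Malformed Unicode escape at raw offset " + pos);
            }
            value = value * 16 + digit;
        }
        pos = j + 4;
        backslashRun = 0; // a translated char never counts as a raw backslash
        return value;
    }

    // Raw offset where the char returned by the last next() call began.
    public int lastRawOffset() {
        return lastStart;
    }

    // ASCII hex digits only, as the JLS grammar requires.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

A lexer would call next() in its main loop, cast each non-negative result to char, and use lastRawOffset() when recording token positions or reporting errors against the original file.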

--
Lew


Aryeh M. Friedman 01-03-2013 01:27 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 

>
> Well - since he is writing a lexer for Java then ...


A little more on the project... while the overall project *IS* for fun, a few components may find their way into more serious work-related projects, but only to be used on code written by me or others on my team... specifically, we may use the lexing/parsing component to make the following tools (the actual code generation etc. of the compilation is currently purely for fun [see note]):

1. Scan for a complete list of classes referenced by a given class (our build system sometimes hiccups because it does not realize that when class X calls an instance of class Y and Y has been modified, it needs to recompile X {if, and only if, the signature(s) have changed})

2. Do some minor style enforcement, such as warning (I have not decided whether it should reject or just warn) if a class/method does not have something that at least looks like a Javadoc header comment (/** ... */ is sufficient for this purpose)
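For the second tool, the lexer-level part is recognizing which comments are doc comments. Below is a minimal sketch under stated assumptions: the names are hypothetical, the input has already been escape-translated (per the JLS, comments are recognized only after Unicode escapes are processed), text blocks (added to Java after this thread) are not handled, and unterminated comments or literals are not diagnosed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a "does this declaration have a doc comment?" check.
// Finds comments in escape-translated source and classifies each one.
public final class CommentScanner {

    public enum Kind { LINE, BLOCK, DOC }

    public static final class Comment {
        public final Kind kind;
        public final int start; // index of the opening slash
        public final int end;   // index just past the comment

        Comment(Kind kind, int start, int end) {
            this.kind = kind;
            this.start = start;
            this.end = end;
        }
    }

    private CommentScanner() {}

    public static List<Comment> scan(String src) {
        List<Comment> comments = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            char next = (i + 1 < n) ? src.charAt(i + 1) : '\0';
            if (c == '"' || c == '\'') {
                // Skip a string or char literal so comment openers inside it are ignored.
                int j = i + 1;
                while (j < n && src.charAt(j) != c && src.charAt(j) != '\n') {
                    j += (src.charAt(j) == '\\') ? 2 : 1; // step over escape sequences
                }
                i = Math.min(j + 1, n);
            } else if (c == '/' && next == '/') {
                int j = i + 2;
                while (j < n && src.charAt(j) != '\n' && src.charAt(j) != '\r') {
                    j++; // a line comment runs to the next line terminator
                }
                comments.add(new Comment(Kind.LINE, i, j));
                i = j;
            } else if (c == '/' && next == '*') {
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2; // unterminated: run to end of input
                // Slash-star-star opens a doc comment; the empty comment /**/ is
                // treated here as an ordinary block comment.
                boolean doc = src.startsWith("/**", i) && !src.startsWith("/**/", i);
                comments.add(new Comment(doc ? Kind.DOC : Kind.BLOCK, i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }
}
```

A style checker could then warn when the nearest comment before a class or method declaration is not of kind DOC; deciding what "nearest" means in the presence of annotations and modifiers is better left to the parser.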

Note:

A long-term personal project of mine is to write an OS completely from the ground up in a superset of Java (the only addition I see as needed is some type of "safe" pointer type)... with "safe" in this case defined as: you can assign a literal address to it, but you are not allowed to do pointer math on it.

Aryeh M. Friedman 01-03-2013 01:32 AM

Re: question on java lang spec chapter 3.3 (unicode char lexing)
 

In case anyone is interested, I have some personal notes on the project at http://dt.fnwe.net/a-javacNative/


All times are GMT.