question on java lang spec chapter 3.3 (unicode char lexing)

 
 
Aryeh M. Friedman
Guest
Posts: n/a
 
      01-02-2013
If I am writing a lexer for Java in a 100% unicode environment (it already uses unicode for all internal representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code will be imported from non-unicode environments
 
 
 
 
 
Aryeh M. Friedman
Guest
Posts: n/a
 
      01-02-2013
On Wednesday, January 2, 2013 3:20:12 AM UTC-5, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode environment (it already uses unicode for all internal representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code will be imported from non-unicode environments


Just a follow-up: this is for a Java-to-native (x86) compiler, written in Java, that I am doing for fun (no practical purpose except practice in compiler writing [not for school or work])
 
 
 
 
 
Lew
Guest
Posts: n/a
 
      01-02-2013
On Wednesday, January 2, 2013 12:20:12 AM UTC-8, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code
> will be imported from non-unicode environments


What do you mean "have to deal with"?

If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
authority on what that constitutes.

Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
any responsibilities.

Nor does it obviate the need for the occasional "\uXXXX" in source.

However, I don't think the lexer deals with that. Unicode escape sequences are a precompile phenomenon. Everything is substituted before parsing starts.

--
Lew
 
 
Roedy Green
Guest
Posts: n/a
 
      01-02-2013
On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
wrote, quoted or indirectly quoted someone who said:

> (\uXXXX)


The only places you encounter such escapes are in Java source and
possibly resource bundles.

Other types of escape you run into are like &eacute;, &#x0123;
or &#123;
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.
 
 
Arne Vajhøj
Guest
Posts: n/a
 
      01-03-2013
On 1/2/2013 3:20 AM, Aryeh M. Friedman wrote:
> If I am writing a lexer for Java in a 100% unicode environment (it already
> uses unicode for all internal representation of text) and 100% of the code
> that I will be lexing is from that environment, do I still need to deal
> with unicode escapes (\uXXXX) in real life [vs. theoretically complete
> lexing]... assume that no code will be imported from non-unicode
> environments


It will not be a Java lexer if it does not understand
that.

And is it that much effort to implement that you would
rather create an AMF lexer instead?

I suspect that it is easy to implement.

Arne
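
For what it's worth, the translation step described in JLS 3.3 can be sketched as a pre-pass over the raw characters. This is only a minimal sketch, not javac's code; the class name UnicodeEscapes and the method translate are made up for illustration. It assumes the rules as stated in JLS 3.3: a backslash starts an escape only when it is preceded by an even number of contiguous backslashes, one or more u characters may follow it, exactly four ASCII hex digits are required, and a character produced by an escape does not take part in a further escape. Corner cases around escapes that themselves produce a backslash are handled in the simplest way and should be checked against the spec.

```java
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    // Returns the input with every Unicode escape replaced by the UTF-16 code unit it names.
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int backslashRun = 0; // raw backslashes immediately before position i
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                backslashRun = 0;
                i++;
            } else if (backslashRun % 2 == 0 && i + 1 < n && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < n && src.charAt(j) == 'u') {
                    j++; // the grammar allows one or more 'u' characters
                }
                if (j + 4 > n) {
                    throw new IllegalArgumentException("truncated Unicode escape at offset " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    value = value * 16 + hexValue(src.charAt(k), i);
                }
                // The produced character is appended as-is and never rescanned.
                out.append((char) value);
                backslashRun = 0;
                i = j + 4;
            } else {
                // An ordinary backslash (or one that is not eligible to start an escape).
                out.append(c);
                backslashRun++;
                i++;
            }
        }
        return out.toString();
    }

    // Accepts ASCII hex digits only, matching the HexDigit production.
    private static int hexValue(char c, int escapeOffset) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("bad hex digit in Unicode escape at offset " + escapeOffset);
    }
}
```

A lexer could run its input through translate before tokenizing, or fold the same loop into its character reader so that source offsets stay easy to report. Either way it is a few dozen lines.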

 
 
Arne Vajhøj
Guest
Posts: n/a
 
      01-03-2013
On 1/2/2013 2:16 PM, Lew wrote:
> On Wednesday, January 2, 2013 12:20:12 AM UTC-8, Aryeh M. Friedman wrote:
>> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
>> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
>> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code
>> will be imported from non-unicode environments

>
> What do you mean "have to deal with"?
>
> If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
> authority on what that constitutes.
>
> Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
> any responsibilities.
>
> Nor does it obviate the need for the occasional "\uXXXX" in source.
>
> However, I don't think the lexer deals with that. Unicode escape sequences are a precompile phenomenon. Everything is substituted before parsing starts.


Well - lexing happens before parsing so ...

Arne


 
 
Arne Vajhøj
Guest
Posts: n/a
 
      01-03-2013
On 1/2/2013 2:17 PM, Roedy Green wrote:
> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
> wrote, quoted or indirectly quoted someone who said:
>
>> (\uXXXX)

>
> The only places you encounter such escapes are in Java source and
> possibly resource bundles.


Well - since he is writing a lexer for Java then ...

Arne

 
 
Lew
Guest
Posts: n/a
 
      01-03-2013
Arne Vajhøj wrote:
> Lew wrote:
>>Aryeh M. Friedman wrote:
>>> If I am writing a lexer for Java in a 100% unicode [sic] environment (it already uses unicode for all internal
>>> representation of text) and 100% of the code that I will be lexing is from that environment, do I still need to
>>> deal with unicode escapes (\uXXXX) in real life [vs. theoretically complete lexing]... assume that no code
>>> will be imported from non-unicode environments

>
>> What do you mean "have to deal with"?
>>
>> If you mean to parse Java source, you have to be able to parse Java source. The JLS is the final
>> authority on what that constitutes.

>
>> Being "in a 100% unicode [sic] environment" (whatever that's supposed to mean) does not excuse
>> any responsibilities.

>
>> Nor does it obviate the need for the occasional "\uXXXX" in source.

>
>> However, I don't think the lexer deals with that. Unicode escape sequences are a precompile
>> phenomenon. Everything is substituted before parsing starts.

>
> Well - lexing happens before parsing so ...


So does writing source code. What's your point?

My point is that the lexer picks up after the substitution of Unicode sequences.
However, my point is wrong, and yours is right.

http://www.docjar.com/html/api/com/s...exer.java.html

--
Lew
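
To make the ordering concrete, here is a small illustration of why escapes have to be handled no later than the lexer. The class and method names (EscapeDemo, first, quoted, lineFeedInComment) are made up for this sketch; the behaviour relies only on the JLS 3.3 rule that escapes are translated before comments and tokens are recognized.

```java
final class EscapeDemo {
    private EscapeDemo() {}

    // The return type below is the keyword char with its third letter written as an escape.
    static ch\u0061r first() {
        return 'x';
    }

    // The escapes here are the string delimiters themselves.
    static String quoted() {
        return \u0022hello\u0022; // same as "hello"
    }

    static int lineFeedInComment() {
        int n = 1;
        // the escape ends this comment: \u000a n = 2;
        return n;
    }
}
```

Here lineFeedInComment() returns 2, not 1, because the escape becomes a line terminator before the comment is recognized, so the assignment after it is compiled as code. A lexer that skipped escape handling would tokenize this file differently from a conforming compiler.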

 
 
Aryeh M. Friedman
Guest
Posts: n/a
 
      01-03-2013

>
> Well - since he is writing a lexer for Java then ...


A little more on the project... while the overall project *IS* for fun, a few components may find their way into more serious work-related projects, but only to be used on code written by me or others on my team... specifically we may use the lexing/parsing component to make the following tools (the actual code generation/etc. of the compilation is currently purely fun [see note]):

1. Scan for a complete list of classes referenced by a given class (our build system sometimes hiccups by not realizing that when class X calls an instance of class Y and Y has been modified, it needs to recompile X {if, and only if, the signature(s) have changed})

2. Do some minor style enforcement, like warning (have not decided if it should reject or just warn) if a class/method does not have something that at least looks like a javadoc header comment (/** ... */ is sufficient for this purpose)

Note:

A long-term personal project of mine is to write an OS completely from the ground up in a superset of Java (the only addition I see that is needed is some type of "safe" pointer type)... in this case safe being defined as: you can assign a literal address to it but you're not allowed to do ptr math on it
 
 
Aryeh M. Friedman
Guest
Posts: n/a
 
      01-03-2013
On Wednesday, January 2, 2013 8:27:21 PM UTC-5, Aryeh M. Friedman wrote:
> > Well - since he is writing a lexer for Java then ...
>
> A little more on the project... while the overall project *IS* for fun, a few components may find their way into more serious work-related projects, but only to be used on code written by me or others on my team... specifically we may use the lexing/parsing component to make the following tools (the actual code generation/etc. of the compilation is currently purely fun [see note]):
>
> 1. Scan for a complete list of classes referenced by a given class (our build system sometimes hiccups by not realizing that when class X calls an instance of class Y and Y has been modified, it needs to recompile X {if, and only if, the signature(s) have changed})
>
> 2. Do some minor style enforcement, like warning (have not decided if it should reject or just warn) if a class/method does not have something that at least looks like a javadoc header comment (/** ... */ is sufficient for this purpose)
>
> Note:
>
> A long-term personal project of mine is to write an OS completely from the ground up in a superset of Java (the only addition I see that is needed is some type of "safe" pointer type)... in this case safe being defined as: you can assign a literal address to it but you're not allowed to do ptr math on it


In case anyone is interested I have some personal notes on the project at http://dt.fnwe.net/a-javacNative/
 