question on java lang spec chapter 3.3 (unicode char lexing)

 
 
Aryeh M. Friedman
      01-03-2013
On Wednesday, January 2, 2013 9:20:09 PM UTC-5, Arne Vajhøj wrote:
> On 1/2/2013 9:16 PM, Aryeh M. Friedman wrote:
> >> All Java IDE's that I know can do that.
> >
> > Let's see we have tried eclipse, netbeans, bluej, dr. java and a few
> > others and every single one failed to produce jars that can be run
> > without after build changes to the manifest and/or needed libs that
> > came with the IDE
>
> It can be done.
>
> Obviously it can also be made not to work.
>
> Maybe you should master a Java IDE before writing an OS in Java.


Another requirement not satisfied by any IDE we have found is the ability to lay out the source tree in such a way that it can be compiled without the IDE. This is a requirement for almost all our projects, because none of our clients have IDEs, and in almost all cases minor changes are needed to make the code happy on their site, changes that make testing impossible on the development machine.
 
Arne Vajhøj
      01-03-2013
On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
> another requirement not satisfied by any IDE we have found is the
> ability to lay the source tree out in such a way that it can be
> compiled without the IDE (a requirement for almost all our projects
> because none of our clients have IDE's and in almost all cases there
> are minor changes needed to make the code happy on their site that
> make testing impossible on the development machine)


The Java IDE's I know put code in a structure that fits
java tools, ant and maven.

Arne


 
Aryeh M. Friedman
      01-03-2013
On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
> > another requirement not satisfied by any IDE we have found is the
> > ability to lay the source tree out in such a way that it can be
> > compiled without the IDE (a requirement for almost all our projects
> > because none of our clients have IDE's and in almost all cases there
> > are minor changes needed to make the code happy on their site that
> > make testing impossible on the development machine)
>
> The Java IDE's I know put code in a structure that fits
> java tools, ant and maven.
>
> Arne


And in almost any non-trivial case this is completely incorrect. Even though I love Java as a language, I have a serious issue with some of the attitudes/assumptions made by its tools: namely, the universe does not revolve around the JVM.
 
Arne Vajhøj
      01-03-2013
On 1/2/2013 9:27 PM, Aryeh M. Friedman wrote:
> On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
>> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
>>> another requirement not satisfied by any IDE we have found is the
>>> ability to lay the source tree out in such a way that it can be
>>> compiled without the IDE (a requirement for almost all our projects
>>> because none of our clients have IDE's and in almost all cases there
>>> are minor changes needed to make the code happy on their site that
>>> make testing impossible on the development machine)
>>
>> The Java IDE's I know put code in a structure that fits
>> java tools, ant and maven.
>
> And in almost any non-trivial case this is completely incorrect...


Given that a big part (my estimate: 80-90%!) of all Java applications
are built like this:
- developers use an IDE and check in to VCS
- the build process checks out from VCS and uses ant/maven to build
then it has to be correct.

> even though I love Java as a lang I have a serious issue with some of
> the attitudes/assumptions made by tools... namely the universe does
> not revolve around the JVM


I find it natural that tools developed for Java development are the
best for Java development and tools developed for C development are
the best for C development and ... PHP ... Python ... etc..

Arne



 
Aryeh M. Friedman
      01-03-2013
On Wednesday, January 2, 2013 9:46:12 PM UTC-5, Arne Vajhøj wrote:
> On 1/2/2013 9:27 PM, Aryeh M. Friedman wrote:
> > On Wednesday, January 2, 2013 9:26:03 PM UTC-5, Arne Vajhøj wrote:
> >> On 1/2/2013 9:22 PM, Aryeh M. Friedman wrote:
> >>> another requirement not satisfied by any IDE we have found is the
> >>> ability to lay the source tree out in such a way that it can be
> >>> compiled without the IDE (a requirement for almost all our projects
> >>> because none of our clients have IDE's and in almost all cases there
> >>> are minor changes needed to make the code happy on their site that
> >>> make testing impossible on the development machine)
> >>
> >> The Java IDE's I know put code in a structure that fits
> >> java tools, ant and maven.
> >
> > And in almost any non-trivial case this is completely incorrect...
>
> Given that a big part (my estimate: 80-90%!) of all Java applications
> are built like this:
> - developers use an IDE and check in to VCS
> - the build process checks out from VCS and uses ant/maven to build
> then it has to be correct.


Correct in what sense? Passing its own tests? If that is the case, Aegis is the *ONLY* VCS that actually requires this before a checkin. The idea there is that the baseline (the repo, in most other VCSs' jargon) is guaranteed to be working (as defined above). Namely, every modification is *NEW* [see note], atomic in regard to new functionality, and *MUST* be accompanied by automated tests (it is possible to turn this off, but for obvious reasons that is not recommended unless the change is essentially untestable, like a documentation update).


> > even though I love Java as a lang I have a serious issue with some of
> > the attitudes/assumptions made by tools... namely the universe does
> > not revolve around the JVM
>
> I find it natural that tools developed for Java development are the
> best for Java development and tools developed for C development are
> the best for C development and ... PHP ... Python ... etc..


Most real world projects (unless they are part of a larger effort) have several components/languages. For us, for example, it is typical to have an HTML/CSS/JS component and a Java/"JSP" component (I am defining "JSP" a little loosely because we often need to support more than just web front-ends). It is also common for us to have some native code accessed via a JNLP wrapper.

Note:

There is a slight mismatch between Aegis's requirements in this regard and how xUnit-like frameworks work. We typically solve this by reusing the same test script but requiring that the total number of passes be at least one larger than for the previous change.
 
Arne Vajhøj
      01-03-2013
On 1/3/2013 4:14 PM, Martin Gregorie wrote:
> On Wed, 02 Jan 2013 19:56:13 -0500, Arne Vajhøj wrote:
>
>> On 1/2/2013 2:17 PM, Roedy Green wrote:
>>> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
>>> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
>>> who said :
>>>
>>>> (\uXXXX)
>>>
>>> The only places you encounter such escapes are in Java source and
>>> possibly resource bundles.
>>
>> Well - since he is writing a lexer for Java then ...
>
> ...which, being lazy, I would not do from scratch.
>
> Instead, I'd use the Java version of the Coco/R package, which generates
> the lexer and parser as Java source within a framework. Unlike some
> similar tools, you're almost encouraged to rewrite the framework to suit
> your requirements. This is quite short and written in standard Java, so
> modifying it is very easy.


Good point.

Arne


 
Aryeh M. Friedman
      01-04-2013
On Thursday, January 3, 2013 5:51:55 PM UTC-5, Arne Vajhøj wrote:
> On 1/3/2013 4:14 PM, Martin Gregorie wrote:
> > On Wed, 02 Jan 2013 19:56:13 -0500, Arne Vajhøj wrote:
> >
> >> On 1/2/2013 2:17 PM, Roedy Green wrote:
> >>> On Wed, 2 Jan 2013 00:20:12 -0800 (PST), "Aryeh M. Friedman"
> >>> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> >>> who said :
> >>>
> >>>> (\uXXXX)
> >>>
> >>> The only places you encounter such escapes are in Java source and
> >>> possibly resource bundles.
> >>
> >> Well - since he is writing a lexer for Java then ...
> >
> > ...which, being lazy, I would not do from scratch.
> >
> > Instead, I'd use the Java version of the Coco/R package, which generates
> > the lexer and parser as Java source within a framework. Unlike some
> > similar tools, you're almost encouraged to rewrite the framework to suit
> > your requirements. This is quite short and written in standard Java, so
> > modifying it is very easy.
>
> Good point.
>
> Arne


The only issue is likely a philosophical one, in that I have *NEVER* trusted code generators of any kind: they either produce code that is impossible to follow/debug or have all kinds of fluff in them. The classic example in my mind (HTML, which is not really a programming language) is Dreamweaver, which produces 75 lines of HTML for "hello, world".
 
Aryeh M. Friedman
      01-05-2013
On Saturday, January 5, 2013 8:03:00 AM UTC-5, Chris Uppal wrote:
> Aryeh M. Friedman wrote:
>
> > The only issue is likely a philosophical one in that I have *NEVER*
> > trusted code generators of any kind
>
> So you don't care for compilers ?
>
> -- chris
>
> P.S. Seriously: the point of classic compiler generators (or
> "compiler-compilers" as they were often called) is to produce code that works
> and that runs fast in little space. It is not /AT ALL/ a design principle that
> the code should be comprehensible to humans -- in fact for the kinds of
> algorithms they use, there is no way the resulting code and tables could be
> remotely comprehensible (to an ordinary programmer), that is /why/ we use code
> generators.


Machine code was never meant to be readable, but high-level languages can and should be. On the serious side of the debate, there are reasons for shying away from code generators in my case that are currently proprietary (some of the lesser results will likely be FOSS'ed, though). The main reason is that we need (in some cases) to deal with multiple languages in the same compilation unit, and have developed fairly good ways of doing that (at least in theory; my "fun work" is really nothing more than a proof of concept, without the pressure of deadlines and such, with Java as a typical non-trivial language to work with from the compiler POV). Given the above, using a parser generator would make it very inefficient to create the needed parsers, since such generators are (by their very nature) very non-OO in how they deal with more than one grammar at once: they are designed to deal with a single language at a time and not with "families" of them.
 
Aryeh M. Friedman
      01-05-2013
On Saturday, January 5, 2013 7:58:57 AM UTC-5, Chris Uppal wrote:
> Patricia Shanahan wrote:
>
> > You would at least need to detect the escapes to get a usable error
> > message. Once you have done that, it is so easy to replace each escape
> > with the equivalent Unicode character that it is not worth doing
> > anything else.
>
> I'm not so sure about that. IIRC the rules about interpreting Unicode escapes
> have some seriously weird convolutions. Something to do with protecting against
> multiply-encoded files, I think. It badly fails the Principle of Least WTF.
>
> It's in the spec, but I'm too lazy to go find the exact reference
>
> -- chris


Agreed. For example, the following is just ugly but perfectly valid Java code:

Foo.java:
\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A

% javac Foo.java
% java Foo
hello, world
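
A file like Foo.java above does not have to be typed by hand. The following is only a minimal sketch of how such a test input could be generated; it is not the tool used in this thread, and the names EscapeEverything and escapeAll are made up for illustration. It rewrites every UTF-16 code unit of an ordinary source string as a four-digit escape:

```java
// Hypothetical helper (names are illustrative, not from the thread):
// turns ordinary Java source text into the all-escapes form shown above.
final class EscapeEverything {

    // Rewrites each UTF-16 code unit as a backslash, a 'u' and four
    // upper-case hex digits.
    static String escapeAll(String source) {
        StringBuilder out = new StringBuilder(source.length() * 6);
        for (int i = 0; i < source.length(); i++) {
            // %04X zero-pads the code unit's value to exactly four hex digits.
            out.append(String.format("\\u%04X", (int) source.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The plain-text program that the escaped Foo.java decodes to.
        String source = "public class Foo\n{\n\tpublic static void main(String[] args)\n"
                + "\t{\n\t\tSystem.out.println(\"hello, world\");\n\t}\n}\n";
        System.out.println(escapeAll(source));
    }
}
```

Redirecting the output of this program into Foo.java should give a file equivalent to the one above, which javac then treats like the unescaped program.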
 
Aryeh M. Friedman
      01-05-2013
On Saturday, January 5, 2013 8:34:38 AM UTC-5, Aryeh M. Friedman wrote:
> On Saturday, January 5, 2013 7:58:57 AM UTC-5, Chris Uppal wrote:
> > Patricia Shanahan wrote:
> >
> > > You would at least need to detect the escapes to get a usable error
> > > message. Once you have done that, it is so easy to replace each escape
> > > with the equivalent Unicode character that it is not worth doing
> > > anything else.
> >
> > I'm not so sure about that. IIRC the rules about interpreting Unicode escapes
> > have some seriously weird convolutions. Something to do with protecting against
> > multiply-encoded files, I think. It badly fails the Principle of Least WTF.
> >
> > It's in the spec, but I'm too lazy to go find the exact reference
> >
> > -- chris
>
> Agreed. For example, the following is just ugly but perfectly valid Java code:
>
> Foo.java:
> \u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A
>
> % javac Foo.java
> % java Foo
> hello, world


Just a quick note: I did end up implementing Unicode escapes the way JLSv3 says to, and the above is one of our test inputs.
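
For readers who have not looked at that part of the spec, here is a minimal sketch of the section 3.3 translation step. It is written from the rules as the JLS states them, not from the lexer described in this thread, and the class name, method names and exception type are assumptions made for illustration (a real compiler would report a compile-time error instead of throwing). The rules it tries to follow: a backslash can start an escape only if it is preceded by an even number of contiguous backslashes in the raw input; it must be followed by one or more 'u' characters and then exactly four hex digits; and the character produced is not scanned again for further escapes.

```java
// A sketch of the Unicode-escape pre-pass from JLS section 3.3.
// All names are illustrative; this is not the lexer discussed in the thread.
final class UnicodeEscapeTranslator {

    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int precedingBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                precedingBackslashes = 0;
                i++;
                continue;
            }
            // Only a backslash preceded by an even number of backslashes is eligible.
            boolean eligible = precedingBackslashes % 2 == 0;
            boolean followedByU = i + 1 < raw.length() && raw.charAt(i + 1) == 'u';
            if (!(eligible && followedByU)) {
                out.append(c);
                precedingBackslashes++;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') {
                j++; // any number of 'u' characters is allowed
            }
            if (j + 4 > raw.length()) {
                throw new IllegalArgumentException("truncated Unicode escape at index " + i);
            }
            int value = 0;
            for (int k = j; k < j + 4; k++) {
                value = value * 16 + hexValue(raw.charAt(k), i);
            }
            // The produced character is appended as-is and never rescanned.
            out.append((char) value);
            precedingBackslashes = 0; // the last raw character consumed was a hex digit
            i = j + 4;
        }
        return out.toString();
    }

    // Accepts ASCII hex digits only.
    private static int hexValue(char c, int escapeStart) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("bad hex digit in Unicode escape at index " + escapeStart);
    }

    public static void main(String[] args) {
        // prints Foo
        System.out.println(translate("\\u0046\\u006F\\u006F"));
        // Left unchanged: the backslash before the 'u' follows an odd number of backslashes.
        System.out.println(translate("\\\\u0046"));
    }
}
```

Running translate over the contents of the Foo.java test input should yield the ordinary "hello, world" program, which makes that file a convenient end-to-end check for this step.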
 