How to scan Java source texts?

 
 
Stefan Ram
      06-11-2013
I'd like to scan Java source texts, printing one token per line.

I thought it might be possible with the compiler API, and
have read that it can return an AST, but I do not know how
to just obtain the tokens from the source code AST.

I am able to write a scanner for Java myself, but this would
take days. So I would like to take a shortcut by using a Java SE
(with JDK) call. (I would rather not use a third-party
library, because when I use the Java SE compiler API, I can
be sure that it will stay up to date with future Java versions.)

So the best solution would be a short program that gets this
information out of the Java compiler API. But I cannot find
an example of this on the web.

What does not seem to work is:

public class Main {
    public static void main(final String[] args) throws java.io.IOException {
        final java.io.File javaFile = new java.io.File("Main.java");
        try (java.io.FileReader file = new java.io.FileReader(javaFile)) {
            final java.io.StreamTokenizer streamTokenizer = new java.io.StreamTokenizer(file);
            // StreamTokenizer is not a Java lexer: sval is only set for words and
            // quoted strings, so every other token is printed as "null".
            while (streamTokenizer.nextToken() != java.io.StreamTokenizer.TT_EOF) {
                System.out.println(streamTokenizer.sval);
            }
        }
    }
}

Still, this gives the idea of what I want to accomplish.

For example, the scanner should decompose:

a+=b +"c\"d/*e"/*f*/
+g;

into

a
+=
b
+
"c\"d/*e"
/*f*/
+
g
;

(The comment »/*f*/« may just as well be dropped; also, there is
no need for any further information, such as token types.)
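
For illustration only, the decomposition requested above can be sketched with a small hand-written scanner. This is not the JDK's scanner and does not cover the whole lexical grammar: the class name MiniScanner, the operator table, and the simplified handling of numeric literals are assumptions made for this sketch (Unicode escapes, literals that start with a dot, and text blocks are ignored).

```java
import java.util.ArrayList;
import java.util.List;

public class MiniScanner {
    // Multi-character operators, longest first, so that ">>>=" wins over ">>".
    private static final String[] OPERATORS = {
        ">>>=", "<<=", ">>=", ">>>", "...", "->", "::", "++", "--", "&&", "||",
        "==", "!=", "<=", ">=", "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=",
        "<<", ">>"
    };

    /** Splits src into token texts; comments are returned as tokens too. */
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (src.startsWith("//", i)) {                    // end-of-line comment
                int eol = src.indexOf('\n', i);
                i = eol < 0 ? src.length() : eol;
            } else if (src.startsWith("/*", i)) {             // traditional comment
                int close = src.indexOf("*/", i + 2);
                i = close < 0 ? src.length() : close + 2;
            } else if (c == '"' || c == '\'') {               // string or char literal
                i++;
                while (i < src.length() && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++;           // skip the escaped character
                    i++;
                }
                i = Math.min(i + 1, src.length());            // include the closing quote
            } else if (Character.isJavaIdentifierStart(c)) {  // identifier or keyword
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
            } else if (Character.isDigit(c)) {                // numeric literal (simplified)
                boolean hex = src.startsWith("0x", i) || src.startsWith("0X", i);
                while (i < src.length()) {
                    char d = src.charAt(i);
                    boolean exponent = hex ? (d == 'p' || d == 'P') : (d == 'e' || d == 'E');
                    if (exponent && i + 1 < src.length() && "+-".indexOf(src.charAt(i + 1)) >= 0) {
                        i += 2;                               // keep the sign inside the literal
                    } else if (Character.isLetterOrDigit(d) || d == '_' || d == '.') {
                        i++;
                    } else {
                        break;
                    }
                }
            } else {                                          // operator or separator
                i += operatorLength(src, i);
            }
            out.add(src.substring(start, i));
        }
        return out;
    }

    private static int operatorLength(String src, int i) {
        for (String op : OPERATORS) {
            if (src.startsWith(op, i)) return op.length();
        }
        return 1; // single-character operator or separator
    }

    public static void main(String[] args) {
        // Prints the nine lines listed above, including the comment /*f*/.
        tokens("a+=b +\"c\\\"d/*e\"/*f*/\n+g;").forEach(System.out::println);
    }
}
```

The string-literal branch is tried before the comment check can see the »/*e« inside the quotes, which is why that text stays part of the literal.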

 
Stefan Ram
      06-11-2013
Stefan Ram writes:
>I am able to write a scanner for Java myself, but this would
>take days. So I would like to shortcut it by using a Java SE
>(with JDK) call. (I would not like to use a third-party


It might not be easy to get this right. For example, a
well-known, popular source-code indenter formatted the
several thousand lines of my Java project well, except for a
single case, where the source text »a=4.436e+3« was split
with a line break at the wrong place, giving something like

a=4.436e
+3
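
The pitfall described here can be reproduced with two token patterns. The sketch below is only an illustration: the class name ExponentDemo and both regular expressions are assumptions, and the literal rule is a simplified version of the decimal floating-point syntax, not the full grammar from the language specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExponentDemo {
    // Naive rule: any run of letters, digits, '_', '$' and '.' is one token,
    // everything else is a single-character token.
    static final Pattern NAIVE = Pattern.compile("[A-Za-z0-9_$.]+|\\S");

    // Same idea, but a (simplified) decimal literal rule is tried first and
    // accepts an optional sign inside the exponent part.
    static final Pattern WITH_EXPONENT = Pattern.compile(
            "\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?[fFdD]?"  // numeric literal
            + "|[A-Za-z_$][A-Za-z0-9_$]*"                // identifier or keyword
            + "|\\S");                                   // any other non-blank character

    /** Returns the successive matches of tokenPattern in src. */
    static List<String> split(Pattern tokenPattern, String src) {
        List<String> out = new ArrayList<>();
        Matcher m = tokenPattern.matcher(src);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split(NAIVE, "a=4.436e+3"));          // [a, =, 4.436e, +, 3]
        System.out.println(split(WITH_EXPONENT, "a=4.436e+3"));  // [a, =, 4.436e+3]
    }
}
```

The first result has the same shape as the wrong line break shown above: the sign is treated as an operator instead of as part of the exponent.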

 
markspace
      06-11-2013
On 6/11/2013 11:54 AM, Stefan Ram wrote:
> (E-Mail Removed)-berlin.de (Stefan Ram) writes:
>> I am able to write a scanner for Java myself, but this would
>> take days. So I would like to shortcut it by using a Java SE
>> (with JDK) call. (I would not like to use a third-party

>
> It might not be easy to get this right. For example, a



No, it's not. I recommend a third-party library. ANTLR has a Java
grammar already worked out. There are also other dedicated Java parsers.

Note that you're talking about two things here: lexing and parsing. A lexer
breaks text up into tokens; a parser decides how to interpret the
result. Parsers traditionally have a lot more contextual information,
whereas lexers are just simpler state machines that break up text.



 
Jeff Higgins
      06-11-2013
On 06/11/2013 12:26 PM, Stefan Ram wrote:
> I'd like to scan Java source texts, printing one token per line.


Do you mean these tokens:
<http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.5>

> I thought it might be possible with the compiler API, and
> have read that it can return an AST, but I do not know how
> to just obtain the tokens from the source code AST.


An AST is built from the tokens above.

[snip]
 
Stefan Ram
      06-11-2013
Jeff Higgins <(E-Mail Removed)> writes:
>On 06/11/2013 12:26 PM, Stefan Ram wrote:
>>I'd like to scan Java source texts, printing one token per line.

>Do you mean these tokens:
><http://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.5>


Yes.

>>I thought it might be possible with the compiler API, and
>>have read that it can return an AST, but I do not know how
>>to just obtain the tokens from the source code AST.

>An AST is built from the tokens above.


Yes. That's why the compiler still might have a copy of
the tokens lying around somewhere or might have a method
to get the next token. I just can't find such a method.

 
Jeff Higgins
      06-12-2013
On 06/11/2013 05:02 PM, Stefan Ram wrote:
> Yes. That's why the compiler still might have a copy of
> the tokens lying around somewhere or might have a method
> to get the next token. I just can't find such a method.
>

I suspect, but don't know, that these tokens may have lost some
of the information associated with their being 'InputElements'
by the time the AST is constructed. It shouldn't be too hard
to find a Java lexer that will output as you request.
I'll look around when I have a little more time.
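
As a rough check of that suspicion, the supported Compiler Tree API (javax.tools plus com.sun.source) can parse a source string and list the resulting tree nodes. The sketch below is an assumption-laden illustration: the names AstDump, StringSource and nodes are made up, it needs a JDK (not a bare JRE), and it only shows what the supported API hands back, not how javac stores tokens internally.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class AstDump {
    /** In-memory source file, so the sketch needs nothing on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses code and returns one line per tree node: its kind and source offsets. */
    static List<String> nodes(String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK");
        }
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(new StringSource("T.java", code)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> out = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void scan(Tree node, Void unused) {
                        if (node != null) {
                            long start = positions.getStartPosition(unit, node);
                            long end = positions.getEndPosition(unit, node);
                            out.add(node.getKind() + " [" + start + "," + end + ")");
                        }
                        return super.scan(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        nodes("class T { int a = 1 + 2; }").forEach(System.out::println);
    }
}
```

The list contains node kinds such as VARIABLE, PLUS and INT_LITERAL with character offsets, but no separate entries for »=«, »;« or the braces, so the token sequence cannot be read off these nodes directly.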

 
Jeff Higgins
      06-12-2013
On 06/11/2013 10:07 PM, Jeff Higgins wrote:
> I suspect, but don't know, that these tokens may have lost some
> of the information associated with their being 'InputElements'
> by the time the AST is constructed. It shouldn't be too hard
> to find a Java lexer that will output as you request.
> I'll look around when I have a little more time.
>


From OpenJDK:

package com.sun.tools.javac.parser;

/** The lexical analyzer maps an input stream consisting of
 *  ASCII characters and Unicode escapes into a token sequence.
 *
 *  <p><b>This is NOT part of any supported API.
 *  If you write code that depends on this, you do so at your own risk.
 *  This code and its internal interfaces are subject to change or
 *  deletion without notice.</b>
 */
public class Scanner implements Lexer {


 
Jeff Higgins
      06-12-2013
On 06/12/2013 04:04 AM, Jeff Higgins wrote:
>
> From OpenJDK:
>

<http://openjdk.java.net/groups/compiler>
 
Jeff Higgins
      06-12-2013
On 06/12/2013 04:15 AM, Jeff Higgins wrote:
> On 06/12/2013 04:04 AM, Jeff Higgins wrote:
>>
>> From OpenJDK:
>>

> <http://openjdk.java.net/groups/compiler>

It turns out to be surprisingly easy to build javac.
It shouldn't be too hard to add a command-line switch -tokens
and the requisite code to output the tokens, one per line, as
they appear, using the Lexer interface.
I don't see a way to do what you want using the existing API.

 
Stefan Ram
      06-12-2013
Jeff Higgins <(E-Mail Removed)> writes:
>I don't see a way to do what you want using the existing API.


Thanks for your remarks, which helped me
find out that it can be done once one
is willing to use the »com.sun....« classes,
such as »Scanner«. »tools.jar« needs to be on
the classpath for this.

Now, there is indeed the risk that these classes
will change in future JDK versions. Still, I
expect them to be more stable than some
third-party libraries. For example, I previously
used a third-party program for the same purpose,
but it has not been adapted to Java >= 1.5, so I
now needed to find some other means to accomplish
this for Java >= 1.5.

 