Velocity Reviews - Computer Hardware Reviews

Scanner class and regex problem

 
 
Lee Weiner
07-02-2005
I teach Java, and we're switching to 1.5.0 next semester. I was thinking
about using the Scanner class to read data from text files, but I'm having
a problem specifying a delimiter string.

The file I'm using for the following example contains two records:

Weiner@572-6544@57
Kirby@572-6544@36

Using the following:

import java.util.Scanner;
import java.io.FileNotFoundException;
import java.io.File;

public class ScannerFile
{
    public static void main ( String[] args )
    {
        try
        {
            Scanner scan = new Scanner( new File( "lee.txt" ) );
            scan.useDelimiter( "\\s+" ); //1 or more white space chars
            while( scan.hasNext() )
            {
                System.out.println( "*" + scan.next() + "*" );
            }
            scan.close();
        }
        catch(FileNotFoundException exc)
        {
            System.out.println( "Error - Input file not found. Terminating." );
            System.exit( 1 );
        }
        System.exit(0);
    }
}

I get:

*Weiner@572-6544@57*
*Kirby@572-6544@36*

Exactly what I expect, but if I also want to delimit on the "@" signs with

scan.useDelimiter( "[@\\s+]" );

I get:

*Weiner*
*572-6544*
*57*
**
*Kirby*
*572-6544*
*36*
**

Can anyone tell me what I'm doing to cause that extra empty token at the
end of each record? I'm running under Windows XP, if that's important.

Lee Weiner
lee AT leeweiner DOT org

 
Chris Smith
07-02-2005
Lee Weiner <(E-Mail Removed)> wrote:
> Exactly what I expect, but if I also want to delimit on the "@" signs with
>
> scan.useDelimiter( "[@\\s+]" );
>
> I get:
>
> *Weiner*
> *572-6544*
> *57*
> **
> *Kirby*
> *572-6544*
> *36*
> **
>
> Can anyone tell me what I'm doing to cause that extra empty token at the
> end of each record?


Yes. Your regular expression is faulty. [@\\s+] means "either an @
symbol, or a single whitespace character, or a + symbol". None of your
input contained a + character, but things would have gotten even weirder
if it had. Perhaps you meant [@\\s]+ or @|(\\s+) (the difference being
that the first would not produce an empty token between two consecutive
@ characters, while the second one would).
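A small in-memory sketch of the patterns discussed above. It assumes, since the original poster is on Windows XP, that lee.txt was saved with \r\n line endings. A single-character delimiter such as [@\\s+] then matches \r and \n separately, and Scanner returns the empty string that sits between those two adjacent delimiters, which gives exactly one empty token per record. The class name DelimiterDemo and the helper tokens() are illustrative only, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {

    // Collect every token a Scanner produces for the given input and delimiter.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<String>();
        Scanner scan = new Scanner(input);
        scan.useDelimiter(delimiter);
        while (scan.hasNext()) {
            result.add(scan.next());
        }
        scan.close();
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the data file uses Windows-style \r\n line endings.
        String windows = "Weiner@572-6544@57\r\nKirby@572-6544@36\r\n";
        String unix = "Weiner@572-6544@57\nKirby@572-6544@36\n";

        // Character class matches ONE char: \r and \n are separate delimiters,
        // so an empty token appears between them.
        // prints [Weiner, 572-6544, 57, , Kirby, 572-6544, 36, ]
        System.out.println(tokens(windows, "[@\\s+]"));

        // Same pattern with \n-only endings: no adjacent delimiters, no empty token.
        // prints [Weiner, 572-6544, 57, Kirby, 572-6544, 36]
        System.out.println(tokens(unix, "[@\\s+]"));

        // Quantifier outside the class: "\r\n" (or "@@") is one delimiter.
        // prints [Weiner, 572-6544, 57, Kirby, 572-6544, 36]
        System.out.println(tokens(windows, "[@\\s]+"));

        // Alternation: \s+ swallows "\r\n", but "@@" would still yield "".
        // prints [Weiner, 572-6544, 57, Kirby, 572-6544, 36]
        System.out.println(tokens(windows, "@|(\\s+)"));
    }
}
```

Running tokens("a@@b", "@|(\\s+)") returns [a, , b], while tokens("a@@b", "[@\\s]+") returns [a, b], which is the difference between the two suggested patterns.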

Incidentally, when you're reading multiple records, it's far safer to
separate records first, then parse the record content. A simple error
in the input file here, rather than being detected, could cause you to
confuse names for phone numbers for the entire rest of the file and
store corrupt data that has to be hunted and purged after the error has
been discovered. That's not good.
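A minimal sketch of the record-first approach: read one line per record with Scanner.nextLine() (which also strips the \r\n terminator), then split that line on "@" and reject any line that does not have exactly three fields. The class RecordParser, the parseRecord() helper, the return-null-on-error convention, and the deliberately broken third record are all hypothetical. The post doesn't say what the third field means, so it is simply printed as field3.

```java
import java.util.Scanner;

public class RecordParser {

    // Hypothetical helper: splits one "name@phone@field3" record.
    // Returns null when the line does not have exactly three fields.
    static String[] parseRecord(String line) {
        // Limit -1 keeps trailing empty strings, so "a@b@c@" counts as
        // 4 fields and is rejected instead of being silently trimmed to 3.
        String[] fields = line.split("@", -1);
        if (fields.length != 3) {
            return null;
        }
        return fields;
    }

    public static void main(String[] args) {
        // In-memory stand-in for lee.txt; the third record is deliberately broken.
        String data = "Weiner@572-6544@57\r\nKirby@572-6544@36\r\nSmith@572-6544\r\n";
        Scanner scan = new Scanner(data);
        int lineNumber = 0;
        while (scan.hasNextLine()) {
            String line = scan.nextLine(); // one record, line terminator removed
            lineNumber++;
            String[] fields = parseRecord(line);
            if (fields == null) {
                // Report and skip: later records are not shifted out of place.
                System.out.println("Line " + lineNumber + " is malformed: " + line);
                continue;
            }
            System.out.println("name=" + fields[0] + " phone=" + fields[1]
                    + " field3=" + fields[2]);
        }
        scan.close();
    }
}
```

Because each line is validated on its own, a bad record is reported with its line number and skipped, rather than pushing every later name into the phone-number slot.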

Also incidentally, this question and many others like it should be
carefully read by the fanatics at Sun who seem to think, and write in
documentation, that just because a problem can be solved by regular
expressions, it has to be.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 